pruning old data / mysql / csv
It is not uncommon to have a database with records that just accumulate and accumulate over time. Think of log files, telephony billing records, traffic data and so forth.
The chances that you’ll ever need this data again are very slim. And letting your database grow indefinitely is not particularly smart. Time to prune!
Two things you need to worry about while pruning your data:
- Throwing it all away without a backup doesn’t feel right. But moving it to a different database is overkill. Store them as CSV. It’s easy to read, easy to parse and therefore easy to reload in case you feel you went a bit far with the pruning.
- Dumping and deleting data on a live database can be disastrous for running processes. The locking can create such severe delays that the users of the database time out. Dreadful for business.
This MySQL dump-to-CSV script with pruning (view) capability takes care of these two issues for you. It dumps the tables you specify to CSV and prunes them afterwards, all up until the date you specify and keeping locking to a minimum by fetching and deleting only small bits at a time.
See this example:
$ ./mysql2csv.py --prune localhost root voip_db asterisk_cdr calldate 2010-01-01 MySQL server settings: root@localhost/voip_db Table to dump: asterisk_cdr (ordered by calldate, from begin to 2010-01-01) Filename template: "voip_db.asterisk_cdr.%Y%m.csv" Enter your MySQL password to begin dumping AND PRUNING!!! Password: Processing voip_db.asterisk_cdr (ordered by calldate) ++++++++ ...pruning
Afterwards you’ll be left with
asterisk_cdr records younger than (or
equal to) 2010-01-01 00:00:00 and you’ll get pretty .csv files for the
deleted records in return.
$ ls voip_db.asterisk_cdr.200905.csv voip_db.asterisk_cdr.200906.csv voip_db.asterisk_cdr.200907.csv voip_db.asterisk_cdr.200908.csv voip_db.asterisk_cdr.200909.csv voip_db.asterisk_cdr.200910.csv voip_db.asterisk_cdr.200911.csv voip_db.asterisk_cdr.200912.csv mysql2csv.py $ head -n2 voip_db.asterisk_cdr.200906.csv id,calldate,clid,src,dst,dcontext,channel,dstchannel,lastapp,lastdata,duration,billsec,disposition,amaflags,accountcode,uniqueid,userfield 23831,"2009-06-01 07:12:25","""31900555112"" <31900555112>","31900555112","31333311199","call","SIP/22.214.171.124-c3c421ff","Local/0123456789@call_out-13fb,1","Dial","Local/0123456789@call_out/n|50|g",62,20,8,3,"","voip.example.com-1243826945.5",""
You may want to import these again too.
mysqlimport(1) will do this
for you. If the following conditions are met:
- You need
GRANT FILE ON *.* TO YOURNAME@'%';
- The file needs to be named as the table. E.g.
/path/to/asterisk_cdr.csvif the table is
- The CSV file needs to be on the filesystem of the MySQL daemon.
$ mysqlimport --ignore-lines=1 \ --fields-terminated-by=, --fields-optionally-enclosed-by='"' \ -h HOST -u NAME -p DATABASE /path/to/asterisk_cdr.csv