pruning old data / mysql / csv

pruning old data / mysql / csv

  • Written by
    Walter Doekes
  • Published on

It is not uncommon to have a database with records that just accumulate and accumulate over time. Think of log files, telephony billing records, traffic data and so forth.

The chances that you’ll ever need this data again are very slim. And letting your database grow indefinitely is not particularly smart. Time to prune!

Two things you need to worry about while pruning your data:

  • Throwing it all away without a backup doesn’t feel right. But moving it to a different database is overkill. Store them as CSV. It’s easy to read, easy to parse and therefore easy to reload in case you feel you went a bit far with the pruning.
  • Dumping and deleting data on a live database can be disastrous for running processes. The locking can create such severe delays that the users of the database time out. Dreadful for business.

This MySQL dump-to-CSV script with pruning (view) capability takes care of these two issues for you. It dumps the tables you specify to CSV and prunes them afterwards, all up until the date you specify and keeping locking to a minimum by fetching and deleting only small bits at a time.

See this example:

$ ./mysql2csv.py --prune localhost root voip_db asterisk_cdr calldate 2010-01-01
MySQL server settings: root@localhost/voip_db
Table to dump: asterisk_cdr (ordered by calldate, from begin to 2010-01-01)
Filename template: "voip_db.asterisk_cdr.%Y%m.csv"
Enter your MySQL password to begin dumping AND PRUNING!!!
Password:
Processing voip_db.asterisk_cdr (ordered by calldate)
++++++++ ...pruning

Afterwards you’ll be left with asterisk_cdr records younger than (or equal to) 2010-01-01 00:00:00 and you’ll get pretty .csv files for the deleted records in return.

$ ls
voip_db.asterisk_cdr.200905.csv
voip_db.asterisk_cdr.200906.csv
voip_db.asterisk_cdr.200907.csv
voip_db.asterisk_cdr.200908.csv
voip_db.asterisk_cdr.200909.csv
voip_db.asterisk_cdr.200910.csv
voip_db.asterisk_cdr.200911.csv
voip_db.asterisk_cdr.200912.csv
mysql2csv.py
$ head -n2 voip_db.asterisk_cdr.200906.csv
id,calldate,clid,src,dst,dcontext,channel,dstchannel,lastapp,lastdata,duration,billsec,disposition,amaflags,accountcode,uniqueid,userfield
23831,"2009-06-01 07:12:25","""31900555112"" <31900555112>","31900555112","31333311199","call","SIP/22.22.22.22-c3c421ff","Local/0123456789@call_out-13fb,1","Dial","Local/0123456789@call_out/n|50|g",62,20,8,3,"","voip.example.com-1243826945.5",""

Update 2012-02-03

You may want to import these again too. mysqlimport(1) will do this for you. If the following conditions are met:

  • You need File_priv privileges: GRANT FILE ON *.* TO YOURNAME@'%';
  • The file needs to be named as the table. E.g. /path/to/asterisk_cdr.csv if the table is asterisk_cdr.
  • The CSV file needs to be on the filesystem of the MySQL daemon.

Run this:

$ mysqlimport --ignore-lines=1 \
    --fields-terminated-by=, --fields-optionally-enclosed-by='"' \
    -h HOST -u NAME -p DATABASE /path/to/asterisk_cdr.csv

Back to overview Newer post: asterisk / nat keepalive / round robin dns Older post: port forwarded ssh / port 22