brief / dbase backup bonanza

  • Written by Walter Doekes

  • Published on: 15/04/2013

Of course you do daily backups of your data. For your databases, a copy of the raw files is generally not enough: you need a dump of the data for the backup to be useful. For your and my reference, here are a few database dump scripts. Make sure /var/backups is backed up daily.

Observe that we keep extra backups around. I’ve found that the need to access an old database is far more common than accessing old files (from the backup storage). Keeping them close means quick access in time of need.

You can choose to rotate the backups for every backup run, or you can use weekday-names to keep older versions. They both have their advantages. Numeric rotation makes it easier to add an extra backup just before you attempt some operation. The weekday-name method makes it easier on the file backups: the backup rsync(1) won’t need to fetch the entire backup directory, but only the latest backup.

If 7 backups are too much for you, you can always add this to the mix:

rm -f /somepath/$db.`LC_ALL=C date +%a -d '-3 days' | tr '[:upper:]' '[:lower:]'`.sql.bz2
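As a sketch, the numeric rotation mentioned above could look like this; the path, the .sql.bz2 suffix and the number of kept copies are assumptions to adapt:

```shell
# Shift db.1.sql.bz2 up to db.N.sql.bz2, dropping the oldest copy.
# After rotating, the fresh dump goes to db.1.sql.bz2.
rotate() {
    db="$1"    # e.g. /somepath/mydb
    keep="$2"  # number of copies to keep
    rm -f "$db.$keep.sql.bz2"
    i=$((keep - 1))
    while test "$i" -ge 1; do
        if test -f "$db.$i.sql.bz2"; then
            mv "$db.$i.sql.bz2" "$db.$((i + 1)).sql.bz2"
        fi
        i=$((i - 1))
    done
}
```

Adding an extra backup just before a risky operation is then a matter of running the rotation and the dump one extra time.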

PostgreSQL

Run as postgres user from your crontab. Here we prefer using /etc/crontab over the crontab(1) program because it is more likely that /etc is in your backups than /var/spool/cron.

Create /var/backups/postgres with a 0700 file mode and postgres ownership.
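The /etc/crontab entry could then look something like this; the time and the script path are examples:

```shell
# m h dom mon dow  user      command
30 3  *   *   *   postgres  /usr/local/sbin/backup-pgsql
```

Unlike crontab(1) entries, /etc/crontab takes a user field before the command.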

#!/bin/sh
# (OSSO/2013)
day=`LC_ALL=C date +%a | tr '[:upper:]' '[:lower:]'`
for db in `psql -ltAF, | sed -e '/,/!d;s/,.*//;/^template0$/d'`; do
    pg_dump $db >/var/backups/postgres/$db.$day.sql &&
      rm -f /var/backups/postgres/$db.$day.sql.bz2 &&
      bzip2 /var/backups/postgres/$db.$day.sql
done

And did I mention that the postgres dump format is so much more convenient for diffing than MySQL's, which needs --skip-extended-insert before a diff of two dumps becomes readable?
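For instance, with the script above in place, comparing two days' worth of dumps is a one-liner; the database name and weekdays here are made up:

```shell
# bzdiff(1) transparently decompresses both dumps before diffing them,
# so you can see exactly what changed between two days.
compare_dumps() {
    bzdiff "$1" "$2"
}
# e.g.: compare_dumps /var/backups/postgres/mydb.mon.sql.bz2 \
#                     /var/backups/postgres/mydb.tue.sql.bz2
```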

MySQL

See the remarks in the file about user permissions. If you’re not running Debian or a derivative (Ubuntu), you’ll need to tweak the $mysqlargs as well.

#!/bin/sh
# (OSSO/2013)
# We assume that the debian.cnf user has enough permissions.  If it doesn't,
# you may need to add permissions as appropriate.  The user needs read access
# to mysql.proc to be able to dump functions and stored procedures.
mysqlargs=--defaults-file=/etc/mysql/debian.cnf
dstpath=/var/backups/mysql

day=`LC_ALL=C date +%a | tr '[:upper:]' '[:lower:]'`
mkdir -p "$dstpath"
chmod 700 "$dstpath"

# If you have a small database, you can consider adding --skip-extended-insert
# to the mysqldump.  If you're running a master/slave setup, you can add
# --master-data or --dump-slave to get "CHANGE MASTER TO" statements added.
mysql $mysqlargs -e 'SHOW DATABASES' |
  sed -e '1d;/^information_schema$/d;/^performance_schema$/d' |
  while read db; do
    mysqldump $mysqlargs --ignore-table=mysql.event --quick --routines $db \
        >"$dstpath/$db.$day.sql" &&
      rm -f "$dstpath/$db.$day.sql.bz2" &&
      bzip2 "$dstpath/$db.$day.sql"
done

Update 2013-04-22

Added --ignore-table=mysql.event because Percona's mysqldump wanted me to be explicit about dumping the events. You can add --events if you want your events backed up. See the CREATE EVENT syntax for more information about this cron-like functionality.

MongoDB

#!/bin/sh
# vim: set ts=8 sw=4 sts=4 et ai tw=79:
# (OSSO/2013-2016)
dstpath=/var/backups/mongo
mongoargs=  # e.g. "--host 10.42.40.80"

mkdir -p "$dstpath"
chmod 700 "$dstpath"
export LC_ALL=C  # use common English days

all_dbs() {
    out=`echo 'show dbs' | mongo $mongoargs --quiet`
    if test $? -eq 0; then
        echo "$out" | sed -rne 's/^([^[:blank:]]+)[[:blank:]].*/\1/p' |
            grep -vE '^(exclude_this_db|and_this_db|twiible_harm_celery)$'
    else
        echo "Could not run 'show dbs' on mongo server" >&2
        exit 1
    fi
}

dbs_to_dump=`all_dbs` || exit 1


dump_db() {
    db="$1"
    tmpdst="$dstpath/$db"  # use a temp dir near the destination

    # Make room for the backups and the backup log.
    if ! mkdir "$tmpdst" 2>/dev/null; then
        echo >&2
        echo "Removing old backup dir ${tmpdst}... " >&2
        echo "Did the previous backup fail?" >&2
        echo >&2
        rm -rf "$tmpdst"
        mkdir "$tmpdst"
    fi

    # Dump, with stderr to templog, because it also outputs
    # stuff when there is success and we don't want cron mails on
    # success.
    temp=`mktemp`
    mongodump $mongoargs -o "$tmpdst" -d "$db" >"$tmpdst/dump.log" 2>"$temp"
    if test $? -ne 0; then
        cat "$temp" >&2
        rm "$temp"
        return 1
    fi
    rm "$temp"

    # This machine is backed up daily; we rotate by altering the
    # filename.
    today=`LC_ALL=C date +%a | tr '[:upper:]' '[:lower:]'`
    yesterday=`LC_ALL=C date +%a -d '-1 days' | tr '[:upper:]' '[:lower:]'`

    # Do we need to zip today's/yesterday's backups?
    # (There is only yesterday, unless this is the second time we run today.)
    for day in $yesterday $today; do
        if test -d "$dstpath/${db}.$day"; then
            rm -f "$dstpath/${db}.${day}.tar.gz" || return 1
            tar zcf "$dstpath/${db}.${day}.tar.gz" -C "$dstpath" \
                    "${db}.$day" || return 1
            rm -rf "$dstpath/${db}.$day" || return 1
        fi
    done

    # We leave our backups unzipped. Tomorrow they'll get zipped.
    mv "$dstpath/${db}" "$dstpath/${db}.$today" || return 1
}

ret=0
for db in $dbs_to_dump; do
    if ! dump_db $db; then
        echo >&2
        echo "Dump of $db failed..." >&2
        echo >&2
        ret=1
    fi
done

exit $ret

Update 2015-01-29

Updated the mongo dump to take an optional --host parameter and to dump per DB so we can exclude certain DBs.

Update 2016-04-12

Updated it to cope with the additional output that mongodump produces. Quiet is too quiet, but we don't want to spam about successful dumps. Also fixed the use of set -e, which doesn't work inside a function.

Elasticsearch

Okay, this backup script is a bit bigger, and it will cause some downtime during the dump. In our case, we can live with the two minutes of downtime.

Restoring this dump goes something like this:

  • Block access to your elastic server (using a firewall?)
  • PUT the metadata:
    curl -XPUT "http://localhost:9200/$INDEX/" -d "`cat mappost_$INDEX`"
  • Replace the old data dir, and restart elastic.
#!/bin/bash
# (OSSO/2013)
# Ideas from: https://gist.github.com/nherment/1939828/raw/46ddf90fdab1b9749327743382a080344a1949a7/backup.sh
# See that script for how to restore the backup as well.
#
# TIP! Restart the graylog server after running this! It may have choked on
# the missing elastic.

INDEX_ROOT_DIR="/var/lib/elasticsearch/YOUR_NAME_HERE/nodes/0/indices"
BACKUPDIR="/var/backups/elastic"
BACKUPDIR_TMP="$BACKUPDIR/tmp"

if ! /etc/init.d/elasticsearch status >/dev/null; then
    echo 'Elasticsearch is not running?'
    exit 1
fi

mkdir -p "$BACKUPDIR"
chmod 700 "$BACKUPDIR"
rm -rf "$BACKUPDIR_TMP"  # make sure we have a clean dir
mkdir -p "$BACKUPDIR_TMP"

INDICES=`ls "$INDEX_ROOT_DIR"`
touch "$BACKUPDIR/begin.stamp"

# We need to stop elasticsearch to get a clean dump of the data.
# To reduce the inconvenience of the downtime, we'll stop and
# start the server for each index.
for index in $INDICES; do
    # Back up the index metadata.. this should be lightning fast.
    # This metadata is required by ES to use the index file data.
    for i in `seq 10`; do
      # Retry curl a few times. ES may take a while to start up.
      if curl --silent -XGET -o "$BACKUPDIR_TMP/mapping_$index" \
            "http://localhost:9200/$index/_mapping?pretty=true"; then
          break
      fi
      sleep $i
    done
    if ! test -f "$BACKUPDIR_TMP/mapping_$index"; then
        echo "Failed to get mapping_$index from ES.. aborting."
        exit 1
    fi

    # Drop first two lines of metadata.
    sed -i -e '1,2d' "$BACKUPDIR_TMP/mapping_$index"
    # Prepend hardcoded settings.
    echo '{"settings":{"number_of_shards":5,"number_of_replicas":1},"mappings":{' \
      >>"$BACKUPDIR_TMP/mappost_$index"
    cat "$BACKUPDIR_TMP/mapping_$index" >>"$BACKUPDIR_TMP/mappost_$index"

    # Stop elasticsearch so we can create a dump. Hope that no one
    # has touched the metadata between our metadata fetch and this
    # stop.
    #date; echo 'elastic is down'
    /etc/init.d/elasticsearch stop >/dev/null
    rm -f "$BACKUPDIR/$index.tar"  # remove old (should not exist)

    # Now let's tar up our data files. These are huge, but we're not
    # going to be nice; our elastic is down, after all. We zip it
    # right away, assuming that fewer disk writes beats the extra
    # cpu usage.
    # (Only tar took 3m42 on 3GB of data.)
    # (Tar with gz took 4m01.)
    # (Gzip with --fast took about 1m40.)
    tar c -C "$INDEX_ROOT_DIR" "$index" \
          -C "$BACKUPDIR_TMP" "mapping_$index" "mappost_$index" |
      gzip --fast >"$BACKUPDIR/$index.tar.gz.temp"
    success=$?

    # Done? Start elastic again.
    /etc/init.d/elasticsearch start >/dev/null
    #date; echo 'elastic is up'

    if test $success != 0; then
        echo
        echo 'Something went wrong.. aborting.'
        exit 1
    fi

    # Move the data.
    rm -f "$BACKUPDIR/$index.tar.gz"
    mv "$BACKUPDIR/$index.tar.gz.temp" "$BACKUPDIR/$index.tar.gz"
done

# Clean up.
rm -rf "$BACKUPDIR_TMP"
touch "$BACKUPDIR/end.stamp"

That’s all for today. More some other time.

Update 2015-04-16

Added LC_ALL=C to all `date` calls above. You want the names to be equal even if you run the job from the console.
