Backup your Database in Git

Short version: dump your production database into a git repository for an instant backup solution.

Long version: keeping backups of production data is fundamental for a well-run web application, but it’s tricky to maintain history while keeping disk usage at a reasonable level. You could continually overwrite the backup with the latest data, but you risk automatically replacing good data with bad. You could save each version in a separate, timestamped file, but since most of the data is static, you would end up wasting a lot of disk space.

When you think about it, a database dump is just SQL code, so why not manage it the same way you manage the rest of your code — in a source code manager? Setting such a scheme up is dead simple. On your production server, with git installed:

mkdir -p /path/to/backup cd /path/to/backup mysqldump -u [user] -p[pass] --skip-extended-insert [database] > [database].sql git init git add [database].sql git commit -m "Initial commit" 

The --skip-extended-insert option tells mysqldump to give each table row its own insert statement. This creates a larger initial commit than the default bulk insert, but makes future commits much easier to read and (I suspect) keeps the overall repository size smaller, since each patch only includes the individual records added/updated/deleted.

From here, all we have to do is set up a cronjob to update the backup:

0 * * * * cd /path/to/backup && \ mysqldump -u [user] -p[pass] --skip-extended-insert [database] > [database].sql && \ git commit -am "Updating DB backup" 

You may want to add another entry to run git gc every day or so in order to keep disk space down and performance up.

Now that you have all of your data in a git repo, you’ve got a lot of options. Easily view activity on your site with git whatchanged -p. Update your staging server to the latest data with git clone ssh://[hostname]/path/to/backup. Add a remote on Github and get offsite backups with a simple git push.

This technique might fall down if your app approaches Craigslist-level traffic, but it’s working flawlessly for us on SpeakerRate, and should work well for your small- to medium-sized web application.

David is Viget's managing development director. From our Durham, NC, office, he builds high-quality, forward-thinking software for PUMA, the World Wildlife Fund,, and many others.

More posts by David