Backing up your database
Let me start by saying that there are MANY ways to backup databases and this might not be the absolute best way, but it definitely works. I’m backing up my PostgreSQL database that happens to be populated by Django, but you can muck with things as you see fit.
So, one of the decisions I had to make is whether to save a representation of the data or to just dump the database data itself. The first option would be something like this:
python manage.py dumpdata --format=yaml
Second option is to talk directly to the database. The benefit of this option is you get everything about the database in its current setup. Let’s say you made some changes to the schema directly within PostgreSQL. If you dumped the data into YAML or JSON you wouldn’t get those changes to the database itself. So it’s probably prefereable to go straight to the DB like so:
pg_dump <DATABASE NAME>
What I ended up with is a nightly backup on the database server. Then I pull that data from onto a second server using rsync.
Steps it took:
1. Setup e-mail sending, so that I get cron e-mails.
$ sudo apt-get install mailutils $ sudo apt-get install exim4-config $ sudo dpkg-reconfigure exim4-config
Test sending of mail from server
$ echo Test | mail -s Test <YOUR E-MAIL>
E-mail sending log lives here:
/var/log/exim4/mainlog
2. Setting up cron job on server to stash database schema and data. What I did is ran the cron job as the postgres user so that it would have access to the database.
$ su postgres $ crontab -e MAILTO="<YOUR E-MAIL>" 0 1 * * * pg_dump > "<PATH TO BACKUPS>/$(date +\%Y-%m-%d).psql"
3. Setting up pulling data onto another machine; setup e-mail just as you did on server machine
$ crontab -e MAILTO="<YOUR E-MAIL>" 0 3 * * * rsync -vr --size-only<USERNAME>@<SERVER>:<PATH TO BACKUPS ON SERVER> <PATH TO BACKUPS ON SECOND MACHINE>
Why not push the database onto the remote server right after the backup is created, or even use pg_dump from the remote server. directly.
I want there to be as little points for failure as possible with the backups. The absolute simplest thing to do is to run the pg_dump locally. There most likely won’t be any issues that way. That’s why I don’t want to run the pg_dump remotely. And I’m not pushing to the remote machine, because the remote machine might move (the IP would change), but the server won’t.