This article is somewhat long. Interestingly, it does not actually cover my entire talk, as there is much to talk about besides the mechanics of each backup option. I wonder what I’d need to do to make this into a white paper or an article?
The backup presentation was finished last night. I may decide to go back and put some extra stuff in there, but that would be syntax and code and stuff. The logic is all in there, and the notes have been printed. I will post the slides (in .pdf and .swf (flash, the file is very small that way) formats) after the talk on Monday, as I may yet revise them.
I am very excited about one slide in particular, and I’ll share it here. It’s really a slide that I end with, but I feel as though it’s a great starting point as well as a summary point. I haven’t seen this information encapsulated this way before, so here goes:
|Method||No Locking||DDL||Snapshot||Remote||Free||All Engines||All Tables||Text File||Recover
|SELECT . . .
|OS level copy||No||Yes||No||No||Yes||No||Yes||No||Yes||5|
The table is set up so the “No” answers are ‘bad’ and the “Yes” answers are ‘good’. The last column contains the # of “No”s in the row. Right off the bat, we see why mysqldump is probably the most used backup tool — it has the fewest “No”s.
“No locking” — A backup routine should interfere as little as possible with the actual database. If you lock a row or table, even for just reading, you’re blocking others from writing it. Transactional engines like InnoDB and BDB offer transactions to help get around this, but many databases have a mixture of transactional and non-transactional storage engines.
“DDL” — Backups should be able to completely restore a database. If you do not have the Data Definition Language (ie, CREATE TABLE statements), you will not be able to completely restore a database.
“Snapshot” — To use a backup for point-in-time recovery or to build a new replication slave, a snapshot is needed. It is possible to do a point-in-time recovery without a snapshot, but it involves checking the binary logs for possible duplicate statements.
“Remote” — Does the backup need to be run on the OS where the server lies? Due to security in companies or the use of an ISP, it is not always possible for the DBA to have the level of access needed to run programs on the database server.
“Free” — Self-explanatory. Note that the backup options that are free come packaged with MySQL, with the exception of OS-level copy, which is in all OS systems (ignoring embedded systems for now).
“All Engines” — Does the backup option deal with all storage engines? Some tools or commands are engine-specific.
“All Tables” — Can the tool easily backup all the tables without external looping code?
“Text file” — A text file backup is advantageous for a two reasons: Corruption can be easily detected, partial backups (ie, of one table) can be done, and SQL can be standardized for migration. Text files are usually larger than data files, but compression of text is excellent.
“Recover Corruption” — A “Yes” here means the backup option can be used to recover from corruption.
A brief discussion about the options:
SELECT . . . INTO OUTFILE is not used often, because it does not copy DDL. Code is needed to loop through tables, and there will be table-level locking for MyISAM tables. If the SELECT statement is put into a transaction, there will not be a problem for BDB and InnoDB tables.
In MySQL 5.0 it is possible to copy the information about a table from the INFORMATION_SCHEMA database by using SELECT . . .
INTO OUTFILE. I have not tested if inserting data into the INFORMATION_SCHEMA tables is an equal and complete substitution for DDL. Instinct says “no”.
It’s my opinion that mysqldump is the most widely used backup tool. It has the fewest “No”s of all of them, and the features it does not have can be worked around.
Note that mysqldump does do DDL by default, but it’s easy to get an incomplete DDL statement. If you turn off –opt, and don’t put –create-options, you will end up with CREATE TABLE statements that are standard SQL. This may sound good, but remember that storage engines are MySQL specific, so your CREATE TABLE statements will end up using the default storage engine, which may not be what you want upon restoration.
mysqldump can do a snapshot if the option to lock all the tables is used. However, it is usually not feasible to lock every database on the server while the backup is running, because a backup can take minutes.
Replication is also widely used, but it requires another instance. And to reduce single points of failure, that means more hardware needs to be bought. It may be difficult to justify hardware simply to do nothing unless a backup is needed.
One of the features of replication is also a detriment. The standard way to set up a replication slave is to take a snapshot of the master, import it to the slave, and then use binary logs from there. Another way is to use LOAD DATA FROM MASTER, but that only works for MyISAM tables. It is possible to alter tables and data on the slave before or during replication, without replication failing. This is a handy feature for having a write-only master with InnoDB tables and a read-only slave with MyISAM tables (for example, with fulltext searching). However, this also means that the DDL backup is not necessarily to be trusted.
That mistrust can be extended to the data for the same reasons. Replication may not be a reliable backup if DML is being run on the slave server. As well, if the master becomes corrupt, it is likely that corruption will spread to the replication server.
The two most widely-used storage engines, MyISAM and InnoDB, are OS independent (explanation in the MySQL manual for MyISAM and InnoDB). This means that they can simply be copied on the OS-level. The biggest gotcha is that in order to get a snapshot, the entire database has to be locked. And unlike mysqldump, there is no option to an OS-level copy that will lock the tables. Using this is bulkier than mysqldump , and it cannot be used for all storage engines as mysqldump can.
Two solutions have been devised to combat those problems — mysqlhotcopy and InnoDB Hot Backup. mysqlhotcopy is for MyISAM tables while InnoDB Hot Backup is for InnoDB tables. The other difference is that the InnoDB Hot Backup is not free. However, InnoDB Hot Backup does offer a free perl script that will use InnoDB Hot Backup to take a snapshot of the InnoDB tables, and then use MySQL commands to get a snapshot of the other engine types.
Most people will choose a combination of backup methods — many use both replication and mysqldump, or replication and mysqlhotcopy and InnoDB Hot Backup. There is no clear-cut answer, and mostly it depends on your environment and what the backup will be used for.
Ideally there would be one tool that performed all of the functions listed in the chart. The chart is not complete, either — just a listing of my opinion of the most common desired features. There are other desired features not listed, for example replication is an “immediate hot backup” whereas all the other tools require manually importing or starting the MySQL server with new data files, etc. I am definitely interested in what folks have to say on the subject.