Replication and Data Integrity

Last week, Baron pointed out that semi-synchronous replication is not synchronous. I learned a lot reading that post, but I was surprised it was used to pimp the Percona cluster, with no comparison to MySQL’s own cluster solution — that would be a much more fair comparison. There is one critical point Baron did not make, though….

whether it’s semi-synchronous replication or regular asynchronous replication, there is no guarantee of data integrity. I saw this over and over when I was consulting. Just because replication is not failing does *not* mean that the data on the master and slave are in sync.

There is no form of replication that verifies data integrity. You can check if the data on the slave is in sync with the data on the master with pt-table-checksum and pt-table-sync, from the Percona toolkit. I use those tools widely.

I have not yet started using the new version, which boasts just working out of the box — right now there are many options I use, in addition to the ones listed in the blog post that even I reference to this day I have also started using –chunk-size-limit, to avoid pt-table-checksum skipping chunks that are just a bit too large.

I am excited about the rewrite of the tools and have it in my plan to use them. I hope they will save me a lot of time.

In my mind, checksumming is as critical as backups (in fact, if you backup from a slave, you must verify that the slave is in sync with the master and has no data integrity issues). It is not optional. Hopefully you already know that replication does not verify data integrity, but if you did not know, now you know and you also know how to check for that.

What I do is have the checksum run with the modulo/offset feature every so often on the master, and within a week or 2 of a data discrepancy happening, I find it, using a daily monitoring check to see if the checksum table on the slave has matching values for the data on the master and the slave.