Online Verification That Master and Slaves are in Sync

In October 2008, Baron posted How to Check MySQL Replication Integrity Continually. Here at Pythian we have developed a method based on that post, and added “verifying that masters and slaves are in sync” to our standard battery of tests.

We call it “Continual replication sync checking”. This article explains how it works, how to test the procedure and keep it non-blocking, presents benchmarks from the real world, covers issues we encountered along the way, and finally describes Pythian’s procedure for setting up continual replication sync checking in a new environment.

At the 2010 MySQL User Conference & Expo, my co-worker Danil Zburivsky did a presentation about this, and you can get the slides in ODP (Open Office) format or watch the 46-minute video on YouTube.

How it works

On the master, mk-table-checksum is run. In order to make the checksum operation online, we use the modulo and offset features of mk-table-checksum to checksum only part of the data at a time. The checksum is run (from cron) on the master and replicates to the slave. The results are captured in a result table, and a separate process checks the result table and notifies us of any discrepancies.
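
The “separate process” can be as simple as a cron job on each slave that queries the result table and sends mail when any row disagrees. Here is a minimal sketch, assuming the result table is maatkit.checksum (as in the procedure later in this article); the credentials and the e-mail address are placeholders to adjust for your environment:

#!/bin/sh
# Minimal sketch: report checksum discrepancies recorded in the result table.
# Run on each slave; maatkit.checksum, the credentials and the address are
# placeholders.
RESULT=$(mysql -u maatkit_user -ppassword -N -e \
  "SELECT db, tbl, chunk FROM maatkit.checksum
   WHERE this_crc != master_crc AND db NOT IN ('maatkit','mysql');")

if [ -n "$RESULT" ]; then
    echo "$RESULT" | mail -s "Checksum discrepancy on $(hostname)" dba@example.com
fi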

Testing resource usage for non-blocking replication sync checking

Each environment has a different data size, different resources, and a different level of what “intrusive” means. We start with a baseline of the size of the database, which we get from:

SELECT SUM(INDEX_LENGTH+DATA_LENGTH)/1024/1024/1024 as sizeGb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE='BASE TABLE';

I included INDEX_LENGTH because the queries that generate the checksum information may or may not use indexes. Note that for InnoDB tables the metadata is approximate, so this will not necessarily be the exact size of the database, but it will be in the right range. In general, running this query against the INFORMATION_SCHEMA did not consume too many resources, though in many cases it took a few minutes to complete; on databases with several thousand tables it can take hours.

Of course there is the caveat that it may take a VERY long time to run this in your environment, and some folks have reported crashes (this is VERY rare; I personally have not seen it even with the high-demand clients we have had, but others have reported it). If you prefer to be on the safe side, you can instead look at the size of your ibdata files (whether or not you use innodb_file_per_table) plus the sizes of all the table files on disk (MyISAM, CSV, etc.). That will give you a sense of how big the data is, although the size of the files on disk also includes any fragmentation. In practice, I have seen defragmenting reduce the on-disk size of a large database by 40%, so be aware that neither of these methods is perfect.
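
A quick way to get that on-disk number is something like the following (a rough sketch only; the datadir path and which file types exist will vary by installation):

# Rough on-disk data size: InnoDB tablespace files plus per-table files.
# Assumes the datadir is /var/lib/mysql; adjust the path for your server.
du -ch /var/lib/mysql/ibdata* \
       /var/lib/mysql/*/*.ibd \
       /var/lib/mysql/*/*.MYD /var/lib/mysql/*/*.MYI \
       2>/dev/null | tail -1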

Once we get the data size in gigabytes, we can come up with a modulo to test. In general, we have found that mk-table-checksum is quick enough to be non-blocking when we use a modulo value that corresponds to approximately 1 Gb of data per run, and a chunk-size of 100. We actually started out with a chunk-size of 1000, but locking 1000 rows at a time to compute a checksum took longer than was acceptable for clients with lots of transactions; the problem was not the locking on the master, it was the slave lag it created. We tried the --sleep option to mk-table-checksum, which worked, but making the chunk-size smaller caused less slave lag than adding a sleep parameter.
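
Since the target is roughly 1 Gb per run, a starting modulo is simply the data size in gigabytes, rounded up; you can then adjust it to a number that fits your schedule (see the next section). A quick sketch, reusing the size query from above:

# Suggest a starting modulo: one part per gigabyte of data (rounded up).
MODULO=$(mysql -N -e "SELECT CEIL(SUM(INDEX_LENGTH+DATA_LENGTH)/1024/1024/1024)
                      FROM INFORMATION_SCHEMA.TABLES
                      WHERE TABLE_TYPE='BASE TABLE';")
echo "Suggested starting modulo: $MODULO"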

Another issue when using chunk-size is that it requires a numeric (but non-decimal) index. We ran across clients using VARCHAR(32) as a PRIMARY key (yes, on InnoDB, and yes, performance was an issue there), or tables that did not have any indexes at all (such as a logging table). If mk-table-checksum does not find a suitable index it will just do a checksum on the whole table.

To find tables that would be problematic, here is an INFORMATION_SCHEMA query that can be run (again, the usual caveats about INFORMATION_SCHEMA apply):

SELECT CONCAT("SHOW CREATE TABLE ", table_schema,".",table_name,"\\G")
FROM TABLES
LEFT JOIN (
     SELECT distinct table_schema,table_name
     FROM STATISTICS INNER JOIN COLUMNS USING
       (table_schema,table_name,column_name)
     WHERE (data_type like '%int' or data_type like 'date%'
     or data_type like 'time%' or data_type='float') and seq_in_index=1
     ) AS good_for_checksum
USING (table_schema,table_name)
WHERE good_for_checksum.table_name IS NULL
and table_schema not in ('information_schema');

Frequency of checksum runs


The frequency at which we run the checksum is also very flexible, so we take the size and translate it into a modulo that is “even” in a time-based way. For example, on a server that reported 113 Gb in size from the INFORMATION_SCHEMA query above, we set the modulo to 120. The checksum took 10 minutes and 6 seconds from the time the master started to the time the slave finished. It caused no excessive slave lag, and other resource checks, including application response time for non-administrative queries, showed that the test was acceptable.

Given a modulo of 120 that takes about 10 minutes to run and the environment, we decided to run the checksum 9 times per day (hourly for 9 hours during off-peak time). This resulted in the entire data set being checked during a period of just under 2 weeks (120 parts / 9 times per day = 13.333 days).

This means that in this particular environment, a data discrepancy is discovered within 2 weeks. Though that is not ideal, it is much better than not discovering data discrepancies at all, which is what replication on its own gives you today.

Benchmarks From the Real World

The first row in the table below is the example we just went through. The subsequent rows of the table are results from some of the other production environments we run the checksum in. As you can see, we try to keep the data checked at one time to about 1 Gb.



Total Data Size | Modulo | Test time     | Checksum Rate                                  | Frequency | Period
113 Gb          | 120    | 10 min 6 sec  | 1.59 Mb / sec                                  | 9x / day  | 2 weeks
9 Gb            | 21     | 51 sec        | 8.4 Mb / sec                                   | 3x / day  | 1 week
29 Gb           | 21     | 9 min 16 sec  | 2.6 Mb / sec                                   | 3x / day  | 1 week
70 Gb           | 21     | 28 sec        | 2650 Mb / sec (data was freshly defragmented!) | 3x / day  | 1 week
5.1 Gb          | 21     | 4 min 22 sec  | 0.958 Mb / sec                                 | 3x / day  | 1 week
314.5 Gb        | 336    | 36 min 3 sec  | 0.44 Mb / sec                                  | 16x / day | 3 weeks

In all of these environments, slave lag was 10 seconds or less at any given point on the slaves.
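
Slave lag is easy to watch while a test run is in progress. A simple loop such as the following sketch, run on the slave being tested, is enough to see whether Seconds_Behind_Master climbs during the checksum:

# Poll slave lag every 10 seconds while a checksum test run is in progress.
while true; do
    date '+%H:%M:%S'
    mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master
    sleep 10
done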

Issues Encountered


Some of the issues we encountered have workarounds, so I wanted to discuss and explain them here before giving our procedure, which contains the workarounds.

  1. Bug 304 – mk-table-checksum deletes all prior checksum results in the result table on every run. There is no fix for this yet, but if you are using mk-table-checksum only for the procedure described in this article (and in particular are not using the --resume-replication option), you can comment out the following code from mk-table-checksum:

    # Clean out the replication table entry for this table.
    if ( (my $replicate_table = $final_o->get('replicate'))
         && !$final_o->get('explain') ) {
       use_repl_db(%args);   # USE the proper replicate db
       my $del_sql = "DELETE FROM $replicate_table WHERE db=? AND tbl=?";
       MKDEBUG && _d($dbh, $del_sql, $db, $table->{table});
       $dbh->do($del_sql, {}, $db, $table->{table});
    }

    It is in different places in different versions, but last I checked, searching for "DELETE FROM" in mk-table-checksum only matched three lines of code, and it was pretty clear (due to the inline comment) which block of code to delete. The block shown above is from lines 5154 – 5161 in mk-table-checksum changeset 6647.

  2. Running the checksum may cause “statement not safe for replication” errors, especially in 5.1.

     This is usually OK to ignore; mk-table-checksum works precisely because the same statement can produce different results on the master and on the slave. In MySQL 5.1, CSV tables for the general and slow logs exist by default, even if they are not being written to, and “Statement is not safe to log in statement format” errors show up.

    You will need to redirect stderr and expect to see those statements in the MySQL error log. Note that mk-table-checksum works regardless of whether you are using statement-based, row-based or mixed replication.

  3. mk-table-checksum is not perfect, and sometimes shows false positives and false negatives.

     This is a hard problem to deal with, and we encourage filing bug reports when they are found. However, I will note that if mk-table-checksum finds even one otherwise-undetected data integrity issue, then it is useful, because right now there is no other way of detecting such issues in an automated fashion. As more people use mk-table-checksum and help the developers figure out how to fix the false positives/false negatives, I am sure it will get even better.
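
     When a discrepancy is reported, it is worth re-checking just the flagged table before taking action. Here is a minimal sketch, assuming your version of mk-table-checksum supports the --databases and --tables filters, and that the flagged table is mydb.mytbl (substitute the names from your result table):

     # Re-checksum a single flagged table to confirm a reported discrepancy.
     # mydb and mytbl are placeholders for the database/table reported in
     # the result table.
     perl mk-table-checksum -u maatkit_user --ask-pass --algorithm BIT_XOR \
       --replicate maatkit.checksum \
       --databases mydb --tables mytbl localhost

     # Then, once replication has caught up, check on the slave:
     mysql -e "SELECT * FROM maatkit.checksum
               WHERE this_crc != master_crc AND db='mydb' AND tbl='mytbl';"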

Pythian’s procedure to set up continual replication sync

  1. Check to make sure all tables have appropriate indexes, as above. If they do not, the nibbling algorithm can be used, though as a caveat I have not tested nibbling with mk-table-checksum.

  2. Figure out the modulo value based on data size, as above.

  3. Decide which database to put the tables that mk-table-checksum uses in. We either use our monitoring database or a database called “maatkit”. Note that it is important to use a database that actually gets replicated!
  4. Get mk-table-checksum and comment out the lines that always delete from the replicate table, as above.
  5. Create and populate the table mk-table-checksum will need for the modulo offset value:
    CREATE TABLE IF NOT EXISTS `checksum_modulo` (
    `modulo_offset` smallint(5) unsigned NOT NULL default '0' primary key
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

    INSERT IGNORE INTO checksum_modulo (modulo_offset) VALUES (0);

  6. Do a test run of mk-table-checksum:

     perl mk-table-checksum -u avail --ask-pass --algorithm BIT_XOR \
     --replicate maatkit.checksum --create-replicate-table \
     --modulo 120 --chunk-size 100 \
     --offset 'modulo_offset FROM maatkit.checksum_modulo' localhost

    And update the modulo table:
    update maatkit.checksum_modulo set modulo_offset = ((modulo_offset+1)%120);

    And check the results on the slave:
    SELECT * FROM maatkit.checksum
    WHERE this_crc!=master_crc AND db not in ('maatkit','mysql');

  7. If the test run came out OK, set up a script to run via cron, such as:

     # run checksum table 9 times per day, during off-peak times:
     30 0,1,2,3,4,5,21,22,23 * * * /home/pythian/bin/checksumtbls.sh >> /home/pythian/logs/checksum.log 2>&1

    And the checksumtbls.sh script looks like:

    #!/bin/sh

    DB=maatkit
    USER=maatkit_user
    PASS=password
    REPLTBL="$DB.checksum"
    MODULO=120
    CHUNKSIZE=100
    OFFSETTBL="checksum_modulo"
    ALGORITHM=BIT_XOR
    LOG=/home/pythian/logs/checksum.log

    /usr/bin/perl /home/pythian/bin/mk-table-checksum -u $USER -p $PASS \
    --modulo $MODULO --algorithm $ALGORITHM --chunk-size $CHUNKSIZE \
    --offset "modulo_offset FROM $DB.$OFFSETTBL" \
    --replicate $REPLTBL --create-replicate-table localhost >> $LOG

    /usr/bin/mysql -u $USER -p$PASS -e "update $DB.$OFFSETTBL set modulo_offset=((modulo_offset+1)%$MODULO)" >> $LOG

    And of course, do not forget to periodically check on the slave to see where the issues are:

    SELECT * FROM maatkit.checksum
    WHERE this_crc!=master_crc AND db not in ('maatkit','mysql');

    I hope this helps; it is extremely important to make sure your masters and slaves are actually in sync.
