A Pythian Fork
Today marks my last day at Pythian. I have been at Pythian for almost three years. In those three years, Pythian’s already thriving MySQL practice has grown even more. I have worked with big and small clients alike, across many industries, managed a team of up to 4 DBAs, and learned a lot not just about MySQL, but what my goals are in general.
Though I am leaving, everything I said in the blog post I made when I announced I was coming to Pythian still holds true. Pythian is a challenging environment and one I would recommend to anyone who finds their current DBA environment boring that they should come to Pythian and experience what it is like to work here. I had lunch with Paul Vallee yesterday and we even discussed possible future collaborations (hence the title, a joke that I am “forking” off of Pythian).
So if it is so great, why am I leaving? It’s simple, really — Pythian is growing by leaps and bounds. I started when Pythian was about half the size it currently is. There is a lot of change happening within Pythian, and I believe it is very good change. However, I enjoyed the environment Pythian was when I started almost three years ago, and personally I am not ready to go with Pythian on the journey it is taking.
So where am I going next? For starters, I will take the month of August off paid work. I have an idea of where I might go for paid work in September, but you will have to watch Planet MySQL for the announcement. During August I will be doing some conference planning and organizing, for OpenSQLCamp in Boston in October first, and then for conferences in 2011. I will also be moving apartments, which is a big task. And I will be focusing on some personal goals, such as spending more time with my husband and becoming more active.
I am excited about having a month off, even though I have a lot to work on in that month.
Data Warehousing Best Practices: Comparing Oracle to MySQL, part 2 (partitioning)
At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.
See part 1 for the introduction and talking about power and hardware. This part will go over the 2nd “P”, partitioning. Learning about Oracle’s partitioning has gotten me more interested in how MySQL’s partitioning works, and I do hope that MySQL partitioning will develop to the level that Oracle partitioning does, because Oracle’s partitioning looks very nice (then again, that’s why it costs so much I guess).
Partition – Larger tables or fact tables can benefit from partitioning because it makes data load easier and can increase join performance and use data elimination. Parallel execution can be done with partitioning due to partition pruning. The degree of parallelism should be a power of 2, because of hash-based algorithm in hash partitioning. To translate this to the MySQL world, if you are using LINEAR HASH partitioning, then you should use a degree of parallelism that is a power of 2 (I checked, and indeed. Otherwise, use a degree of parallelism that makes sense given the number of partitions you have.
One important note that during Pythian’s testing of MySQL partitioning, we found that all partitions were locked when an INSERT occurs, for the duration of the INSERT. Bulk-loading with MySQL partitioning is not as fast as it would be if MySQL allowed partition pruning for INSERTs.
So, what should be partitioned? For the first level of partitioning, the goal is to enable partitioning pruning and simplify data management. The most typical partitioning is range or interval partitioning on a date column. Interval partitioning is you say what the partition is (date, month) and partition is automatically created. MySQL does not have interval partitioning, and I have seen typical first-level partitioning be range or list based on a date or timestamp column. Note that if you use a timestamp field, the partitioning expression is optimized if you use TO_DAYS(timestamp_field) or YEAR(timestamp_field). In my experience, using anything else (such as DATE(timestamp_field)) actually makes partitioning slower than not using partitioning at all. Note that this is based on tests I did a few months ago, and your mileage may vary.
So — how do you decide partitioning strategy? Ask yourself:
- What range of data do the queries touch – a quarter, a year?
- What is the data loading frequency?
- Is an incremental load required?
- How much data is involved, a day, a week, a month?
The answers to the above questions will tell you about how big your interval needs to be. The best scenario is that all answers are the same, “we load every day, and people query by day.” If the answers are different weight access a higher priority than loading, because most people care more about query performance than performance of ETL.
This is true even if your intervals have different sizes — ie sales per day are much bigger in Dec but that’s OK. However, Maria recommends that the subpartition be as evenly divided as possible.
Easier to look at more partitions than to look at a partition that’s too big. But you don’t want too many partitions, max Oracle allows partitions is 1 million partitions, prior to 11g it was 64,000. “Stick closer to 64,000 than 1 million”. MySQL’s limitation is 1024 per table.
For the second level of partitioning, also called subpartitioning, the goal is to allow for multi-level pruning and improve join performance. In Oracle, the most typical subpartition is hash or list – in MySQL, you can only subpartition by hash or key.
How do you decide subpartitioning strategy?
- Select the dimension queried most frequently on the fact table OR
- Pick the common join column
For example, if you want to look at sales per day, per store, you would choose “per day” as the partition and “per store” as the subpartition.
If you do not have a good partition on logical elements (like grouping), then you can subpartition using hash partitioning on common joins — perhaps surrogate keys, or using join key of the largest table involved in the join.
For example, if the sales table is partitioned and another big table is product, you can hash subpartition product_id.
Because there’s overhead in partitions (loading metadata, reading metadata), make sure size of partitions and subpartitions is >20 Mb. So better to have a 30 Mb subpartition than a 15 Mb subpartition. [I have no idea if this is true in MySQL or not -- I think the general concept is true, because there is some overhead, but I have no idea about the 20 Mb figure and why that's true for Oracle, nor do I know what is true in MySQL.]
One easy calculation is double the # of CPUs, round up to nearest power of 2. If you’re executing in parallel, Oracle will use 2x CPUs. (all this advice, by the way, follows 80/20 rule, this is probably good for about 80% of the environments out there). Of course, MySQL does not do parallel execution very well, so this probably does not apply.
Oracle knows it can get partition elimination while it does a join.
If 2 tables have the same degree of parallelism (same # of buckets) and are partitioned in the same way on the join column (say, customer_id in a subpartition of sales and a partition of customer), Oracle will match the partitions when joining:
sales table joined with customer table can change into 4 small joins:
sales sub part 1 joins with customer part 1
sales sub part 2 joins with customer part 2
sales sub part 3 joins with customer part 3
sales sub part 4 joins with customer part 4
And with parallelism, the total time is now reduced to the time it takes to do one of those smaller joins.
This is also why you want to have a power of 2 for buckets – because cores/processors come in powers of 2. Partition-wise joins like this can also be done with range or list, assuming both tables in the join have the same buckets.
I have no idea if MySQL partitioning works this way, but it’s certainly a functionality that makes sense to me.
Data Warehousing Best Practices: Comparing Oracle to MySQL, part 1 (introduction and power)
At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.
These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than hard-and-fast advice. In some places, I still have questions, and I am happy to have folks comment and contribute what they know.
One interesting point brought up:
Maria quoted someone (she said the name but I did not grab it) from Forrester saying, “3NF is typically a selfless model used by Enterprise data warehouse, which is used by the whole company. A star schema is a selfish model, used by a department, because it’s already got aggregation in it.”
I thought that was an interesting way of pointing that out — most people do not understand why 3NF is not good enough for data warehousing, and I have had a hard time explaining why a star or snowflake schema should be used. Another schema-related topic I had a hard time putting into words before this workshop was the difference between a star and a snowflake schema: compared to a star schema, in a snowflake schema, you have more than one fact table and maybe some dimensions that are not used often.
From Maria and the slides:
“Oracle says model what will suit your business best. Don’t get lost in academia. Most schemas are not 100% according to the theoretical
models. Some examples: 3NF schema with denormalized attributes to avoid costly joins, Star schema with multiple hierarchies in same fact table.”
Data warehousing has a 3-step approach —
1) data sources -> staging layer (temp loading layer)
2) staging layer (temp loading layer)-> foundation (logical, data store) layer
3) foundation (logical, data store) layer -> access and performance layer
The foundation layer is usually 3NF the access layer is usually a star or snowflake schema. As for the data sources, they can be varied, you would hope that they are in 3NF (and if they are you can skip the first 2 steps) but they are not always that way.
The 3 P’s of best practice for data warehousing (on Oracle) are power, partitioning, parallelism. The goal of the data warehousing environment is to minimize the amount of data accessed and use the most efficient joins – so it is not so index focused. This may be based on Oracle’s way of doing joins, I am not so sure if it applies to MySQL as well.
Power The weakest link in the chain (the 3 steps above) will define the throughput, so make sure your hardware configuration is balanced. Maria mentioned that as DBAs, “most of the time we don’t have control over this, but we’re still bound to the SLAs.”
This includes hardware that immediately comes to mind such as # of CPUs/cores, speed of CPU, amount of RAM, speed of disk as well as what we may not think of immediately: speed of network switches, speed of disk controllers, number and speed of host BUS adapters. Notes on host BUS adapters (HBAs): Know the # of HBA ports you have. 4 Gb HBA does 400 Mb/sec. 2 Gb HBA does 200 Mb/sec. Make sure there’s enough HBA capacity to sustain the CPU throughput (ie, make sure HBA isn’t the bottleneck). Also the speed at which it all talks. If you have a 4 Gb machine but a 2 Gb switch, you end up having 2 Gb throughput. Upgrade the network at the same time you upgrade machines.
Because we are talking about data warehousing, it is often not possible to eliminate disk I/O, so the goal is to have the fastest I/O throughput possible. Data warehouses need to be sized on I/O throughput not number of I/O’s.
I made a post earlier about how to determine I/O throughput for a system, which used information from this session. Justin Swanhart already pointed out that this is based on the fact that Oracle can do hash joins and MySQL can only do nested loop joins. I wonder, though, if there is indeed no case when using MySQL for which I/O throughput is a more useful metric than iops.
Disk arrays that are expensive are usually sized for iops, not throughput, and because they’re expensive the disk array is shared throughout the company. A DBA needs to ask ‘how many connections into the storage array do I have? How many disk controllers do I have? Where are my physical disks, and which controllers are they hanging off of?’
Typical 15k rpm disk can do about 25-35 Mb/sec (per disk) random i/o’s. Disk manufacturers will throw out numbers like 200-300 Mb/sec but that’s sequential I/O and leading edge of the drive. Make sure all your LUNs are not coming off the same set of disks, so that you’re not conflicting on disk seeks.
Continue to part 2, partitioning.
Determining I/O throughput for a system
At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. In that session, there was a section on how to determine I/O throughput for a system, because in data warehousing I/O per second (iops) is less important than I/O throughput (how much actual data goes through, not just how many reads/writes).
The section contained an Oracle-specific in-database tool, and a standalone tool that can be used on many operating systems, regardless of whether or not a database exists:
If Oracle is installed, run DBMS_RESOURCE_MANAGER.CALIBRATE_IO:
SET SERVEROUTPUT ON
DECLARE
lat INTEGER;
iops INTEGER;
mbps INTEGER;
BEGIN
-- DBMS_RESOURCE_MANAGER.CALIBRATE_IO(<DISKS>, <MAX_LATENCY>,iops,mbps,lat);
DBMS_RESOURCE_MANAGER.CALIBRATE_IO (2, 10, iops, mbps, lat);
dbms_output.put_line('max_mbps = ' || mbps);
end;
For us MySQL folks, or even the Drizzle or NoSQL folks, Oracle offers a free standalone tool called Orion. The example given in the slides was:
./orion –run advanced –testname mytest –num_small 0 –size_large 1024 –type rand –simulate contact –write 0 –duration 60 –matrix column
-num_small is 0 because you don’t usually do small transactions in a dw.
-type rand for random I/O’s because data warehouse queries usually don’t do sequential reads
-write 0 – no writes, because you do not write often to the dw, that is what the ETL is for.
-duration is in seconds
-matrix column shows you how much you can sustain
I would be interested to see how other folks measure I/O throughput, and maybe even do a side-by-side comparison of different tools. Orion is available for:
Linux (x86, x86-64, Itanium, Power)
Solaris (SPARC64)
AIX (PPC64)
zLinux
HPUX (PA RISC, Itanium)
Windows
I am working on a larger write-up of the session itself, which had many concise descriptions of data warehousing issues, but I thought that this merited its own post.
Online Verification That Master and Slaves are in Sync
In October 2008, Baron posted How to Check MySQL Replication Integrity Continually. Here at Pythian we have developed a method based on that post, and added “verifying that masters and slaves are in sync” to our standard battery of tests.
We call it “Continual replication sync checking”. This article will explain how it works, how to test and make the procedure non-blocking, benchmarks from the real world, issues we encountered along the way, and finally Pythian’s procedure of setting up continual replication sync in a new environment.
At the 2010 MySQL User Conference & Expo, my co-worker Danil Zburivsky did a presentation about this, and you can get the slides in ODP (Open Office) format or watch the 46-minute video on YouTube.
How it works
On the master, mk-table-checksum is run. In order to make the checksum operation online, we use the modulo and offset features of mk-table-checksum to checksum only part of the data at a time. The checksum is run (from cron) on the master and replicates to the slave. The results are captured in a result table, and a separate process checks the result table and notifies us of any discrepancies.
Testing resource usage for non-blocking replication sync checking
Each environment has a different data size, different resources, and a different level of what “intrusive” means. We start with a baseline of the size of the database, which we get from:
SELECT SUM(INDEX_LENGTH+DATA_LENGTH)/1024/1024/1024 as sizeGb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE='BASE TABLE';
I included INDEX_LENGTH because the queries that generate the checksum information may or may not depend on indexes. Note that for InnoDB tables, the metadata is approximate, so this will not necessarily be the exact size of the database, but it will be in the right range. In general, running this query in the INFORMATION_SCHEMA did not consume too many resources, though in many cases the query took a few minutes to complete. On databases with several thousand tables, this query can take hours to complete.
Of course there is the caveat that it may take a VERY long time to run this in your environment, and some folks have reported crashing (this is VERY rare, I personally have not seen it even with the high demand client we have had, but others have reported it). So if you prefer to be on the safe side, you could look at the size of your ibdata files (whether you are innodb_file_per_table or not) plus the sizes of all the tables on disk (MyISAM, CSV, etc). That will give you a sense of how big the data is, although the size of the files on disk also includes any fragmentation. In practice, I have seen fragmentation reduce the size of a large database by 40%, so beware that neither of these methods is perfect.
Once we get the data size in gigabytes, we can come up with a modulo to test. In general, we have found that mk-table-checksum is quick enough that it is non-blocking when we use a modulo value that corresponds to approximately 1 Gb of data, and a chunk-size of 100. We actually started out using a checksum of 1000, but locking 1000 rows to test a checksum proved to be longer than was acceptable for clients with lots of transactions — the problem was not the locking on the master, the problem was that it created a slave lag on the slaves. We tried using the --sleep option to mk-table-checksum, which worked, but changing the chunksize to be smaller caused less slave lag than adding a sleep parameter.
Another issue when using chunk-size is that it requires a numeric (but non-decimal) index. We ran across clients using VARCHAR(32) as a PRIMARY key (yes, on InnoDB, and yes, performance was an issue there), or tables that did not have any indexes at all (such as a logging table). If mk-table-checksum does not find a suitable index it will just do a checksum on the whole table.
To find tables that would be problematic, here is an INFORMATION_SCHEMA query that can be run (again, the usual caveats about INFORMATION_SCHEMA apply):
SELECT CONCAT("SHOW CREATE TABLE ", table_schema,".",table_name,"\\G")
FROM TABLES
LEFT JOIN (
SELECT distinct table_schema,table_name
FROM STATISTICS INNER JOIN COLUMNS USING
(table_schema,table_name,column_name)
WHERE (data_type like '%int' or data_type like 'date%'
or data_type like 'time%' or data_type='float') and seq_in_index=1
) AS good_for_checksum
USING (table_schema,table_name)
WHERE good_for_checksum.table_name IS NULL
and table_schema not in ('information_schema');
Frequency of checksum runs
The frequency that we run the checksum is also very flexible, so we take the size and translate that into a modulo that is “even” in a time-based way. For example, on a server that reported 113 Gb in size from the INFORMATION_SCHEMA query above, we set the modulo to 120. The checksum took 10 minutes and 6 seconds from the time the master started to the time the slave finished. There was no excessive slave lag caused and other resource checks showed that this test was acceptable, including application response time for non-administrative queries.
Given a modulo of 120 that takes about 10 minutes to run and the environment, we decided to run the checksum 9 times per day (hourly for 9 hours during off-peak time). This resulted in the entire data set being checked during a period of just under 2 weeks (120 parts / 9 times per day = 13.333 days).
This means that if there is a data discrepancy, it is discovered within 2 weeks in this particular environment. Though that is not ideal, it is much better than not discovering data discrepancies at all, which is how replication currently works.
Benchmarks From the Real World
The first row in the table below is the example we just went through. The subsequent rows of the table are results from some of the other production environments we run the checksum in. As you can see, we try to keep the data checked at one time to about 1 Gb.
Total Data SizeModuloTest timeChecksum RateFrequencyPeriod
113 Gb12010 min 6 seconds1.59 Mb / sec9x / day2 weeks
9 Gb2151 seconds8.4 Mb / sec3x / day1 week
29 Gb219 min 16 seconds2.6 Mb / sec3x / day1 week
70 Gb2128 seconds2650 Mb / sec(data wasfreshly defragmented!)3x / day1 week
5.1 Gb214 min 22 sec0.958 Mb / sec3x / day1 week
314.5 Gb33636 min 3 seconds0.44 Mb / sec16x / day3 weeks
In all of these environments, slave lag was 10 seconds or less at any given point on the slaves.
Issues Encountered
Some of the issues we encountered have workarounds, so I wanted to discuss and explain them here before giving our procedure, which contains the workarounds.
- Bug 304 – mk-table-checksum deletes all prior checksum results in the result table on every run. There is no fix for this yet, but if you are using mk-table-checksum only for the procedure described in this article (and in particular are not using the
--resume-replicationoption), you can comment out the following code frommk-table-checksum:
# Clean out the replication table entry for this table.
if ( (my $replicate_table = $final_o->get('replicate'))
&& !$final_o->get('explain') ) {
use_repl_db(%args); # USE the proper replicate db
my $del_sql = "DELETE FROM $replicate_table WHERE db=? AND tbl=?";
MKDEBUG && _d($dbh, $del_sql, $db, $table->{table});
$dbh->do($del_sql, {}, $db, $table->{table});
}
It is in different places in different versions, but last I checked, searching for"DELETE FROM"inmk-table-checksumonly matched three lines of code, and it was pretty clear (due to the inline comment) which block of code to delete. The block shown above is from lines 5154 – 5161 in mk-table-checksum changeset 6647.
- Running the checksum may cause “statement not safe for replication” errors, especially in 5.1. This is usually OK to ignore, because mk-table-checksum works specifically because you can run the same command on the master and slave and get different results. In MySQL 5.1, CSV tables for the general and slow logs exist by default, even if they are not being written to, and “Statement is not safe to log in statement format” errors show up.
- mk-table-checksum is not perfect, and sometimes shows false positives and false negatives. This is a hard to deal with problem, and we encourage making bug reports when they are found. However, I will note that if mk-table-checksum finds even one undetected data integrity issue, then it is useful, because right now there is no other way of detecting issues in an automated fashion. As more people use mk-table-checksum and can help the developers figure out how to fix the false positives/false negatives, I am sure it will be even better.
You will need to redirect stderr and expect to see those statements in the MySQL error log. Note that mk-table-checksum works regardless of whether you are using statement-based, row-based or mixed replication.
Pythian’s procedure to set up continual replication sync
- Check to make sure all tables have appropriate indexes, as above . If they do not, the nibbling algorithm can be used, though as a caveat I have not tested nibbling with mk-table-checksum.
- Figure out the modulo value based on data size, as above
- Decide what database in which to put the tables that mk-table-checksum uses. We either use our monitoring database or a database called “maatkit”. Note that it is important to use a database that actually gets replicated!
- Get mk-table-checksum and comment out the lines that always delete from the replicated table, as above
- Create and populate the table mk-table-checksum will need for the modulo value:
CREATE TABLE IF NOT EXISTS `checksum_modulo` (
`modulo_offset` smallint(5) unsigned NOT NULL default '0' primary key
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT IGNORE INTO checksum_modulo (modulo_offset) VALUES (0);
- Do a test run of mk-table-checksum:
- If the test run came out OK, set up a script to run via cron such as:
perl mk-table-checksum -u avail --ask-pass --algorithm BIT_XOR \
--replicate maatkit.checksum --create-replicate-table \
--modulo 120 --chunk-size 100 \
--offset 'modulo_offset FROM maatkit.checksum_modulo' localhostAnd update the modulo table:
update maatkit.checksum_modulo set modulo_offset = ((modulo_offset+1)%21);And check the results on the slave:
SELECT * FROM maatkit.checksum
WHERE this_crc!=master_crc AND db not in ('maatkit','mysql');# run checksum table 9 times per day, during off-peak times:
30 0,1,2,3,4,5,21,22,23 * * * /home/pythian/bin/checksumtbls.sh >> /home/pythian/logs/checksum.log 2>&1And the checksumtbls.sh script looks like:
#!/bin/sh
DB=maatkit
USER=maatkit_user
PASS=password
REPLTBL="$DB.checksum"
MODULO=120
CHUNKSIZE=100
OFFSETTBL="checksum_modulo"
ALGORITHM=BIT_XOR
LOG=/home/pythian/logs/checksum.log
/usr/bin/perl /home/pythian/bin/mk-table-checksum -u $USER -p $PASS \
--modulo $MODULO \ --algorithm $ALGORITHM --chunk-size $CHUNKSIZE \
--offset "modulo_offset FROM $DB.$OFFSETTBL" \
--replicate $REPLTBL --create-replicate-table localhost >> $LOG
/usr/bin/mysql -u $USER -p$PASS -e "update $DB.$OFFSETTBL set modulo_offset=((modulo_offset+1)%$MODULO)" >> $LOG
And of course, do not forget to periodically check on the slave to see where the issues are:
SELECT * FROM maatkit.checksum
WHERE this_crc!=master_crc AND db not in ('maatkit','mysql');I hope this helps; It is extremely important to make sure
Webinar: What you need to know for a MySQL 5.0 -> 5.1 upgrade
IOUG has a free series of three webinars on upgrading MySQL. Each webinar is an hour long, and it starts with a webinar by me tomorrow at 12 noon Central time (GMT-5) on “Why and How to Upgrade to MySQL 5.1″. The webinar assumes you are upgrading from MySQL 5.0 to MySQL 5.1, and talks a little bit about the new features, server variables, and what you need to know when upgrading to MySQL 5.1.
The software used is GoToWebinar (formerly GoToMeeting), so you will need to install that software. To register, use the links on the IOUG MySQL Upgrade Webinar Series page.
The complete list of webinars in the MySQL Upgrade Series is:
* MySQL 5.1: Why and How to Upgrade
Sheeri Cabral, The Pythian Group
Tuesday, July 27, 12:00 p.m. – 1:00 p.m. CT (GMT-5)
* MySQL Upgrades With No Downtime
Sean Hull, Heavyweight Internet Group
Wednesday, July 28, 12:00 p.m. – 1:00 p.m. CT (GMT-5)
* MySQL Upgrade Best Practices
Matt Yonkovit, Percona
Thursday, July 29, 12:00 p.m. – 1:00 p.m. CT (GMT-5)
(note, I am not sure if it is free for everyone or just free for IOUG members; my apologies if it is the latter)
Using MySQL Partitioning Instead of MERGE Tables
One common question I get is how to use partitioning instead of MERGE tables. The process I use involves using stored procedures to create and drop partitions. This article will go over the stored procedures I use; special thanks to Roland Bouman for taking a look and giving great feedback to optimize this process.
First, a simple table, not partitioned (yet):
use test;
DROP TABLE IF EXISTS my_part;
CREATE TABLE IF NOT EXISTS my_part (
id int NOT NULL,
creationDate datetime NOT NULL,
PRIMARY KEY (id,creationDate)
) ENGINE=InnoDB;
In real, life there is more to the table than just id and creationDate. The most important part is that the partitioned field(s) need to be part of the primary key.
Now, add the partition definition. This can be done in the CREATE statement, but I have found that it is easier for me to think about what fields and indexes I want first, and then worry about partitioning, so I naturally gravitate towards defining the fields and indexes first and then altering the table to add partitioning. This also helps me think about how to modify an existing table to add partitioning.
ALTER TABLE my_part PARTITION BY RANGE (TO_DAYS(creationDate)) (
partition 2010_07_01 values less than (to_days('2010-07-02')),
partition 2010_07_02 values less than (to_days('2010-07-03')),
partition 2010_07_03 values less than (to_days('2010-07-04')),
partition 2010_07_04 values less than (to_days('2010-07-05'))
);
This makes it pretty clear what is happening — the idea is to give the partition names actual dates that they hold, so that it is easy to see what partitions need to be added and deleted.
Deleting partitions
I find that making stored procedures makes things easy….so I will define a procedure called partition_drop to drop partitions. The partition_drop stored procedure takes in a table name and schema name to drop partitions from, and a date to delete up through and including that date. Here’s the procedure:
DELIMITER ||
DROP PROCEDURE IF EXISTS partition_drop ||
CREATE PROCEDURE partition_drop (IN through_date date, IN tbl varchar(64), IN db varchar(64))
BEGIN
DECLARE delete_me varchar(64);
DECLARE notfound BOOL DEFAULT FALSE;
DECLARE pname CURSOR FOR SELECT PARTITION_NAME
FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME=tbl AND TABLE_SCHEMA=db
AND DATE(PARTITION_NAME)!= 0
AND DATE(PARTITION_NAME) IS NOT NULL
AND DATE(PARTITION_NAME)<=through_date;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET notfound:=TRUE;
OPEN pname;
cursor_loop: LOOP
FETCH pname INTO delete_me;
IF notfound THEN LEAVE cursor_loop; END IF;
SET @alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl," DROP PARTITION ",delete_me);
# sanity check commented out for production use
# SELECT @alter_stmt;
PREPARE stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter;
END LOOP;
CLOSE pname;
END ||
DELIMITER ;
Go ahead and run CALL partition_drop('2010-07-02','my_part','test'); to verify that SHOW CREATE TABLE my_part; shows that the desired partitions have been dropped.
Adding partitions
Adding partitions is similar to deleting partitions — using a stored procedure that takes in the date to add partitions up to that date. It will not try to add so many partitions that the table will have more than 1024, and it won’t add any partitions that already exist.
DELIMITER ||
DROP PROCEDURE IF EXISTS partition_add ||
CREATE PROCEDURE partition_add (IN through_date date, IN tbl varchar(64), IN db varchar(64))
BEGIN
DECLARE add_me char(10);
DECLARE max_new_parts,add_cnt smallint unsigned default 0;
SELECT 1024-COUNT(*) AS max_new_parts,
SUM(CASE WHEN
DATE(PARTITION_NAME)>=through_date then 1 else 0
END)
INTO max_new_parts, add_cnt
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = db
AND TABLE_NAME = tbl;
IF add_cnt=0 THEN
BEGIN
SELECT MAX(DATE(PARTITION_NAME)) INTO add_me
FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME=tbl and TABLE_SCHEMA=db
AND DATE(PARTITION_NAME)<through_date;
# to do: declare handler for exceptions here
IF DATEDIFF(through_date,add_me)+1 < max_new_parts THEN
BEGIN
WHILE add_me<through_date do BEGIN
SET add_me:=DATE_FORMAT(add_me + INTERVAL 1 DAY,"%Y_%m_%d");
SET @alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl," ADD PARTITION (PARTITION ",add_me," VALUES LESS THAN (TO_DAYS('",add_me+INTERVAL 1 DAY, "')))" );
PREPARE stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter;
END;
END WHILE;
END;
END IF;
END;
END IF;
END ||
DELIMITER ;
Here’s how to call that stored procedure:
CALL partition_add ('2010_07_10', 'my_part','test');
Caveat: This stored procedure will only add in partitions from the maximum partition name/date until the date you add it in, it will not fill in any gaps. However, it is possible to add in another input parameter to be the “start date” and change the stored procedure to add the partition if it does not exist, from the start date through the end date.
Note: It was pointed out by Roland Bouman that it would be better to change both the add and drop stored procedures to do one ALTER TABLE instead of sequential ones. He wrote:
It’s probably better to generate a single statement to drop / add all partitions.
So the general pattern would be:
- generate and concatenate lines for each partition
- one single sequence of prepare, execute, deallocate to execute one DDL statement.
For the drop partition procedure, this approach would allow you to do away with the cursor. You can simply use GROUP_CONCAT in a single SELECT…INTO statement to generate the entire DDL statement.
Note: Roland also mentioned that these stored procedures could be generalized to use any interval, for example if each table partition held 7 days. The only limitation is that to get the optimal partition performance with dates, use either TO_DAYS(date_field) or YEAR(date_field) as the partitionining function.
Putting it all together
The procedure: daily I would run the following in MySQL, to keep only the previous “x” days:
CALL partition_drop('CURRENT_DATE()-INTERVAL x DAY', 'tbl_name','schema_name');
To add new partitions, I would do:
CALL partition_add('CURRENT_DATE()-INTERVAL 32 DAY', 'tbl_name','schema_name');
It is run daily just in case; the side benefit is that it will catch any gaps if you do not modify the stored procedure to include a start date for when to add. Adding a partition does not take a long time, because there’s no data to reorganize.
I would make a daily MySQL event, and then everything is stored nicely in the database, and backed up with the database. Others may choose to run the stored procedures in a cron script or scheduled task (Windows), but that requires a password to be stored somewhere — either in the script, somewhere the script can read, or in a config file such as .my.cnf.
Three editions of MySQL are available
Yes, you read the title correctly — there are three editions of MySQL available, according to http://www.mysql.com/products/enterprise/server.html. Well, that page names two, and then of course there is the community edition….
From the manual page:
MySQL Enterprise Server is available in the following editions:
* MySQL Enterprise Server – Pro is the world’s most popular open source database that enables you to rapidly deliver high performance and scalable Online Transaction Processing (OLTP) applications.
* MySQL Enterprise Server – Advanced is the most comprehensive edition of MySQL. It provides all the benefits of MySQL Enterprise Server Pro and adds horizontal table and index partitioning for improving the performance and management of VLDBs (Very Large Databases).
How is “horizontal table and index partitioning” different from the regular partitioning available in MySQL 5.1?
Those of us that have been around for the past 3 or so years know that there was a point in time where there were two different editions of MySQL available, back when MySQL Enterprise and MySQL Community were actually different. But that experiment was a complete failure, and the code is now the same. MySQL Enterprise does package the software in a way that is not available to the community, specifically the quarterly service pack (QSP) releases. But the actual code….the same.
The pricing page at http://globalspecials.sun.com/store/mysql/ContentTheme/pbPage.categoryEnterprise shows that the Advanced server can be acquired for $3k (Gold) or $5k (Platinum) per year. The fee is worth it for the support MySQL will give you, but why is MySQL muddying the waters by having more “editions”, which very likely are not even different code?
(Special thanks to Aaron Macks for pointing out the existence of mysql-advanced, which was the impetus for this blog post.)
INFORMATION_SCHEMA tables are case sensitive
I wanted to get examples of some of the extra information that the Percona server has in its INFORMATION_SCHEMA metadata, and in doing so, I stumbled across an interesting MySQL bug/feature/point — INFORMATION_SCHEMA tables (which are actually system views) are case sensitive when used in comparisons:
mysql> use information_schema;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select @@version;
+--------------------+
| @@version |
+--------------------+
| 5.1.36-xtradb6-log |
+--------------------+
1 row in set (0.00 sec)
mysql> use information_schema;
Database changed
mysql> show tables like 'innodb%';
Empty set (0.00 sec)
mysql> show tables like 'INNODB%';
+----------------------------------------+
| Tables_in_information_schema (INNODB%) |
+----------------------------------------+
| INNODB_BUFFER_POOL_PAGES_INDEX |
| INNODB_RSEG |
| INNODB_LOCKS |
| INNODB_BUFFER_POOL_PAGES |
| INNODB_TRX |
| INNODB_INDEX_STATS |
| INNODB_LOCK_WAITS |
| INNODB_CMP_RESET |
| INNODB_CMP |
| INNODB_CMPMEM_RESET |
| INNODB_BUFFER_POOL_PAGES_BLOB |
| INNODB_CMPMEM |
| INNODB_TABLE_STATS |
+----------------------------------------+
13 rows in set (0.00 sec)
It is not just for the new tables Percona has added:
mysql> show tables like 'table%';
Empty set (0.00 sec)
mysql> show tables like 'TABLE%';
+---------------------------------------+
| Tables_in_information_schema (TABLE%) |
+---------------------------------------+
| TABLES |
| TABLE_CONSTRAINTS |
| TABLE_PRIVILEGES |
+---------------------------------------+
3 rows in set (0.00 sec)
And it is not due to the collation:
mysql> show global variables like '%collat%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
mysql> show session variables like '%collat%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | utf8_general_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
The XLDB4 Conference for Very Large Databases
Ronald saved me a post by giving his feedback on a few Oracle conferences that now have MySQL content.
My opinion is pretty much a summary of Ronald’s post, so I won’t repeat it here. Instead, I’ll post about a conference he did not, the 4th Extremely Large Databases Conference. I am particularly interested in any MySQL folks planning to attend (I would expect Tokutek to be represented, and maybe even the
"http://www.calpont.com/">Calpont folks).
Most of this is directly from an e-mail I received from Jacek Becla, who had a keynote at the 2008 MySQL User Conference and Expo. If you also received this e-mail, please feel free to skip ahead to my viewpoints on the various Oracle conferences (or just skip altogether).
4th Extremely Large Databases (XLDB4) Conference
October 6-7, 2010
SLAC National Accelerator Laboratory
Menlo Park, California
http://www.xldb.org/4
XLDB, now including a conference open to all interested parties, is the gathering place for the community managing and analyzing data at extreme scale.
Leading practitioners from science, industry, and academia will discuss lessons learned and real-world solutions for handling terabytes and petabytes of data. Experts will present emerging trends that will significantly impact how extremely large databases are built and used.
More information, including the program, and online registration are available at http://www.xldb.org/4. Space is limited, so if you would like to attend, make sure to register soon. Early registration ($100) ends on July 31; late registration is $150.
If you are uncertain whether to attend, I would recommend reading the article recently written by Curt Monash:
http://www.dbms2.com/2010/07/01/why-you-should-go-to-xldb4/
For questions, please contact us by email using
xldb-admin at slac.stanford.edu or contact me [Jacek] directly at
becla at slac.stanford.edu.
ODTUG Kscope Wrap-up and Slides
Ronald Bradford and I produced a successful MySQL track at Kaleidoscope (hereinafter referred to as Kscope). With a speaker list of Philip Antoniades, Josh Sled and Craig Sylvester of Oracle, Laine Campbell of PalominoDB, Patrick Galbraith of Northscale, Sarah Novotny of Blue Gecko, Padrig O’Sullivan of Akiba, Dossy Shiobara of Panoptic.com and Matt Yonkovic of Percona, we knew the technical content was going to be great.
As someone who’s helped organize all the OpenSQLCamps, a few MySQL Camps, and the Boston MySQL User Group, I know that participation at an event such as this can be small. Despite planning the MySQL track at the last minute, we had top-notch speakers with appropriate content for the audience, which was mostly Oracle crossovers. We had several registrants who came solely for the MySQL content, with all but 2 of the 27 sessions having 10-25 audience members. According to a few different folks, this is the same amount as the SOA and BPM track receives, and that track was not planned at the last minute. The ODTUG conference committee and board were happy with the turnout as well. I can’t wait to see the results of the evaluations!
As someone who’s sat on not-for-profit boards in the past as well as organized events while being on the board, I know how crazy it can be to plan conferences, and I also know that there are some organizations and personalities that are difficult to work with. I am happy to report that ODTUG has been very welcoming, accommodating, and hands-off, letting us do what we need. When I spoke with Edward Roske, the conference chair for Kscope 2011, he said, “You know what the needs for MySQL much better than I do, so just run the MySQL track as if you’re an executive of a corporation.” I know we are all worried about content being controlled by marketing folks, and I am very happy to report that ODTUG’s Kaleidoscope conference is of, by and for the people. There is one slot for a vendor presentation, and it is clearly marked, and there was an expo hall with over 20 booths, so there are opportunities for marketing; but all in all this is a technical conference.
Slides from presentations:
From Ronald:
Increasing MySQL Productivity from Design to Implementation (3-hour presentation)
MySQL idiosyncrasies that bite
From Matt Yonkovit:
The Five Minute DBA
From me:
Importing and Exporting Data with MySQL
What do you mean, SQL syntax error? – a 90-minute look into how MySQL’s SQL extends and deviates from the ANSI/ISO SQL:2003 standard.
Stored
Procedures & Functions and
Triggers and Views.
I also gave Jay Pipes’ Join-fu: the Art of SQL part one and Join-fu: the Art of SQL part two as Jay was unable to attend at the last minute.
My huge thanks to all the speakers and all the attendees; the MySQL track at Kaleidoscope was a success! I am already imagining what we can do next year in Long Beach, CA at the end of June.
MySQL’s SQL Deviations and Extensions
Today at Kaleidoscope I will be doing a 90-minute session comparing MySQL’s SQL syntax to the ANSI/ISO SQL:2003 standard, entitled What Do You Mean, “SQL Syntax Error”?
You can download the PDF slides now.
For those that may be following along the presentation later today (4 pm Eastern time), here are some links that I may throw out during the session:
- SQL 2003 standard – actually it is “Information taken from the Final Committee Draft (FCD) of ISO/IEC 9075-2:2003″ but it’s extremely close to the actual standard. The actual standard is a document that costs a non-trivial amount of money to get, and cannot be republished.
- The MySQL Administrator’s Bible, from which this content was taken from.
- The MySQL Manual page for sql_mode values.
- Information about and examples of quoting in MySQL
- The MySQL Manual page for CREATE TABLE which contains all the special extensions MySQL has for tables.
- The MySQL Manual page for PROCEDURE ANALYSE().
- The MySQL Manual page for the SHOW extension
- PDF slides for an EXPLAIN presentation I have done, and the EXPLAIN Cheatsheet.
MySQL and Quoting
MySQL does not follow the ANSI SQL standard for quoting. MySQL’s default quoting behavior is that either single or double quotes can be used to quote a string (this gets me into trouble when I work with Oracle databases, as double quotes do not indicate a string!).
mysql> SELECT 'alive';
+-------+
| alive |
+-------+
| alive |
+-------+
1 row in set (0.00 sec)
mysql> SELECT "alive";
+-------+
| alive |
+-------+
| alive |
+-------+
1 row in set (0.00 sec)
Bare words are dealt with in context; in this case, a bare word would be parsed as a column name:
mysql> SELECT alive;
ERROR 1054 (42S22): Unknown column 'alive' in 'field list'
Backquotes are the way MySQL escapes table names. So, if you want a reserved word, number or operator to be the name of an object (ie, a table named “1″ or a column named “date”) you need to use backquotes to avoid a syntax error….for example:
mysql> SELECT `alive`;
ERROR 1054 (42S22): Unknown column 'alive' in 'field list'
mysql> CREATE TABLE table (column date);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table (column date)' at line 1
mysql> CREATE TABLE `table` (column date);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'column date)' at line 1
mysql> CREATE TABLE `table` (`column` date);
Query OK, 0 rows affected (0.11 sec)
mysql> DROP TABLE table;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table' at line 1
mysql> DROP TABLE `table`;
Query OK, 0 rows affected (0.00 sec)
The above illustrates MySQL’s default behavior, which in summary is:
- With no quotes or backquotes, MySQL treats the word ‘alive’ as a field name.
- With single or double quotes, MySQL treats the word ‘alive’ as a string.
Changing SQL Mode
However, this behavior can be changed by setting the sql_mode. There are over 30 different sql modes in MySQL 5.1 that change the behavior of MySQL (not all are related to quoting). The sql_mode value is a comma-separated list showing which sql modes are used.
The default sql_mode is blank, allowing default MySQL behavior.
If sql_mode is changed to ANSI_QUOTES, then ANSI quotes are used….
mysql> SET SESSION SQL_MODE='ANSI_QUOTES';
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW SESSION VARIABLES LIKE 'sql_mode';
+---------------+-------------+
| Variable_name | Value |
+---------------+-------------+
| sql_mode | ANSI_QUOTES |
+---------------+-------------+
1 row in set (0.00 sec)
MySQL still treats barewords and backquotes as columns:
mysql> SELECT alive;
ERROR 1054 (42S22): Unknown column 'alive' in 'field list'
mysql> SELECT `alive`;
ERROR 1054 (42S22): Unknown column 'alive' in 'field list'
MySQL still treats single quotes as strings:
mysql> SELECT 'alive';
+-------+
| alive |
+-------+
| alive |
+-------+
1 row in set (0.00 sec)
However, now MySQL treats double quotes as escape characters, ie, treating double quotes like backquotes:
mysql> SELECT "alive";
ERROR 1054 (42S22): Unknown column 'alive' in 'field list'
Putting quote marks in strings
Since I am discussing quoting, I will end with how to put quote marks into strings.
There are 3 ways to embed quotes in a string:
1) mix single and double quotes:
mysql> SELECT 'I say "OK".' , "I say 'OK'." , 'I say ''OK''.';
+-------------+-------------+-------------+
| I say "OK". | I say 'OK'. | I say 'OK'. |
+-------------+-------------+-------------+
| I say "OK". | I say 'OK'. | I say 'OK'. |
+-------------+-------------+-------------+
1 row in set (0.00 sec)
2) Use 2 quotes in a row to escape quotes:
mysql> SELECT 'I say ''OK''.' , "I say ""OK"".";
+-------------+-------------+
| I say 'OK'. | I say "OK". |
+-------------+-------------+
| I say 'OK'. | I say "OK". |
+-------------+-------------+
1 row in set (0.00 sec)
3) Use the backslash to escape quotes:
mysql> SELECT 'I say \'OK\'.' , "I say \"OK\".";
+-------------+-------------+
| I say 'OK'. | I say "OK". |
+-------------+-------------+
| I say 'OK'. | I say "OK". |
+-------------+-------------+
1 row in set (0.00 sec)
(Interestingly enough, I wrote a blog post on this topic almost exactly 2 years ago, too!)
OpenSQLCamp Boston Pages are online
OpenSQLCamp is less than 4 months away, and I have finally gotten around to updating the site. Special thanks go to Bradley Kuzsmaul and the folks at Tokutek for getting the ball rolling and making the reservation at MIT. Using MIT means that we will have *free* reliable wireless guest access and projects.
OpenSQL Camp is a free unconference for people interested in open source databases (MySQL, SQLite, Postgres, Drizzle), including non-relational databases, database alternatives like NoSQL stores, and database tools such as Gearman. We are not focusing on any one project, and hope to see representatives from a variety of open source database projects attend. As usual I am one of the main organizers of Open SQL Camp (in previous years, Baron Schwartz, Selena Deckelmann and Eric Day have been main organizers too; this year Bradley Kuzsmaul is the other main organizer). The target audience are users and developers, but others are encouraged to attend too. There will be both presentations and hackathons, with plenty of opportunities to learn, contribute, and collaborate!
I have updated the main Boston 2010 page at http://opensqlcamp.org/Events/Boston2010/ with travel and logistics information, including links to:
Register — it’s free and easy, and you can always change your mind later!
Maybe you have an idea for a session you would like to see, or a session you would like to give? If so, you can note it on the sessions page. This will give everyone a sense of what type of presentations will be there. I have started by putting 2 sessions I am willing to give and a third at the bottom for one I’d like to see, to give everyone an idea of both types of descriptions.
Probably the most important link right now is the way we keep OpenSQLCamp free for all attendees – sponsor or donate to the conference! Any donation amount is accepted, and all donations are tax-exempt to the fullest extent of the law. Businesses and organizations will be listed as sponsors if they make a donation of $250 or more, and individuals will be listed as sponsors if they make a donation of $100 or more. More information on sponsor benefits, including where to send a graphic to, at the link.
There is a preliminary schedule, up until the conference itself it will only show the agenda of the conference — how many rooms and what time the presentations are supposed to be. During and after the conference we will update this schedule page with the titles, presenters and links to any notes/videos/audio taken.
If you have any questions, please do not hesitate to ask on the mailing list or by posting a comment here.
Next Week’s MySQL Sessions at ODTUG Kaleidoscope
By now you know that there is a MySQL Track during next week’s ODTUG Kaleidoscope in Washington, DC. Ronald Bradford and I organized the schedule at the last minute (Ronald did a lot of the work!). It was difficult to fill a schedule with 19 sessions that are either 1 hour or 1.5 hours long, and to do it I ended up with three presentations.
At each presentation I will be giving away a copy of The MySQL Administrator’s Bible, so be sure to show up! All MySQL track sessions are in Maryland C, and all times are Eastern.
On Monday, June 28th from 4 pm – 5:30 pm I will be presenting “What do you mean, SQL Syntax Error?”, a presentation about how MySQL’s SQL syntax extends and deviates from the ANSI/ISO SQL:2003 standard. There is an 80-page PDF accompaniment that will be given out for free during this session.
On Tuesday, June 29th from 11 am to 12 noon I will be presenting Importing and Exporting Data with MySQL, about the many tools to load and bulk load data, and how to export data for regular and bulk loads. I will also be going over which storage engines are particularly well-suited for bulk loading, and the caveats to watch out for. This session is useful for those who know MySQL as well as those asking the question, “What’s the equivalent of Oracle’s SQL Loader for MySQL?”
On Wednesday, June 30th from 8:30 am to 9:30 am I will be presenting Navigating MySQL Stored Procedures & Functions, Views and Triggers, which covers all the ways stored procedures, stored functions, views and triggers can be used, including a highlight of Oracle differences.
I hope to see you there!

The She-BA