Category Archives: Performance

Indexing Talk Online

I am doing a quick blog post to announce that I have put an indexing talk online*. Most recently, I delivered this indexing talk at Confoo and Scale 11x.

The talk is on YouTube at Are You Getting the Best Out of Your MySQL Indexes? There are also PDF slides.
From the official conference description, if you want to know more:
MySQL indexes are often used to make performance better. However, they can make performance suffer if you are not using them properly. Oracle ACE Director Sheeri Cabral explains the pitfalls to avoid with indexes and how to utilize compound indexes to maximize index availability with the least amount of write overhead.

*I know I have not been posting blogs for a long time. This was a very busy year, and I took March through July off from conferences in order to buy a house and move.

Picking Up Where You Left Off……

I started this as a response to Keith Murphy’s post at, but it got long, so it deserves its own post. The basic context is figuring out how not to cause duplicate information if a large INSERT statement fails before finishing.

Firstly, the surefire way to make sure there are no duplicates if you have a unique (or primary) key is to use INSERT IGNORE INTO.

Secondly, I just experimented with adding an index to an InnoDB table that had 1 million rows, and here’s what I got (please note, this is one experience only, the plural of “anecdote” is *not* “data”; also I did this in this particular order, so there may have been caching taking place):

Way #1:
– ALTER the table to add the new index. This was the slowest method, taking over 13 minutes.

Way #2:
– CREATE a new table with the same schema as the old except for adding the new index
– INSERT INTO newtable SELECT * FROM oldtable;
– ALTER TABLE oldtable RENAME somethingdifferent;
– ALTER TABLE newtable RENAME oldtable;

The ALTER TABLEs happen instantly. This was faster by a few seconds, which is statistically negligible given the 13+ minutes total time.

Way #3:
– mysqldump the table schema only (–no-data) into a file (tableschema.sql).

– mysqldump the table data only (-t) into another file (tabledata.sql).
– optionally pipe into awk to replace “^INSERT INTO” with “INSERT IGNORE INTO”

– edit the table schema file, adding the new index into the table definition
– optionally change the name of the table to something like newtable, making sure to change the DROP TABLE *and* CREATE TABLE statements.

– mysql < tableschema.sql (this will drop the old table unless you changed the name) - mysql < tabledata.sql () - If you changed the table name in the DROP and CREATE statements, run - ALTER TABLE oldtable RENAME somethingdifferent; and ALTER TABLE newtable RENAME oldtable; - Delete the "somethingdifferent" table This way took just over 10 minutes, 3 minutes faster than the other 2 ways, for a time savings of 25%. CAVEAT: MySQL helpfully moves references on a table to the new table name when you ALTER TABLE...RENAME. You will have to adjust your foreign keys, stored procedures, functions and triggers if you use anything other than Way #1. CAVEAT #2: Make sure that the character set of the MySQL server is supported by the MySQL client and the operating system where you're dumping the file to, otherwise special characters can end up falling victim to mojibake.

Top 10 MySQL Best Practices

So, O’Reilly’s has published the “Top 10 MySQL Best Practices” at Sadly, I find most “best practice” list do not thoroughly explain the “why” enough so that people can make their own decisions.

For instance, #3 is “Protect the MySQL installation directory from access by other users.” I was intrigued at what they would consider the “installation” directory. By reading the tip, they actually mean the data directory. They say nothing of the log directory, nor that innodb data files may be in different places than the standard myisam data directories.

They perpetuate a myth in #4, “Don’t store binary data in MySQL.” What they really mean is “don’t store large data in MySQL”, which they go into in the tip. While it’s true that there is very little benefit to having binary data in a database, they don’t go into what those benefits are. This means that people can’t make informed decisions, just “the best practice is this so I’m doing it.”

The benefit of putting binary data in MySQL is to be able to associate metadata and other data. For instance, “user 200 owns file 483”. If user 200 is gone from the system, how can you make sure file 483 is as well? There’s no referential integrity unless it’s in the database. While it’s true that in most cases people would rather sacrifice the referential integrity for things like faster database backups and easier partitioning of large data objects, I believe in giving people full disclosure so they can make their own informed decision.

#5 is my biggest pet peeve. “Stick to ANSI SQL,” with the goal being to be able to migrate to a different platform without having to rewrite the code. Does anyone tell Oracle folks not to use pl/sql like collections? Nobody says “SQL is a declarative language, pl/sql is procedural therefore you should never use it”. How about SQL Server folks not to use transact-sql statements like WAITFOR? MATCH… AGAINST is not standard SQL, so I should never use it?

Now, of course, if you’re selling a product to be run on different database platforms, then sure, you want to be platform agnostic. But you’d know that from the start. And if you have to migrate platforms you’re going to have to do lots of work anyway, because there are third-party additions to all the software any way.

And why would *anyone* choose a specific database, and then *not* use those features? I think that it’s a good tip to stick to ANSI SQL if you *know* you want to, or if you have no idea about the DBMS you’re using.

If you want to see how this cripples MySQL, check out Visibone’s SQL chart at: — you can buy it here: I may post later on about my own personal MySQL Best Practices….

Making Queries 45-90 Times Faster!!

aka…..”when good queries go bad!”

So, today the developers were debugging why a script was running much longer than expected. They were doing text database inserts, and got to the point where they realized that double the amount of text meant the queries took double the amount of time.

You see, they were doing similar text inserts over and over, instead of using connection pooling and/or batching them. Apparently the other DBA explained that it was a limitation of MySQL, but either the developers didn’t convey what they were doing well, or the DBA didn’t think to mention batching.

I ran a simple test on a test server. I used the commandline to connect to a db server on the same machine (even though in qa and production the db machine is on a different machine) just to make a point:

Queries per connect

Type Connects Queries Length of data transmitted Time
One-off 1000 1 619 bytes 12.232s
Single Connection 1 1000 604 kilobytes 0.268s
Batch 1 1 517 kilobytes 0.135s

So 1000 INSERTs using 1 connection is over 45 times faster than 1000 INSERTs using 1000 connections.
Using 1 batch INSERT statement is over 1.75 times faster than using 1 connection.
Using 1 batch INSERT statement is over 90 times faster than 1000 INSERTs using 1000 connections.

Note that while it’s faster to send a batch, if you don’t support sending 517 kilobytes to your database at once, you’ll want to break it up. That’s a small coding price to pay for 90x the database performance!!!

For reference, the formats used:
INSERT INTO foo (col1, col2…) VALUES (val1, val2…);

Single Connection:
INSERT INTO foo (col1, col2…) VALUES (val1, val2…);
INSERT INTO foo (col1, col2…) VALUES (val1a, val2a…);

Batch: INSERT INTO foo (col1, col2…) VALUES (val1, val2…), (val1a, val2a);

OurSQL Episode 22: Things To Avoid With MySQL Queries


Things to Avoid in Queries
Subqueries and Correlated subqueries

Jan Kneschke’s post on Groupwise Maximum:

Calculated comparisons do not use indexes




call the comment line at +1 617-674-2369

use Odeo to leave a voice mail through your computer:

or use the Technocation forums:

Direct play this episode at:

OurSQL Episode 19: MySQL Proxy

Direct play the episode at:


Call the comment line at +1 617-674-2369.

MySQL Focuses on Japan

MySQL Associate Certification Now Available

Learning resource:
Pythian Group’s Carnival of the Vanities for the DBA community, published weekly on Fridays.

MySQL Proxy
Giuseppe Maxia’s Blog:

Getting Started with MySQL Proxy article plus tutorials:

public Subversion tree:

Intercept and dump queries (part 1):

Make macros to map “cd” to “use” and “ls” to “show tables” (part 2):

Injection Queries (part 3):

Lua interpreted language:

OurSQL Episode 18: De-myth-tifying Indexes

Direct play this episode at:



call the comment line at +1 617-674-2369

use Odeo to leave a voice mail through your computer:

Or use the Technocation forums:

Episode 18: De-Myth-tifying Indexes

Where I have been:
Wedding video (short) and photos:

Honeymoon (and wedding) photos:

FREE training for Oracle DBAs who want to learn MySQL

mysqlnd (PHP native driver) needs testers and benchmarkers:

Learning Resource:
The MySQL category at

Podcast Promo:


Big O notation:

BTREE Index podcast:

HASH Index podcast:


The main theme used is Angry Red Dwarf’s “I Dream About You”

Smallfish’s “The Thank you song”

OurSQL Episode 10: How About Some Cache?

This week I talk about the MySQL Query Cache.

Direct play the podcast here:

Subscribe to the podcast by clicking:

You can Direct download all the oursql podcasts at:

Show notes:

Listener Feedback:

Daylight Savings Time and how to check your system:

There’s not much more time left to register for the MySQL Users Conference & Expo before the $200 early bird discount disappears!

Learning Resource:

Check out the 2006 MySQL conference presentation slides by the speakers!

Feature: How about some cache?
The MySQL Manual has a short, very readable chapter on the Query Cache, which starts here:

MySQL Queues, part II — groups of queues

I believe this is a huge optimization for a heavily implemented Web 2.0 idea.

This article makes simple work of groups of queues. An example of this would be “the most recent 10 people to view an article,” so each article has a queue of up to 10 items in it. This method eliminates the need for multiple SQL statements or using TRIGGERS to check to see if the queue is full.

I bow down to Baron Schwartz, aka Xarpb, for his article on how to implement a queue in SQL:

I am very excited because this also works for groups of objects, and we’re about to implement something at work that needs this idea. The idea of “the most recent x things” or “the top x things” is huge, especially in social networking, and probably one of the most often sought after features.

The biggest issue is that in order to display, say, the most recent posts, a query has to find the time of all the posts and only get the most recent 10. This can be made easy by the logic that the 10 most recent posts are the last 10 rows in the table. Any logic is also added, as in “the last 10 rows in the table viewable and for this guest/login.”

What if you want to track the last 10 people to view the post? Aha, this gets trickier. Convention would say that when a person views a post, have an SQL transaction that adds the information (person x viewed post y at time z and anyo other info, such as browser type, IP, etc) and if there are more than 10 entries for that post, delete the oldest ones until you have 10 entries. This transaction could be done via the application code or via triggers in MySQL 5.0 and up.

However, both those methods use multiple SQL queries, and in the case that an article has been viewed fewer than 10 times, the queries are unnecessary. And given each article has a different popularity — some are viewed lots more than others — running multiple queries ends up being a waste of cycles for articles whose last 10 viewers change infrequently.

These commands were tested on MySQL 4.1.19-standard-log. I use REPLACE INTO because it’s shorter than SELECT…ON DUPLICATE KEY UPDATE, and yes, those aren’t

Let’s say you have a New Year’s Resolution to eat 5 servings of fruits and 5 servings of vegetables per day. The only thing that changes from Baron’s example is that we add a group field (called ‘kind’). The “fruit” field was changed to “edible” and will still contain the name of the edible.

As Baron does, I will use a MySQL-specific command. However, he used SELECT...ON DUPLICATE KEY and I will use REPLACE, as it is smaller in syntax.

use test;
id int NOT NULL,
modulo int NOT NULL,
kind char(1) NOT NULL,
food varchar(10) NOT NULL,
PRIMARY KEY(id,kind),
UNIQUE KEY(modulo,kind)

The basic statement is below — I’ve added AS clauses to make the variables more clear. The modulus is, in this case, 5, but in the article case above would be 10. The “kind” is either “f” or “v”, these are your groups of queues. In this case they stand for “fruits” and “vegetables” but they might be numbers referring to articles. The “food” stands for the type of food eaten, but in the article scenario would represent the username or user id of the customer viewing the article.

REPLACE INTO q (id, modulo, kind, food)
(COALESCE(MAX(id), -1) + 1) AS id,
(COALESCE(MAX(id), -1) + 1) MOD 5 AS modulo,
'f' AS kind,
'apple' AS food
FROM q WHERE kind='f';

Continue reading

Lots of Boolean Values

So, the biggest issue from last month’s Boston MySQL Meetup was “What is the most efficient way to query across many values that are boolean in nature?”

Now, it was a bit more complex than that, as it always is. Values may be set or not set, and people may search across any number of boolean values, or all of them. There are over 1 million entries, and over 40 boolean values!

A few ways of doing so came up:

1) Simply put all the values into separate columns, with a BOOLEAN type (really TINYINT(1))
2) Using ENUMs
3) Using foreign keys to “join tables”
4) Using SET
5) Using bits and bit operations

Using real data, the original questioner and I sought to find the best answer for his case.