Making Queries 45-90 Times Faster!!

Feature:

Things to Avoid in Queries
Subqueries and Correlated subqueries
http://dev.mysql.com/doc/refman/4.1/en/correlated-subqueries.html

Jan Kneschke’s post on Groupwise Maximum:
http://jan.kneschke.de/projects/mysql/groupwise-max

Calculated comparisons do not use indexes

INSERT IGNORE
REPLACE
INSERT…ON DUPLICATE KEY

Feedback:

Email podcast@technocation.org

call the comment line at +1 617-674-2369

use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

or use the Technocation forums:
http://technocation.org/forum

Direct play this episode at:
http://technocation.org/content/oursql-episode-22%3A-things-avoid-mysql-queries
On Monday August 20th, plague about it 2007, obesity the Google Summer of Code officially ended. I have had a great time this summer, web although it has not always been sunshine and flowers! Because of the nature of the Summer of Code, setbacks due to lack of knowledge were not a problem. It’s expected that the students don’t know everything!

So mostly the setbacks were organizational. I had 2 students working on MySQL Auditing Software, which I have tentatively (and very geekily) called OughtToAudit. One student was working on the administrative interface, where access to the auditing program and the auditing rules themselves are defined. As well, reporting on suspicious activity as well as the rule-breaking activity could be seen. The other student was working on a pcap (libpcap, winpcap) engine to store all database traffic. Why pcap? One of the main tenets of auditing is that the auditing system is independent of the system to be audited. Part of this is for control purposes, so that the DBA is not the final arbiter of what’s in the auditing system — that can be owned by someone else, so that the DBA can be watched, too (just 2 months ago a report came out about a DBA stealing sensitive data, http://tinyurl.com/2xpjmz).

The community bonding period was great. I did not want to code during that time, I wanted to have the students learn more about auditing, and get to be part of the community. Well, only one student had time during that period, and looking back on it, he had more to learn, so I should have had him start. I also wasn’t as organized as I could have been and was planning on using the community bonding time to write up a spec, which was late.

The coding started a bit late because both students had finals the first week in June. And then I got married the 2nd week in June and went on a 2-week honeymoon, which did not help matters. I thought my vacation would be 3 solid weeks into the Summer of Code, but it ended up being about 2 non-solid weeks (say, 1.5 actual weeks). So just when the questions started coming to the forefront, I was gone. The best laid plans and all that, I guess.

After my honeymoon it was July, and I scrambled to get organized and be the best help I could. I succeeded, but I really needed a push to get myself more motivated. Basically I did not do as much as I should have in the first half. During or just after the midterm, we established a schedule of twice-weekly conference calls (5 pm my time, 10 pm for one student, 11 pm for another, on Wednesdays and Sundays). This helped a lot, and sometimes one or more folks couldn’t make it, and that’s OK, because we had them twice a week.

From my point of view, there were not any surprises, though things did take longer than I expected, as I misjudged skills and knowledge of both students at different points, in different directions — that is, I thought both students were both better and worse at different parts of their projects, so some parts went faster and others went slower.

The outcome so far is this: we are at about an 0.7 or 0.8 release, not ready even for alpha until we can integrate a few things. We have overcome a lot of challenges, and both students know a lot more about MySQL and auditing than they did before, and got good coding experience. Which was the point of the Google Summer of Code. MySQL is closer to having auditing software, though I’d have hoped we’d have gotten a bit further than we have. But we’ve agreed to meet once a month, now that the students go back to jobs and school, and continue to work on it.

All in all, it was a good experience. Had I to do it over, I’d have done many things similarly. I would start with the conference calls from the beginning and not been overconfident in the beginning, and used the community bonding period to do what the students wanted instead of holding them back.
On Monday August 20th, plague about it 2007, obesity the Google Summer of Code officially ended. I have had a great time this summer, web although it has not always been sunshine and flowers! Because of the nature of the Summer of Code, setbacks due to lack of knowledge were not a problem. It’s expected that the students don’t know everything!

So mostly the setbacks were organizational. I had 2 students working on MySQL Auditing Software, which I have tentatively (and very geekily) called OughtToAudit. One student was working on the administrative interface, where access to the auditing program and the auditing rules themselves are defined. As well, reporting on suspicious activity as well as the rule-breaking activity could be seen. The other student was working on a pcap (libpcap, winpcap) engine to store all database traffic. Why pcap? One of the main tenets of auditing is that the auditing system is independent of the system to be audited. Part of this is for control purposes, so that the DBA is not the final arbiter of what’s in the auditing system — that can be owned by someone else, so that the DBA can be watched, too (just 2 months ago a report came out about a DBA stealing sensitive data, http://tinyurl.com/2xpjmz).

The community bonding period was great. I did not want to code during that time, I wanted to have the students learn more about auditing, and get to be part of the community. Well, only one student had time during that period, and looking back on it, he had more to learn, so I should have had him start. I also wasn’t as organized as I could have been and was planning on using the community bonding time to write up a spec, which was late.

The coding started a bit late because both students had finals the first week in June. And then I got married the 2nd week in June and went on a 2-week honeymoon, which did not help matters. I thought my vacation would be 3 solid weeks into the Summer of Code, but it ended up being about 2 non-solid weeks (say, 1.5 actual weeks). So just when the questions started coming to the forefront, I was gone. The best laid plans and all that, I guess.

After my honeymoon it was July, and I scrambled to get organized and be the best help I could. I succeeded, but I really needed a push to get myself more motivated. Basically I did not do as much as I should have in the first half. During or just after the midterm, we established a schedule of twice-weekly conference calls (5 pm my time, 10 pm for one student, 11 pm for another, on Wednesdays and Sundays). This helped a lot, and sometimes one or more folks couldn’t make it, and that’s OK, because we had them twice a week.

From my point of view, there were not any surprises, though things did take longer than I expected, as I misjudged skills and knowledge of both students at different points, in different directions — that is, I thought both students were both better and worse at different parts of their projects, so some parts went faster and others went slower.

The outcome so far is this: we are at about an 0.7 or 0.8 release, not ready even for alpha until we can integrate a few things. We have overcome a lot of challenges, and both students know a lot more about MySQL and auditing than they did before, and got good coding experience. Which was the point of the Google Summer of Code. MySQL is closer to having auditing software, though I’d have hoped we’d have gotten a bit further than we have. But we’ve agreed to meet once a month, now that the students go back to jobs and school, and continue to work on it.

All in all, it was a good experience. Had I to do it over, I’d have done many things similarly. I would start with the conference calls from the beginning and not been overconfident in the beginning, and used the community bonding period to do what the students wanted instead of holding them back.
Administrative note:

I had a bit of a mishap involving my hand and a glass door pane. The OurSQL podcast will be on hiatus for a few weeks as I recover. I injured my left hand, order and I’m a lefty, unhealthy so I’m typing one-handed these days. I apologize for the break in the show schedule, and hope you’ll be able to hear new podcasts about MySQL very soon!
On Monday August 20th, plague about it 2007, obesity the Google Summer of Code officially ended. I have had a great time this summer, web although it has not always been sunshine and flowers! Because of the nature of the Summer of Code, setbacks due to lack of knowledge were not a problem. It’s expected that the students don’t know everything!

So mostly the setbacks were organizational. I had 2 students working on MySQL Auditing Software, which I have tentatively (and very geekily) called OughtToAudit. One student was working on the administrative interface, where access to the auditing program and the auditing rules themselves are defined. As well, reporting on suspicious activity as well as the rule-breaking activity could be seen. The other student was working on a pcap (libpcap, winpcap) engine to store all database traffic. Why pcap? One of the main tenets of auditing is that the auditing system is independent of the system to be audited. Part of this is for control purposes, so that the DBA is not the final arbiter of what’s in the auditing system — that can be owned by someone else, so that the DBA can be watched, too (just 2 months ago a report came out about a DBA stealing sensitive data, http://tinyurl.com/2xpjmz).

The community bonding period was great. I did not want to code during that time, I wanted to have the students learn more about auditing, and get to be part of the community. Well, only one student had time during that period, and looking back on it, he had more to learn, so I should have had him start. I also wasn’t as organized as I could have been and was planning on using the community bonding time to write up a spec, which was late.

The coding started a bit late because both students had finals the first week in June. And then I got married the 2nd week in June and went on a 2-week honeymoon, which did not help matters. I thought my vacation would be 3 solid weeks into the Summer of Code, but it ended up being about 2 non-solid weeks (say, 1.5 actual weeks). So just when the questions started coming to the forefront, I was gone. The best laid plans and all that, I guess.

After my honeymoon it was July, and I scrambled to get organized and be the best help I could. I succeeded, but I really needed a push to get myself more motivated. Basically I did not do as much as I should have in the first half. During or just after the midterm, we established a schedule of twice-weekly conference calls (5 pm my time, 10 pm for one student, 11 pm for another, on Wednesdays and Sundays). This helped a lot, and sometimes one or more folks couldn’t make it, and that’s OK, because we had them twice a week.

From my point of view, there were not any surprises, though things did take longer than I expected, as I misjudged skills and knowledge of both students at different points, in different directions — that is, I thought both students were both better and worse at different parts of their projects, so some parts went faster and others went slower.

The outcome so far is this: we are at about an 0.7 or 0.8 release, not ready even for alpha until we can integrate a few things. We have overcome a lot of challenges, and both students know a lot more about MySQL and auditing than they did before, and got good coding experience. Which was the point of the Google Summer of Code. MySQL is closer to having auditing software, though I’d have hoped we’d have gotten a bit further than we have. But we’ve agreed to meet once a month, now that the students go back to jobs and school, and continue to work on it.

All in all, it was a good experience. Had I to do it over, I’d have done many things similarly. I would start with the conference calls from the beginning and not been overconfident in the beginning, and used the community bonding period to do what the students wanted instead of holding them back.
Administrative note:

I had a bit of a mishap involving my hand and a glass door pane. The OurSQL podcast will be on hiatus for a few weeks as I recover. I injured my left hand, order and I’m a lefty, unhealthy so I’m typing one-handed these days. I apologize for the break in the show schedule, and hope you’ll be able to hear new podcasts about MySQL very soon!
I was pointed to an article about how the “one size fits all” database model doesn’t work anymore — how Oracle, otolaryngologist DB2 and Ingres were written so long ago, order they’d have to be rewritten to meet the needs of today’s database users. Jacob Nikom pointed the article to me; apparently he contacted the author and started to explain how MySQL meets that criteria, pills but the author disagreed.

Read the article for yourself:
http://www.databasecolumn.com/2007/09/one-size-fits-all.html

Anyone else notice the irony of saying “all those other DBMS’s aren’t a one-size-fits-all solution, but this one is?” I think that MySQL comes the closest to a DBMS that is NOT “one size fits all”, given the multiple storage engines available. What other DBMS will allow you to use your Amazon S3 account as a table? What other DBMS will allow you to use a .csv file as a table without importing? It’s not perfect, because there’s still a basic layer of functional implementation that the MySQL server handles (and must), but it’s the closest.

Note that the article is written by the founder and CTO of the software solution mentioned.
On Monday August 20th, plague about it 2007, obesity the Google Summer of Code officially ended. I have had a great time this summer, web although it has not always been sunshine and flowers! Because of the nature of the Summer of Code, setbacks due to lack of knowledge were not a problem. It’s expected that the students don’t know everything!

So mostly the setbacks were organizational. I had 2 students working on MySQL Auditing Software, which I have tentatively (and very geekily) called OughtToAudit. One student was working on the administrative interface, where access to the auditing program and the auditing rules themselves are defined. As well, reporting on suspicious activity as well as the rule-breaking activity could be seen. The other student was working on a pcap (libpcap, winpcap) engine to store all database traffic. Why pcap? One of the main tenets of auditing is that the auditing system is independent of the system to be audited. Part of this is for control purposes, so that the DBA is not the final arbiter of what’s in the auditing system — that can be owned by someone else, so that the DBA can be watched, too (just 2 months ago a report came out about a DBA stealing sensitive data, http://tinyurl.com/2xpjmz).

The community bonding period was great. I did not want to code during that time, I wanted to have the students learn more about auditing, and get to be part of the community. Well, only one student had time during that period, and looking back on it, he had more to learn, so I should have had him start. I also wasn’t as organized as I could have been and was planning on using the community bonding time to write up a spec, which was late.

The coding started a bit late because both students had finals the first week in June. And then I got married the 2nd week in June and went on a 2-week honeymoon, which did not help matters. I thought my vacation would be 3 solid weeks into the Summer of Code, but it ended up being about 2 non-solid weeks (say, 1.5 actual weeks). So just when the questions started coming to the forefront, I was gone. The best laid plans and all that, I guess.

After my honeymoon it was July, and I scrambled to get organized and be the best help I could. I succeeded, but I really needed a push to get myself more motivated. Basically I did not do as much as I should have in the first half. During or just after the midterm, we established a schedule of twice-weekly conference calls (5 pm my time, 10 pm for one student, 11 pm for another, on Wednesdays and Sundays). This helped a lot, and sometimes one or more folks couldn’t make it, and that’s OK, because we had them twice a week.

From my point of view, there were not any surprises, though things did take longer than I expected, as I misjudged skills and knowledge of both students at different points, in different directions — that is, I thought both students were both better and worse at different parts of their projects, so some parts went faster and others went slower.

The outcome so far is this: we are at about an 0.7 or 0.8 release, not ready even for alpha until we can integrate a few things. We have overcome a lot of challenges, and both students know a lot more about MySQL and auditing than they did before, and got good coding experience. Which was the point of the Google Summer of Code. MySQL is closer to having auditing software, though I’d have hoped we’d have gotten a bit further than we have. But we’ve agreed to meet once a month, now that the students go back to jobs and school, and continue to work on it.

All in all, it was a good experience. Had I to do it over, I’d have done many things similarly. I would start with the conference calls from the beginning and not been overconfident in the beginning, and used the community bonding period to do what the students wanted instead of holding them back.
Administrative note:

I had a bit of a mishap involving my hand and a glass door pane. The OurSQL podcast will be on hiatus for a few weeks as I recover. I injured my left hand, order and I’m a lefty, unhealthy so I’m typing one-handed these days. I apologize for the break in the show schedule, and hope you’ll be able to hear new podcasts about MySQL very soon!
I was pointed to an article about how the “one size fits all” database model doesn’t work anymore — how Oracle, otolaryngologist DB2 and Ingres were written so long ago, order they’d have to be rewritten to meet the needs of today’s database users. Jacob Nikom pointed the article to me; apparently he contacted the author and started to explain how MySQL meets that criteria, pills but the author disagreed.

Read the article for yourself:
http://www.databasecolumn.com/2007/09/one-size-fits-all.html

Anyone else notice the irony of saying “all those other DBMS’s aren’t a one-size-fits-all solution, but this one is?” I think that MySQL comes the closest to a DBMS that is NOT “one size fits all”, given the multiple storage engines available. What other DBMS will allow you to use your Amazon S3 account as a table? What other DBMS will allow you to use a .csv file as a table without importing? It’s not perfect, because there’s still a basic layer of functional implementation that the MySQL server handles (and must), but it’s the closest.

Note that the article is written by the founder and CTO of the software solution mentioned.
I know I’ve been away from the MySQL community for a bit….my hand injury is healing nicely, viagra and I was able to concentrate time on things that required less typing and more mouse clicking. One such thing was the site overhaul of http://www.technocation.org to be easier to use on my side for things like embedding video and mp3 files. More user friendly for you, with regards to article names. Please feel free to vote on whether you like the new site (poll is at the top of the home page, or direct link at http://technocation.org/content/do-you-new-technocation-site%3F). You can add a comment to the poll, too, if you want to voice your opinion.

I do hope to get back into podcasting, and have one for next week. Coming very soon: Videos from MySQL Camp!
On Monday August 20th, plague about it 2007, obesity the Google Summer of Code officially ended. I have had a great time this summer, web although it has not always been sunshine and flowers! Because of the nature of the Summer of Code, setbacks due to lack of knowledge were not a problem. It’s expected that the students don’t know everything!

So mostly the setbacks were organizational. I had 2 students working on MySQL Auditing Software, which I have tentatively (and very geekily) called OughtToAudit. One student was working on the administrative interface, where access to the auditing program and the auditing rules themselves are defined. As well, reporting on suspicious activity as well as the rule-breaking activity could be seen. The other student was working on a pcap (libpcap, winpcap) engine to store all database traffic. Why pcap? One of the main tenets of auditing is that the auditing system is independent of the system to be audited. Part of this is for control purposes, so that the DBA is not the final arbiter of what’s in the auditing system — that can be owned by someone else, so that the DBA can be watched, too (just 2 months ago a report came out about a DBA stealing sensitive data, http://tinyurl.com/2xpjmz).

The community bonding period was great. I did not want to code during that time, I wanted to have the students learn more about auditing, and get to be part of the community. Well, only one student had time during that period, and looking back on it, he had more to learn, so I should have had him start. I also wasn’t as organized as I could have been and was planning on using the community bonding time to write up a spec, which was late.

The coding started a bit late because both students had finals the first week in June. And then I got married the 2nd week in June and went on a 2-week honeymoon, which did not help matters. I thought my vacation would be 3 solid weeks into the Summer of Code, but it ended up being about 2 non-solid weeks (say, 1.5 actual weeks). So just when the questions started coming to the forefront, I was gone. The best laid plans and all that, I guess.

After my honeymoon it was July, and I scrambled to get organized and be the best help I could. I succeeded, but I really needed a push to get myself more motivated. Basically I did not do as much as I should have in the first half. During or just after the midterm, we established a schedule of twice-weekly conference calls (5 pm my time, 10 pm for one student, 11 pm for another, on Wednesdays and Sundays). This helped a lot, and sometimes one or more folks couldn’t make it, and that’s OK, because we had them twice a week.

From my point of view, there were not any surprises, though things did take longer than I expected, as I misjudged skills and knowledge of both students at different points, in different directions — that is, I thought both students were both better and worse at different parts of their projects, so some parts went faster and others went slower.

The outcome so far is this: we are at about an 0.7 or 0.8 release, not ready even for alpha until we can integrate a few things. We have overcome a lot of challenges, and both students know a lot more about MySQL and auditing than they did before, and got good coding experience. Which was the point of the Google Summer of Code. MySQL is closer to having auditing software, though I’d have hoped we’d have gotten a bit further than we have. But we’ve agreed to meet once a month, now that the students go back to jobs and school, and continue to work on it.

All in all, it was a good experience. Had I to do it over, I’d have done many things similarly. I would start with the conference calls from the beginning and not been overconfident in the beginning, and used the community bonding period to do what the students wanted instead of holding them back.
Administrative note:

I had a bit of a mishap involving my hand and a glass door pane. The OurSQL podcast will be on hiatus for a few weeks as I recover. I injured my left hand, order and I’m a lefty, unhealthy so I’m typing one-handed these days. I apologize for the break in the show schedule, and hope you’ll be able to hear new podcasts about MySQL very soon!
I was pointed to an article about how the “one size fits all” database model doesn’t work anymore — how Oracle, otolaryngologist DB2 and Ingres were written so long ago, order they’d have to be rewritten to meet the needs of today’s database users. Jacob Nikom pointed the article to me; apparently he contacted the author and started to explain how MySQL meets that criteria, pills but the author disagreed.

Read the article for yourself:
http://www.databasecolumn.com/2007/09/one-size-fits-all.html

Anyone else notice the irony of saying “all those other DBMS’s aren’t a one-size-fits-all solution, but this one is?” I think that MySQL comes the closest to a DBMS that is NOT “one size fits all”, given the multiple storage engines available. What other DBMS will allow you to use your Amazon S3 account as a table? What other DBMS will allow you to use a .csv file as a table without importing? It’s not perfect, because there’s still a basic layer of functional implementation that the MySQL server handles (and must), but it’s the closest.

Note that the article is written by the founder and CTO of the software solution mentioned.
I know I’ve been away from the MySQL community for a bit….my hand injury is healing nicely, viagra and I was able to concentrate time on things that required less typing and more mouse clicking. One such thing was the site overhaul of http://www.technocation.org to be easier to use on my side for things like embedding video and mp3 files. More user friendly for you, with regards to article names. Please feel free to vote on whether you like the new site (poll is at the top of the home page, or direct link at http://technocation.org/content/do-you-new-technocation-site%3F). You can add a comment to the poll, too, if you want to voice your opinion.

I do hope to get back into podcasting, and have one for next week. Coming very soon: Videos from MySQL Camp!
Since I know a lot of MySQLers use Perl, epilepsy I wanted to pass this along. Today was the first I’d heard of this survey, pulmonologist so I’m thinking that there are a lot of other folks who use Perl occasionally as I do (or even regularly) that are in the dark. Apparently it began in late July, and announced at OSCON 2007, so I apologize if you’ve heard about it over and over.

Take the survey now, as you only have until September 30th to do so!

http://perlsurvey.org/
Skoll is a Community-Based Testing project out of the University of Maryland. Their first testing framework comes for MySQL. Watch Sandro Fouché, order buy cialis graduate researcher on this project, prescription take you through what Skoll is, cheap how it’s beneficial, and how you can use it with an actual demo. The Skoll testing client for MySQL can be downloaded here:
http://www.cs.umd.edu/projects/skoll/contribute/

The video can be played or downloaded using the “play” or “download” link from the original article at http://www.technocation.org.

Skoll is a Community-Based Testing project out of the University of Maryland. Their first testing framework comes for MySQL. Watch Sandro Fouché, order buy cialis graduate researcher on this project, prescription take you through what Skoll is, cheap how it’s beneficial, and how you can use it with an actual demo. The Skoll testing client for MySQL can be downloaded here:
http://www.cs.umd.edu/projects/skoll/contribute/

The video can be played or downloaded using the “play” or “download” link from the original article at http://www.technocation.org.

aka…..”when good queries go bad!”

So, information pills today the developers were debugging why a script was running much longer than expected. They were doing text database inserts, try and got to the point where they realized that double the amount of text meant the queries took double the amount of time.

You see, viagra they were doing similar text inserts over and over, instead of using connection pooling and/or batching them. Apparently the other DBA explained that it was a limitation of MySQL, but either the developers didn’t convey what they were doing well, or the DBA didn’t think to mention batching.

I ran a simple test on a test server. I used the commandline to connect to a db server on the same machine (even though in qa and production the db machine is on a different machine) just to make a point:

Queries per connect

Type Connects Queries Length of data transmitted Time
One-off 1000 1 619 bytes 12.232s
Single Connection 1 1000 604 kilobytes 0.268s
Batch 1 1 517 kilobytes 0.135s

So 1000 INSERTs using 1 connection is over 45 times faster than 1000 INSERTs using 1000 connections.
Using 1 batch INSERT statement is over 1.75 times faster than using 1 connection.
Using 1 batch INSERT statement is over 90 times faster than 1000 INSERTs using 1000 connections.

Note that while it’s faster to send a batch, if you don’t support sending 517 kilobytes to your database at once, you’ll want to break it up. That’s a small coding price to pay for 90x the database performance!!!

For reference, the formats used:
One-off:
INSERT INTO foo (col1, col2…) VALUES (val1, val2…);

Single Connection:
INSERT INTO foo (col1, col2…) VALUES (val1, val2…);
INSERT INTO foo (col1, col2…) VALUES (val1a, val2a…);

Batch: INSERT INTO foo (col1, col2…) VALUES (val1, val2…), (val1a, val2a);

Comments are closed.