Selectivity and Index Performance

Let’s look into selectivity, as this is an important topic when looking at index performance. (Oooh, I said “performance”; watch everyone’s ears perk up!)

This will probably answer the questions “Why isn’t MySQL using my index?” or “Why is my query so slow when I have an index on it?”

Selectivity describes how varied the values in a field are. It is a number from 0 to 1, although you can also think of it as a percentage. A value of 1, or 100%, means that each value in the field is unique. This is always the case with UNIQUE and PRIMARY keys, although non-unique fields may also have a selectivity of 1 — for example, a timestamp value in a not-often-used table.

To calculate this, you take the number of DISTINCT values in the field and divide by the total number of records.
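Expressed as a query, the calculation looks like this (a minimal sketch; swap in whatever field you are measuring):

SELECT COUNT(DISTINCT username) / COUNT(*) AS selectivity
FROM Users;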

My company has a large Users table, so I grabbed some statistics off of that:


+----------+
| count(*) |
+----------+
|   817666 |
+----------+
1 row in set (0.63 sec)

+--------------------------+
| count(distinct username) |
+--------------------------+
|                   817666 |
+--------------------------+
1 row in set (1.63 sec)

So the selectivity is 817666/817666, or 1. If this were not already a UNIQUE KEY, it would be a good candidate for one.

The “created” field is a timestamp of when the user record was created:
+-------------------------+
| count(distinct created) |
+-------------------------+
|                  811227 |
+-------------------------+
1 row in set (2.04 sec)

As I expected, there are *some* duplicates, but for the most part, everyone has a different creation time: 811227/817666 = 0.99.

+--------------------+
| count(distinct IP) |
+--------------------+
|             544694 |
+--------------------+
1 row in set (2.35 sec)

This is interesting — lots of people log on from public places or use their friends’ computers, so there are duplicate IPs (the last IP used to log in is stored with each user, so it’s a one-to-one relationship). The selectivity here is 544694/817666 = 0.67.

+-------------------------+
| count(distinct browser) |
+-------------------------+
|                   25699 |
+-------------------------+
1 row in set (1.70 sec)

This is what the server reports as the user’s browser; it records the last browser used by the user. That gives us 25699/817666, or about 0.03, for selectivity.

+---------------------+
| count(distinct age) |
+---------------------+
|                  83 |
+---------------------+
1 row in set (0.63 sec)

There are only 83 different reported ages on our site. That makes the selectivity of age 83/817666 = 0.000101508. That is very low, effectively zero.

So why is this important? I’m glad you asked….

MySQL has a cost-based optimizer. This means that MySQL calculates the costs of different ways of performing a query and then chooses the cheapest one. Sounds reasonable, right? Well, calculating the cost is an inexact science: to know the exact cost, the optimizer would actually have to run the query. So it takes an estimate, and the estimate is sometimes wrong, though most of the time it is correct.

In contrast, some database systems use a rule-based optimizer. This means that no matter what state the data is in, the database follows fixed rules to figure out the “optimal” path to the query. In most enterprise-level database systems, a cost-based optimizer performs better than a rule-based one. In other words, there are so many exceptions to the rules that the calculation overhead is worth it.

(Just to clarify, in both systems, the correct result set will be generated. The optimizer determines the path to the information.)

This cost-based optimizer uses selectivity information when it decides whether or not to use an index.

But what does this mean for me?

Well, in the example above, it means that if you want to query folks by age or age group, it’s useless to put an index on age. It’s a waste of CPU time and disk I/O to maintain an index on something with such a low selectivity. The optimizer will NEVER use it.
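You can watch the optimizer make this decision with EXPLAIN (a hypothetical sketch; the index name is made up):

ALTER TABLE Users ADD INDEX idx_age (age);

EXPLAIN SELECT * FROM Users WHERE age = 25;
-- For a common age, the optimizer estimates that too large a fraction
-- of the table matches, and EXPLAIN shows type: ALL (a full table
-- scan) even though idx_age exists.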

I’ve heard that the optimizer will do a full table scan if it calculates that the query will return more than 30% of the table. Why is that number so low? Why not more like 50 or 75%? Well, first the server has to go to the index, search it, and find whether the index record matches. Then it has to follow the index record’s pointer to the real record on disk. The MySQL gurus have decided that around 30% is the point where using the index becomes slower than just doing a full table scan.

(Note: I’m not exactly sure what the number is, but I’ve heard it’s around 30%. At the user conference I saw graphs showing that for the most part this was true. I thought it was in Jay Pipes’ Performance Tuning presentation, but the graphs are not in the slides. Pointers are appreciated.)

So in this case, should we put an index on browser? Well, this is one of those cases where I’d think about how often we’d run these queries, and how much we care about server performance while generating a report versus server performance while inserting or updating a record. If we really care one way or the other, go that way. And document the decision!

Another thing to consider is the nature of the data. For something like age, that’s not going to change. Sure, we might have some 120 year olds eventually, but there’s not going to be that much variance. For browsers, there will only be more and more types put out, considering different version numbers and OS configurations of standard browsers as well as mobile phone browsers.

However, if it does not matter, or if it’s too difficult to decide which is more important (your boss says “we can’t be slow during normal usage OR during reporting!”), I default to MySQL — it’s better at optimization than I am. I’d probably add the index so MySQL could decide whether to do a full table scan or use the index, particularly given that I expect the number of browsers to keep increasing.

Something like IP has a selectivity of 0.67, so putting an index on it is worth it if we query on IPs a lot.

I hope this article has been helpful!

Jay Pipes has a great article on using the INFORMATION_SCHEMA to find out selectivity:
http://tinyurl.com/kfffp

The Care and Feeding of MySQL Tables

Our site went from weekly crashes during our two busiest nights to not even a peep this week (during those same two busiest nights), and the only thing we changed was that we did some table maintenance. We hadn’t done table maintenance for at least as long as I’ve been around, which is 6 months. We are a site with high volumes of both reads and writes. This article will talk about the care and feeding of tables; feel free to use it as justification for a maintenance window, or even as permission to run table maintenance statements.

MySQL uses a cost-based optimizer to best translate the written query into what actually happens. This means when you write:

SELECT foo FROM t1 INNER JOIN t2 USING (commonField);

The optimizer looks at the statistics for tables t1 and t2 and decides which is better:
1) To go through each item in t1, looking for a matching “commonField” in t2
or
2) To go through each item in t2, looking for a matching “commonField” in t1

If t1 is very large and t2 is very small, it makes sense to follow plan #2. This is a simplified example, of course.
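You can see which plan the optimizer picked, and the row estimates the choice was based on, with EXPLAIN:

EXPLAIN SELECT foo FROM t1 INNER JOIN t2 USING (commonField);
-- The first table in the output is the one the optimizer reads first;
-- the rows column shows the estimated rows examined for each table.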

From the documentation:
ANALYZE TABLE analyzes and stores the key distribution for a table. During the analysis, the table is locked with a read lock. This statement works with MyISAM, BDB, and InnoDB tables.

If the key distribution is off, the optimizer will be using incorrect (out-of-date) information. Therefore, the optimizations it makes will not be…well, optimal.

ANALYZE TABLE takes a very short amount of time — less than a second for even a million rows. I tested with InnoDB and MyISAM, but I’d guess that BDB is the same. Our 14G of data (112 tables across 3 databases) took less than a minute to analyze.

Documentation: http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html
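Running it is a one-liner per table; the output looks something like this (hypothetical database and table name):

ANALYZE TABLE Users;
-- +------------+---------+----------+----------+
-- | Table      | Op      | Msg_type | Msg_text |
-- +------------+---------+----------+----------+
-- | test.Users | analyze | status   | OK       |
-- +------------+---------+----------+----------+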

CHECK TABLE checks tables and views for incorrectly closed tables, incorrect or deleted links, and verifies checksums for the rows. It can also check for full consistency.

This takes a bit — checking our tables for everything (but not checking for full consistency, as that takes longer) took 11 minutes (14G, 112 tables in 3 databases). Next month I will run a CHECK TABLE … EXTENDED and see how long that takes.

Documentation: http://dev.mysql.com/doc/refman/5.0/en/check-table.html
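The check levels, from fastest to most thorough (MEDIUM is what a MyISAM table gets if you give no option):

CHECK TABLE Users QUICK;    -- do not scan the rows for incorrect links
CHECK TABLE Users MEDIUM;   -- scan rows, verify key checksums
CHECK TABLE Users EXTENDED; -- full key lookup for every key of every row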

And the daddy of them all:
OPTIMIZE TABLE can be used on MyISAM, BDB, and InnoDB tables. On MyISAM tables, it repairs deleted or split rows, updates index statistics, and sorts the index pages. For InnoDB and BDB, OPTIMIZE TABLE maps to ALTER TABLE, which rebuilds the table and its indexes, thereby getting rid of fragmentation, corruption, and incorrect statistics.

Documentation: http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html

This took 36 minutes on our (14G, 112 tables in 3 databases) server.

From the documentation:
In most setups, you need not run OPTIMIZE TABLE at all. Even if you do a lot of updates to variable-length rows, it is not likely that you need to do this more than once a week or month and only on certain tables.

This morning, we backed up our data. Then we ran ANALYZE TABLE on all of our tables first. Sure, OPTIMIZE TABLE performs the same function, but OPTIMIZE TABLE takes a long time, and we wanted the benefit of ANALYZE TABLE right away. Plus, if anything failed, at least the tables’ index statistics would be up-to-date. Then we ran CHECK TABLE, and then OPTIMIZE TABLE, on each table. We did this while running live, and as I said, the entire process took less than an hour.

Actually, it took 40 minutes, because once the CHECK TABLE script was halfway through, I started the OPTIMIZE TABLE script. I had specifically set the tables to run in alphabetical order, so there was no chance of the two scripts hitting the same table at once. I will not do that again, though; I will just run them serially, for safety’s sake.
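For the record, all three statements accept a comma-separated list of tables, so the serial version of the run is just (hypothetical table names):

ANALYZE TABLE Users, Donations, Votes;
CHECK TABLE Users, Donations, Votes;
OPTIMIZE TABLE Users, Donations, Votes;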

One thing MERGE Tables Are Good For

Many people have some kind of reporting or auditing on their database. The problem is that the data grows very large, and lots of times there is data that can be purged. Sure, theoretically one never needs to purge data, but sometimes a “delete” flag just won’t work — when you search on the delete flag, a full table scan may be the most efficient way to go.

Of course, that’s not acceptable. And in many cases, say when you have users who no longer use the site but did in the past (and perhaps have billing data associated with them), you never want to get rid of them.

So what to do? Make a special reporting database that gathers information from the production database(s). Use MyISAM tables, because a reporting server can afford to be behind the master, and MyISAM is better for reporting — better metadata, such as an exact stored row count. For something like a “Users” table, make 2 more tables:

1) DeletedUsers
2) AllUsers

DeletedUsers is where you put information about the users you delete: on the master, run something like INSERT INTO DeletedUsers SELECT * FROM Users WHERE [parameters of deletion], and then DELETE FROM Users WHERE [parameters of deletion]. On the reporting slave, make AllUsers a MERGE table over Users and DeletedUsers, and run your reports from it whenever you need to gather historical data.
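A minimal sketch of that MERGE table on the reporting slave, assuming Users and DeletedUsers are MyISAM tables with identical definitions (the columns here are made up):

CREATE TABLE AllUsers (
  uid int(10) unsigned NOT NULL,
  username varchar(50) NOT NULL,
  created date NOT NULL
) ENGINE=MERGE UNION=(Users, DeletedUsers);

-- Reports against AllUsers read both underlying tables:
SELECT COUNT(*) FROM AllUsers WHERE created < '2006-01-01';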

Thoughts?

Working Smarter, Not Harder (SET and ENUM)

So, in a previous post I talked about smart code. Today I put myself squarely into a discussion about ENUM and SET. ENUM is an enumerated list of values, similar to a pull-down menu: the only values allowed in the field are the ones defined, with the option of also allowing NULL.

The ENUM field takes the idea of normalizing the data and eliminates the need for a join on that data. It also makes data integrity easy — if the value you’re trying to enter is not in the ENUM column definition, MySQL throws an error.

ENUM is not a standard SQL data type. It is MySQL specific.

As an example, in the real world I run a database for an international not-for-profit. Whenever a donation comes in, it is associated with some form of solicitation — either “October 2005 mailing” or “website donation” or “2005 Los Angeles House Party”, etc.

To normalize this data in a pure relational database, we’d have to have a separate table with 2 fields: an ID, and the name of the solicitation. We would add to this table whenever we have an event or do a mailing. In the “donation” table, we would reference the solicitation via its ID, and set up referential integrity so that an ID that does not exist could not be entered into the table.

Instead, we take advantage of MySQL’s ENUM column type. I have a script that will alter the table when we need to add a new solicitation, so even non-technical folk can do it (they go to a web page, add the name of the field, and click “submit”). Pull-down menus are also not difficult with a similar script to get all the possible values in the ENUM column.
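Here is roughly what that looks like, with made-up column names; note that adding a new solicitation means redefining the whole list:

CREATE TABLE Donations (
  donationID int(10) unsigned NOT NULL auto_increment PRIMARY KEY,
  amount decimal(10,2) NOT NULL,
  solicitation ENUM('October 2005 mailing', 'website donation',
                    '2005 Los Angeles House Party') NOT NULL
);

-- What the alter script runs behind the web page:
ALTER TABLE Donations MODIFY solicitation
  ENUM('October 2005 mailing', 'website donation',
       '2005 Los Angeles House Party', 'Spring 2006 mailing') NOT NULL;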

The reason ENUM is a great field type is that it stores each value in at most 2 bytes (1 byte for up to 255 elements, 2 bytes above that); there is a limit of 65,535 elements in an ENUM column. This uses a lot less space than the hack of just using a CHAR or VARCHAR field, which is what folks usually do when they do not want to add a join to a query by having another table.

Only one value at a time can be stored in an ENUM field; if you want to store more than one value in a column (which makes your database not relational, by the way), you can use the SET datatype.

One example of where I use the SET datatype in the not-for-profit database is for a person’s role in the organization. For instance, a person can be a donor and a volunteer; or a major donor and have grantwriting skills and be a staff member. Or they can be none of these.

Because of the combinations that can be made, SET is limited to 64 members. The column is stored in a maximum of 8 bytes — not impressive for one value, but rather small when it could be holding up to 64 values. This is more than enough for an application that needs “roles within a not-for-profit organization,” but not necessarily enough for an application that needs “albums in a photo database.”
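A sketch of that roles column (the member names are from the example above; the table is made up):

CREATE TABLE People (
  personID int(10) unsigned NOT NULL auto_increment PRIMARY KEY,
  roles SET('donor', 'volunteer', 'major donor',
            'grantwriter', 'staff') NOT NULL DEFAULT ''
);

-- Multiple values go in as one comma-separated string:
INSERT INTO People (roles) VALUES ('donor,volunteer');

-- FIND_IN_SET matches a single member of the set:
SELECT personID FROM People WHERE FIND_IN_SET('volunteer', roles);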

Most folks will either normalize their data and have a separate table, or make their own SET field by using a text field and matching. The latter has the benefit of utilizing the relevance ranking of a FULLTEXT search; however, the tables are likely larger because they need to store all the text, and there is no data integrity: values can be spelled wrong.

It would be great if there were a LARGE SET type, although it may well be that the computation of joining is faster than querying a table with a large field (if LARGE SET had a limit of 128 members instead of SET’s 64, it would be stored in 16 bytes, which is pretty large if most rows only have 1 or 2 values).

I’d be interested to know where folks are using SET and ENUM, and where they are deliberately not using them.

Smart code

So, the other day I was asked by a developer to come up with a table to tally votes. Basically, there are 6 choices, and customers can vote once per day. I asked if there were any other constraints, and there were none. I specifically asked if they wanted once per calendar date, or ‘it has to be 24 hours since the last vote’; they wanted ‘once per calendar date’. And updating the current day’s vote is not allowed. Once the vote is in, it cannot be changed.

So I came up with a simple table:

CREATE TABLE `ManOfMonth` (
  `uid` int(10) unsigned NOT NULL default '0',
  `voteDate` date NOT NULL default '0000-00-00',
  `uidVoteFor` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY (`uid`,`voteDate`),
  KEY `countVotes` (`uidVoteFor`)
);

There’s no need for a timestamp; you can insert NOW() into the voteDate column, and MySQL truncates the value to a date for you (we’re using 4.1.12).

The smart part of this is that the primary key is the constraint on the database — that is, a userid cannot vote more than once per day. So I told the developer, “When you’re writing the code, don’t bother running a SELECT to see if they’ve already voted; just do the INSERT, and check whether it fails with a duplicate-key error.”
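In practice that looks like this (hypothetical uids; 1062 is MySQL’s duplicate-key error code):

INSERT INTO ManOfMonth (uid, voteDate, uidVoteFor)
VALUES (42, NOW(), 7);

-- A second vote from uid 42 the same day fails with something like:
-- ERROR 1062 (23000): Duplicate entry '42-2006-02-07' for key 1
-- so the application just traps error 1062 instead of SELECTing first.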

Using MySQL’s error messages in this way prevents unnecessary SELECTs. This kind of coding should be common practice; I have a feeling it is not, particularly since I had to explain to the developer why the SELECT was unnecessary. Then again, the developer had also wanted a table with just uidVoteFor and a tally as 2 columns. That would be a smaller table, sure, and getting the result would be easy, but then there’s still the accounting for whether or not a person has voted that day. The index on uidVoteFor helps make the tally query faster.

Having used Oracle, DB2, Postgres, Sybase, Informix, and MSSQL, I always enjoyed that MySQL just named everything “MySQL”. Sure, it can get confusing — there’s MySQL the server, MySQL the client, MySQL the database instance… MySQL the flamethrower (the kids love this one)… But seriously, the ‘big guys’ have all this complicated jargon for really simple ideas.

MySQL has joined them. Granted, I’d been out of the MySQL world for about a year, and some wonderful things have happened in that year. Even a year ago, the company I worked for wasn’t using the most recent software nor taking advantage of all the features their versions of MySQL did have to offer. But I digress.

I’ve been working on my MySQL knowledge, particularly with the free webinars. Today I attended the “MySQL Network and MySQL 5.0” webinar, where I learned that MySQL is packaging (better) software, support, tools, access to developers, and a knowledgebase into what they call “MySQL Network.” I was completely unclear on the concept of MySQL Network from the description; from the name, I figured it would have something to do with technical networking, not business-to-business networking.

Meanwhile, yesterday I realized that the “Pluggable Storage Engines” in MySQL just mean “you can use different table types, and turn off the ones you don’t want to use.” I was familiar with the concept, but not the buzz-phrase used to describe it.

The first Boston MySQL User Group meeting went swimmingly. About half the people who RSVP’d yes or maybe showed up, and they got pizza as a thank-you gift.

My boss offered me a ride home, definitely — I’ll just go into work later, and not be tempted by a ride home.

The demographics of the group were really amazing:

about 15% female
those with no experience with any database
those with experience with databases but not MySQL
those who’ve been using MySQL for weeks
those who’ve been using MySQL for months
those who’ve been using MySQL for years
those who are trying to convince their company to switch
about 10% Indian
about 20% other-Asian
(I didn’t notice anyone that was recognizably Hispanic or black)
job titles ranging from developer, dba, all the way up to the vice president and president level
The publishing sector was represented by O’Reilly, Addison-Wesley (which is owned by Pearson, which handles the MySQL Press imprint), and Apress. O’Reilly and Apress solicited authors.

Corrections always welcome, and special thanks to Mike Kruckenberg and Mark Rubin of MySQL AB.

I cannot wait for next month…

So I’ve been looking into the Archive storage engine. What I would really like to do with it is get data into it in real time: when folks run reports, they want all the data up until “now”.

It is inadvisable to replicate from one storage engine type to another. I have not yet played with it, but since an Archive table does not allow UPDATE or DELETE, replicating from a MyISAM or InnoDB table directly to an Archive one is a bad idea.

Most folks probably run a batch job, but I wonder if it can be done in real time. Or rather, what is the best way to run it in real time? One way, off the top of my head, is to replicate to a BLACKHOLE table that has a trigger which inserts into an Archive table whenever an INSERT statement comes through. The BLACKHOLE table should not give an error upon UPDATE or DELETE statements.

This also allows for easy aggregation, because the trigger can “update the count and the country of a new profile” instead of keeping an entire replicated set of data with reports running SELECT COUNT(*). Instead of copying all the data and running the same reports on a different server/table, we can collect just the data we actually want: “1 new paid membership at time t, located in Sao Paulo, Brazil.” For reporting, we do not care what the name of the member is.
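A sketch of that idea (the table and column names are made up, and triggers require MySQL 5.0 or later):

CREATE TABLE member_feed (
  uid int(10) unsigned NOT NULL,
  country char(2) NOT NULL,
  paidAt datetime NOT NULL
) ENGINE=BLACKHOLE;

CREATE TABLE signup_counts (
  signupDate date NOT NULL,
  country char(2) NOT NULL,
  cnt int(10) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (signupDate, country)
) ENGINE=MyISAM;

-- Fires on every INSERT that replication applies to the blackhole
-- table; member_feed itself stores nothing.
CREATE TRIGGER tally_signup AFTER INSERT ON member_feed
FOR EACH ROW
  INSERT INTO signup_counts (signupDate, country, cnt)
  VALUES (DATE(NEW.paidAt), NEW.country, 1)
  ON DUPLICATE KEY UPDATE cnt = cnt + 1;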

I have searched around but have not yet found out how users are getting data into their Archive databases. I need a sandbox server at work so I can play with the options.

Why is it that running a monitoring system is such a marketable skill? In every company, having servers up is important. So does everyone have a robust monitoring system? Sure, folks implement scripts, but those are always one-off solutions. And what IS it with folks writing scripts that e-mail them when they’re successful? Just touch a file and have a simple script check when the file was last touched; that way you know when it failed, and you do not have to rely on “did I get that e-mail?”

I was lucky to have worked on Nagios while at Tufts, and I have set it up everywhere else I’ve worked. I should definitely remember to check the plugins site from time to time, so I do not duplicate efforts. Of course, submitting plugins would also eliminate the problem of “I wrote that check at my last company, and now I want it again!” I have not yet come across that, although I did come across a plugin that checks data via SNMP, which I could have used at my last job and which I will implement here.

For now, Nagios (2.0) is happy. I’ve written checks to find the replication lag time and the remaining InnoDB space.
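Both checks boil down to statements like these; the plugin just wraps the statement and compares the value against warning/critical thresholds (my sketch of the minimal version):

SHOW SLAVE STATUS\G
-- Seconds_Behind_Master is the replication lag in seconds;
-- NULL means the slave threads are not running.

SHOW TABLE STATUS LIKE 'SomeInnoDBTable'\G
-- For InnoDB tables in 4.1/5.0, the Comment field reports the
-- remaining tablespace, e.g. "InnoDB free: 512000 kB".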

I also intend to use the reporting features of Nagios to get stats on how available our services are.
