Category Archives: Performance

Lots of Boolean Values

So, the biggest issue from last month’s Boston MySQL Meetup was “What is the most efficient way to query across many values that are boolean in nature?”

Now, it was a bit more complex than that, as it always is. Values may be set or not set, and people may search across any number of boolean values, or all of them. There are over 1 million entries, and over 40 boolean values!

A few ways of doing so came up:

1) Simply put all the values into separate columns, with a BOOLEAN type (really TINYINT(1))
2) Using ENUMs
3) Using foreign keys to “join tables”
4) Using SET
5) Using bits and bit operations

Using real data, the original questioner and I sought to find the best answer for his case.
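Option 5 can be sketched quickly. This is a hypothetical illustration (the flag names are invented, and Python's bundled sqlite3 stands in for MySQL): each boolean becomes one bit of an integer column, and a bitwise AND in the WHERE clause matches any combination of set flags with a single comparison.

```python
import sqlite3

# Hypothetical flag names -- each boolean gets one bit of an integer column.
FLAGS = {"is_active": 1 << 0, "is_verified": 1 << 1, "has_photo": 1 << 2}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, flags INTEGER NOT NULL)")
conn.executemany("INSERT INTO users (flags) VALUES (?)",
                 [(0b101,), (0b011,), (0b111,)])

# Find rows where is_active AND has_photo are both set:
mask = FLAGS["is_active"] | FLAGS["has_photo"]
rows = conn.execute("SELECT id FROM users WHERE flags & ? = ?",
                    (mask, mask)).fetchall()
print(rows)  # rows 1 (0b101) and 3 (0b111) match
```

The same `flags & mask = mask` pattern extends to 40+ booleans in a BIGINT, at the cost of the column being harder to read by eye.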

One TRIGGER Fact, Optimizer Rewriting Stuff

  • One thing about TRIGGERs is that you can have 6 triggers per table: one for each combination of BEFORE or AFTER with INSERT, UPDATE, and DELETE.

    That was brought up at the User Group meeting.

    In other news, I have been reading the MySQL Internals Manual. The most interesting chapter yet is Chapter 3, the optimizer. This can be read out of order, and actually contains lots of information that folks can use. For instance:

    WHERE col1=col2 AND col2=’x’
    transforms to
    WHERE col1=’x’ AND col2=’x’

    because comparing a column to a constant is faster than comparing two columns.

    Tautologies and always-false statements are usually optimized out of the query so that something like WHERE 0=0 does not waste time by actually being tested.

    A table with 0 or 1 rows is a constant value. Another constant table scenario is WHERE col1=[constant] for all the columns of any UNIQUE or PRIMARY KEY, assuming they are defined as NOT NULL. This is a good reason to use NOT NULL for column definitions unless you really need NULL.

    A constant table is what is used by the optimizer when you see “const” as the “join type” in the result of an EXPLAIN.

    WHERE col1 IN (1,2,3) is exactly the same expression as WHERE col1=1 OR col1=2 OR col1=3

    WHERE col1 BETWEEN 1 AND 3 is exactly the same expression as WHERE col1>=1 AND col1<=3

    The last two examples are good to know, because the first statement in each pair is clearer.
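Both equivalences are easy to verify empirically. A quick sanity check, using Python's bundled sqlite3 in place of MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(n,) for n in range(6)])

# The IN / BETWEEN forms and their expanded equivalents return the same rows.
in_form    = conn.execute("SELECT col1 FROM t WHERE col1 IN (1,2,3)").fetchall()
or_form    = conn.execute("SELECT col1 FROM t WHERE col1=1 OR col1=2 OR col1=3").fetchall()
between    = conn.execute("SELECT col1 FROM t WHERE col1 BETWEEN 1 AND 3").fetchall()
range_form = conn.execute("SELECT col1 FROM t WHERE col1>=1 AND col1<=3").fetchall()

assert in_form == or_form == between == range_form
print(in_form)  # [(1,), (2,), (3,)]
```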

  • Semi-Dynamic Data

    http://www.artfulsoftware.com has a “Common Queries” page, which I find rather useful. I hadn’t realized its history, as described on the homepage:

    Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

    One day, I clicked on the page and got the dreaded “blank” PHP page.

    This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

    Part of the updating process could be a “generate the web page” script, which runs the same queries the dynamic page runs now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, the page only needs to change when content is uploaded, which isn’t very often at all.

    The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.
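That last step can be sketched in a few lines. This is a minimal illustration (the function name and page content are made up): write the generated HTML to a temporary file in the destination directory, then atomically move it into place.

```python
import os
import tempfile

def publish_page(html: str, dest: str) -> None:
    """Write a generated page to a temp file, then atomically move it into
    place, so a slow generation run never leaves a half-written page and a
    simultaneous run cannot partially overwrite this one."""
    directory = os.path.dirname(dest) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(html)
    os.replace(tmp_path, dest)  # atomic rename within the same filesystem

publish_page("<html><body>generated</body></html>", "index.html")
```

The temp file lives in the same directory as the destination so that `os.replace` is a rename, not a copy.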

    I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and in about an hour you can add support for a new template to an existing system, and have a user compare templates to see which they’d rather use, given their own content.

    I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

    Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages change rather rarely.

    This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

    * an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.

    Selectivity and Index Performance

    Let’s look into selectivity, as this is an important topic when looking at index performance. (Oooh, I said “performance”, watch everyone’s ears perk up!)

    This will probably answer the questions “Why isn’t MySQL using my index?” or “Why is my query so slow when I have an index on it?”

    Selectivity describes how distinct the values of a field are. It is a number from 0-1, although you can also think of it as a percentage. A value of 1, or 100%, means that each value in the field is unique. This happens with UNIQUE and PRIMARY keys, although non-unique fields may have a selectivity of 1 — for example, a timestamp value in a not-often-used table.

    To calculate this, you take the total number of DISTINCT records and divide by the total number of records.
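In other words (a toy illustration with made-up data, using Python's sqlite3 in place of MySQL):

```python
import sqlite3

def selectivity(conn, table, column):
    """COUNT(DISTINCT col) / COUNT(*) -- closer to 1.0 means more selective."""
    distinct = conn.execute(
        f"SELECT COUNT(DISTINCT {column}) FROM {table}").fetchone()[0]
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return distinct / total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (username TEXT, age INTEGER)")
conn.executemany("INSERT INTO Users VALUES (?, ?)",
                 [("a", 30), ("b", 30), ("c", 40), ("d", 40)])

print(selectivity(conn, "Users", "username"))  # 1.0 -- every value unique
print(selectivity(conn, "Users", "age"))       # 0.5 -- 2 distinct out of 4
```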

    My company has a large Users table, so I grabbed some statistics off of that:


    +----------+
    | count(*) |
    +----------+
    |   817666 |
    +----------+
    1 row in set (0.63 sec)

    +--------------------------+
    | count(distinct username) |
    +--------------------------+
    |                   817666 |
    +--------------------------+
    1 row in set (1.63 sec)

    So the selectivity is 817666/817666, or 1. If this were not a UNIQUE KEY already, it would be a good candidate for one.

    The “created” field is a timestamp for when the user record was created:
    +-------------------------+
    | count(distinct created) |
    +-------------------------+
    |                  811227 |
    +-------------------------+
    1 row in set (2.04 sec)

    As I expected, there are *some* duplicates, but for the most part, everyone has a different creation time. 811227/817666 = 0.99.

    +--------------------+
    | count(distinct IP) |
    +--------------------+
    |             544694 |
    +--------------------+
    1 row in set (2.35 sec)

    This is interesting — lots of people log on from public places, or use their friends’ computers, so there are duplicate IPs (the last IP used to log in is associated with a user, so it’s a 1-to-1 relationship). The selectivity here is 0.67.

    +-------------------------+
    | count(distinct browser) |
    +-------------------------+
    |                   25699 |
    +-------------------------+
    1 row in set (1.70 sec)

    This is what the server reports the user’s browser is. It records the last browser used by the user. This gives us about a 0.03 for selectivity.

    +---------------------+
    | count(distinct age) |
    +---------------------+
    |                  83 |
    +---------------------+
    1 row in set (0.63 sec)

    There are only 83 different reported ages on our site. That makes the selectivity of age 0.000101508. That is very low, effectively zero.

    So why is this important? I’m glad you asked….

    MySQL has a cost-based optimizer. This means that MySQL calculates the costs of different ways of performing a query and then chooses the cheapest one. Sounds reasonable, right? Well, calculating the costs is an inexact science. In order to calculate the exact cost, the optimizer would actually have to run the query. So an estimate is taken, and the estimate is wrong sometimes. Most of the time the estimate is correct.

    In contrast, some database systems use a rule-based optimizer. This means that no matter what the data state, the database uses rules to figure out the “optimal” path to the query. In most enterprise-level database systems, a cost-based optimizer performs better than a rule-based optimizer. In other words, there are so many exceptions to the rules that the calculation overhead is worth it.

    (Just to clarify, in both systems, the correct result set will be generated. The optimizer determines the path to the information.)

    This cost-based optimizer uses selectivity information when it decides whether or not to use an index.

    But what does this mean for me?

    Well, in the example above, this means if you want to query folks by age or age group, it’s useless to put an index on it. It’s a waste of cpu time and disk I/O to have an index for something with such a low selectivity. The optimizer will NEVER use it.

    I’ve heard that the optimizer will do a full table scan if it calculates it will return more than 30% of the table. Why is that number so low, why not more like 50 or 75%? Well, first the server has to go to the index, search the index, and find if the index record matches. Then it needs to follow the index record’s pointer to the real record on disk. And the MySQL gurus have decided that around 30% is the place where using the index is slower than just doing a full table scan.

    (Note: I’m not exactly sure what the # is but I’ve heard it’s around 30%. As well, at the user conference I saw graphs that showed that for the most part this was true. I thought it was in Jay Pipes’ Performance Tuning presentation, but the graphs are not in the slides. Pointers are appreciated.)

    So in this case, should we put an index on browser? Well, this is one of those cases where I’d think about how often we’d be doing queries and how much we care about server performance doing a report versus server performance while inserting or updating a record. If we really care one way or another, go that way. And document!

    Another thing to consider is the nature of the data. For something like age, that’s not going to change. Sure, we might have some 120 year olds eventually, but there’s not going to be that much variance. For browsers, there will only be more and more types put out, considering different version numbers and OS configurations of standard browsers as well as mobile phone browsers.

    However, if it does not matter, or if it’s too difficult to decide which is more important (your boss says “we can’t be slow during normal usage OR during reporting!”) I default to MySQL — it’s better at optimization than I am. I’d probably put an index so MySQL could decide whether to do a full table scan or use an index, particularly given that I expect the number of browsers to keep increasing.

    For something like IP, that has a selectivity of 0.67, so putting an index on it is worth it if we query on IPs a lot.

    I hope this article has been helpful!

    Jay Pipes has a great article on using the INFORMATION_SCHEMA to find out selectivity:
    http://tinyurl.com/kfffp

    The Care and Feeding of MySQL Tables

    Our site went from weekly crashes during our two busiest nights to not even peeping this week (during the two busiest nights), and the only thing we changed was that we did some table maintenance. We hadn’t done table maintenance at least as long as I’ve been around, which is 6 months. We are a site with high volumes of both reads and writes. This article will talk about the care and feeding of tables; feel free to use this for justification to have a maintenance window, or even permission to run table maintenance statements.

    MySQL uses a cost-based optimizer to best translate the written query into what actually happens. This means when you write:

    SELECT foo FROM t1 INNER JOIN t2 USING (commonField);

    The optimizer looks at the statistics for tables t1 and t2 and decides which is better:
    1) To go through each item in t1, looking for a matching “commonField” in t2
    or
    2) To go through each item in t2, looking for a matching “commonField” in t1

    If t1 is very large and t2 is very small, it makes sense to follow plan #2. This is a simplified example, of course.

    From the documentation:
    ANALYZE TABLE analyzes and stores the key distribution for a table. During the analysis, the table is locked with a read lock. This statement works with MyISAM, BDB, and InnoDB tables.

    If the key distribution is off, the optimizer will be using incorrect (out-of-date) information. Therefore, the optimizations it makes will not be…well, optimal.

    ANALYZE TABLE takes a very short amount of time — less than a second for even a million rows. I tested with InnoDB and MyISAM, but I’d guess that BDB is the same. Our database of 14G took less than a minute to analyze all 112 tables in 3 databases.

    Documentation: http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html

    CHECK TABLE checks tables and views for incorrectly closed tables, incorrect or deleted links, and verifies checksums for the rows. It can also check for full consistency.

    This takes a bit — checking our tables for everything (but not checking for full consistency, as it takes longer) took 11 minutes (14G, 112 tables in 3 databases). Next month I will run a CHECK EXTENDED and see how long this takes.

    Documentation: http://dev.mysql.com/doc/refman/5.0/en/check-table.html

    And the daddy of them all:
    OPTIMIZE TABLE can be used on MyISAM, BDB and InnoDB tables. In MyISAM tables, it repairs deleted or split rows, updates index statistics, and sorts the index pages. For InnoDB and BDB, OPTIMIZE TABLE maps to ALTER TABLE and just rebuilds the index, thereby getting rid of fragmentation, corruption and incorrect statistics.

    Documentation:
    http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html

    This took 36 minutes on our (14G, 112 tables in 3 databases) server.

    From the documentation:
    In most setups, you need not run OPTIMIZE TABLE at all. Even if you do a lot of updates to variable-length rows, it is not likely that you need to do this more than once a week or month and only on certain tables.

    This morning, we backed up our data. Then we ran ANALYZE TABLE on all of our tables first. Sure, OPTIMIZE TABLE performs the same function, but OPTIMIZE TABLE takes a long time, and we wanted the benefit of ANALYZE TABLE right away. Plus, if anything failed, at least the table’s index statistics are up-to-date. Then we ran CHECK TABLE, and then OPTIMIZE TABLE on each table. We did this while running live, and as I said, the entire process took less than an hour.

    Actually, it took 40 minutes, because once the script I had running to CHECK TABLE was halfway through, I started the OPTIMIZE TABLE script. I specifically set the tables to run through in alphabetical order, so there was no chance of the scripts trying to run on the same table. I will not do that again, I will just run them serially for safety’s sake.
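The serial version can be sketched like this. Shown with Python's sqlite3, whose per-table ANALYZE plays the role of ANALYZE TABLE here; against MySQL you would issue ANALYZE TABLE, CHECK TABLE, and OPTIMIZE TABLE through your connector instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER)")
conn.execute("CREATE TABLE t2 (a INTEGER)")
conn.execute("CREATE INDEX i1 ON t1 (a)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(n,) for n in range(100)])

# List the tables once, then run maintenance on them one at a time,
# serially -- so two statements can never hit the same table at once.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
for table in tables:
    conn.execute(f"ANALYZE {table}")  # stand-in for ANALYZE TABLE {table}

print(tables)
```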

    One thing MERGE Tables Are Good For

    Many people have some kind of reporting or auditing on their database. The problem is that the data grows very large, and lots of times there is data that can be purged. Sure, theoretically one never needs to purge data, but sometimes a “delete” flag just won’t work — when you search on the delete flag, a full table scan may be the most efficient way to go.

    Of course, that’s not acceptable. And in many cases, say when you have users who no longer use the site but did in the past (and perhaps have billing data associated with them), you never want to get rid of them.

    So what to do? Make a special reporting database, that gathers information from the production database(s). Use MyISAM tables, because a reporting server can afford to be behind the master, and MyISAM is better for reporting — better metadata. For something like a “Users” table, make 2 more tables:

    1) DeletedUsers
    2) AllUsers

    Where DeletedUsers is where you put information about the users you delete (something like INSERT INTO DeletedUsers SELECT * FROM Users WHERE [parameters of deletion], and then run DELETE FROM Users WHERE [parameters of deletion]) on the master. On the reporting slave, make a MERGE table called “AllUsers” and run your reports from that, when you might need to gather historical data.
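A rough sketch of the archive-then-purge step (the columns are invented; sqlite3 stands in for MySQL, and since sqlite has no MERGE engine, a UNION ALL view plays the part of the AllUsers MERGE table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users        (id INTEGER, name TEXT, active INTEGER);
CREATE TABLE DeletedUsers (id INTEGER, name TEXT, active INTEGER);
-- No MERGE engine here, so a UNION ALL view stands in for AllUsers:
CREATE VIEW AllUsers AS
    SELECT * FROM Users UNION ALL SELECT * FROM DeletedUsers;
""")
conn.executemany("INSERT INTO Users VALUES (?,?,?)",
                 [(1, "alice", 1), (2, "bob", 0)])

# Archive-then-purge, mirroring the INSERT ... SELECT / DELETE pair above:
conn.execute("INSERT INTO DeletedUsers SELECT * FROM Users WHERE active = 0")
conn.execute("DELETE FROM Users WHERE active = 0")

print(conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0])     # 1
print(conn.execute("SELECT COUNT(*) FROM AllUsers").fetchone()[0])  # 2
```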

    Thoughts?

    Working Smarter, Not Harder (SET and ENUM)

    So, in a previous post I talked about smart code. Today I put myself square into a discussion about ENUM and SET. ENUM is an enumerated list of values; similar to a pull-down menu, where the only values allowed in that field are the ones defined, with the option of the value also being NULL.

    Smart code

    To contact me via e-mail:

    awfief@gmail.com
    Having used Oracle, DB2, Postgres, Sybase, Informix, and MSSQL, I always enjoyed that MySQL just named everything “MySQL”. Sure, it can get confusing — there’s MySQL the server, MySQL the client, MySQL the database instance. . . .MySQL the flamethrower (the kids love this one). . . .But seriously, the ‘big guys’ have all this complicated jargon for really simple ideas.

    MySQL has joined them. Granted, I’d been out of the MySQL world for about a year, and some wonderful things have happened in that year. Even a year ago, the company I worked for wasn’t using the most recent software nor taking advantage of all the features their versions of MySQL did have to offer. But I digress.

    I’ve been working on MySQL knowledge, particularly with the free webinars. Today I attended the “MySQL Network and MySQL 5.0” webinar, where I learned that MySQL is packaging (better) software, support, tools, access to developers, and a knowledgebase into what they call “MySQL Network.” I was completely unclear on the concept of MySQL Network from the description, and from the name I figured it would have something to do with technical networking, not business to business networking.

    Meanwhile, yesterday I realized that the “Pluggable Storage Engines” in MySQL just mean “you can use different table types, and turn off the ones you don’t want to use.” I was familiar with the concept, but not the buzz-phrase used to describe it.
    The first Boston MySQL User Group meeting went swimmingly. About half the people who RSVP’d yes or maybe showed up, and got pizza as a thank-you gift.

    My boss offered me a ride home — I’ll just go into work later, and not be tempted by a ride home.

    The demographics of the group were really amazing:

    • about 15% female
    • those with no experience with any database
    • those with experience with databases but not MySQL
    • those who’ve been using MySQL for weeks
    • those who’ve been using MySQL for months
    • those who’ve been using MySQL for years
    • those who are trying to convince their company to switch
    • about 10% Indian
    • about 20% other-Asian
    • (I didn’t notice anyone that was recognizably Hispanic or black)
    • job titles ranging from developer and DBA all the way up to the vice president and president level
    The publishing sector was represented by O’Reilly, Addison-Wesley (which is owned by Pearson, which handles the MySQL Press imprint), and Apress. O’Reilly and Apress solicited authors.

    Corrections always welcome, and special thanks to Mike Kruckenberg and Mark Rubin of MySQL AB.

    I cannot wait for next month. . .
    So I’ve been looking into the Archive Storage Engine. What I would really like to do with it is get data into it in real time — when folks run reports, they want all the data up until “now”.

    It is inadvisable to replicate from one storage engine type to another. I have not yet played with it, but since an Archive table does not allow updates and deletes, replicating from a MyISAM or InnoDB table to an Archive one is a bad idea.

    Most folks probably run a batch job, but I wonder if it can be done in real time — or rather, what is the best way to run it in real time? One way, off the top of my head, is to replicate to a Blackhole table with a trigger that inserts into an Archive table whenever an INSERT statement is called. The Blackhole table should not give an error upon UPDATE or DELETE statements.

    This also allows for easy aggregation, because the trigger can say “update the count and the country of a new profile” instead of having an entire replicated set of data, with reports running “SELECT count(*)”. Instead of copying all the data and running the same reports on a different server/table, we can now collect the data we actually want, which is “1 new paid membership at time t located in Sao Paulo, Brazil.” For reporting, we do not care what the name of the member is.
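Here is a toy version of that aggregate-in-a-trigger idea (sqlite3 stands in for MySQL, and the table and column names are invented): each INSERT bumps a per-city count instead of storing a full copy of the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memberships (id INTEGER, city TEXT);
CREATE TABLE city_counts (city TEXT PRIMARY KEY, n INTEGER);
-- On each new membership, bump the aggregate instead of copying the row.
CREATE TRIGGER agg AFTER INSERT ON memberships
BEGIN
    INSERT OR IGNORE INTO city_counts VALUES (NEW.city, 0);
    UPDATE city_counts SET n = n + 1 WHERE city = NEW.city;
END;
""")
conn.executemany("INSERT INTO memberships VALUES (?, ?)",
                 [(1, "Sao Paulo"), (2, "Sao Paulo"), (3, "Boston")])

counts = conn.execute(
    "SELECT city, n FROM city_counts ORDER BY city").fetchall()
print(counts)  # [('Boston', 1), ('Sao Paulo', 2)]
```

The report then reads the tiny `city_counts` table rather than running `SELECT count(*)` over the full replicated data set.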

    I have searched around but have not yet found how users are getting data into their archived databases. I need a sandbox server at work so I can play with the options.
    To contact me via e-mail:

    awfief@gmail.com
Having used Oracle, DB2, Postgres, Sybase, Informix, and MSSQL, I always enjoyed that MySQL just named everything “MySQL”. Sure, it can get confusing: there’s MySQL the server, MySQL the client, MySQL the database instance… MySQL the flamethrower (the kids love this one)… But seriously, the ‘big guys’ have all this complicated jargon for really simple ideas.

    MySQL has joined them. Granted, I’d been out of the MySQL world for about a year, and some wonderful things have happened in that year. Even a year ago, the company I worked for wasn’t using the most recent software nor taking advantage of all the features their versions of MySQL did have to offer. But I digress.

I’ve been working on my MySQL knowledge, particularly with the free webinars. Today I attended the “MySQL Network and MySQL 5.0” webinar, where I learned that MySQL is packaging (better) software, support, tools, access to developers, and a knowledge base into what they call “MySQL Network.” I was completely unclear on the concept of MySQL Network from the description, and from the name I figured it would have something to do with technical networking, not business-to-business networking.

    Meanwhile, yesterday I realized that the “Pluggable Storage Engines” in MySQL just mean “you can use different table types, and turn off the ones you don’t want to use.” I was familiar with the concept, but not the buzz-phrase used to describe it.
The first Boston MySQL User Group meeting went swimmingly. About half the people who RSVP’d yes or maybe showed up and got pizza as a thank-you gift.

My boss offered me a ride home; next time I’ll just go into work later and not be tempted by a ride home.

The demographics of the group were really amazing:

    about 15% female
    those with no experience with any database
    those with experience with databases but not MySQL
    those who’ve been using MySQL for weeks
    those who’ve been using MySQL for months
    those who’ve been using MySQL for years
    those who are trying to convince their company to switch
    about 10% Indian
    about 20% other-Asian
    (I didn’t notice anyone that was recognizably Hispanic or black)
job titles ranging from developer and DBA all the way up to the vice president and president level
    The publishing sector was represented by O’Reilly, Addison-Wesley (which is owned by Pearson, which handles the MySQL Press imprint), and Apress. O’Reilly and Apress solicited authors.

Corrections always welcome, and special thanks to Mike Kruckenberg and Mark Rubin of MySQL AB.

I cannot wait for next month…
Why is it that running a monitoring system is such a marketable skill? In every company, keeping servers up is important, so does everyone have a robust monitoring system? Sure, folks implement scripts, but those are always one-off solutions. And what IS it with folks writing scripts that e-mail them when they’re successful? Just touch a file on success and have a simple script check whether the file was touched recently; that way you know when the job failed and do not have to rely on “did I get that e-mail?”
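As a sketch of that touch-file approach (the function name, paths, and thresholds are made up; a real deployment would wire this into cron or Nagios):

```shell
# Freshness check for a success-marker file: the batch job touches the
# marker when it succeeds, and this check complains only when it is stale.
# Usage: check_touchfile <marker-file> <max-age-minutes>
check_touchfile() {
    marker="$1"
    max_age_min="$2"
    if [ ! -f "$marker" ]; then
        echo "CRITICAL: $marker does not exist"
        return 2
    fi
    # find -mmin +N prints the file only if it was modified more than
    # N minutes ago, i.e. only if the marker is stale
    if [ -n "$(find "$marker" -mmin +"$max_age_min" 2>/dev/null)" ]; then
        echo "CRITICAL: $marker is older than $max_age_min minutes"
        return 2
    fi
    echo "OK: $marker is fresh"
    return 0
}
```

The return codes (0 for OK, 2 for CRITICAL) follow the Nagios plugin convention, so the same logic can become a proper plugin later.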

I was lucky to have worked with Nagios while at Tufts, and I have set it up everywhere else I’ve worked. I should definitely remember to check the plugins site from time to time, so I do not duplicate efforts. Of course, submitting plugins would also eliminate the problem of “I wrote that check at my last company, and now I want it again!” I have not run into that myself yet, although I did come across a plugin that checks data via SNMP, which would have been useful at my last job and which I will implement here.

For now, Nagios (2.0) is happy. I’ve written checks to find the lag time on replication and the remaining InnoDB space. Behind the cut is a 50K picture of the Nagios server, currently happy.
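The replication-lag check boils down to classifying the Seconds_Behind_Master value into a Nagios state. A rough sketch, with invented thresholds (not necessarily what I actually run):

```shell
# Map replication lag (in seconds, or the literal string NULL when the
# slave threads are not running) to a Nagios message and exit code.
classify_lag() {
    lag="$1"
    warn="${2:-60}"   # seconds behind before WARNING
    crit="${3:-300}"  # seconds behind before CRITICAL
    if [ "$lag" = "NULL" ]; then
        echo "CRITICAL: replication is not running"
        return 2
    elif [ "$lag" -ge "$crit" ]; then
        echo "CRITICAL: slave is ${lag}s behind the master"
        return 2
    elif [ "$lag" -ge "$warn" ]; then
        echo "WARNING: slave is ${lag}s behind the master"
        return 1
    fi
    echo "OK: slave is ${lag}s behind the master"
    return 0
}

# In the real plugin, the value would come from the server, e.g.:
#   lag=$(mysql -e 'SHOW SLAVE STATUS\G' | awk '/Seconds_Behind_Master/ {print $2}')
```

Treating NULL as CRITICAL matters: a stopped slave reports no lag number at all, and a naive numeric comparison would happily call that “OK.”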

    I also intend to use the reporting feature of nagios to get stats on how available our services are.
