
Ronald Bradford’s recent warning to be sure to know your my.cnf sections reminded me of a similar issue I ran into last summer, where putting the “group” option in both the [mysqld_safe] and [mysqld] sections resulted in a mostly silent problem.

I started noticing this in MySQL 5.1, and it affected both the official MySQL binary and the Percona binary. In trying to be conscientious, I had the following set:

[mysqld_safe]
user=mysql
group=mysql

[mysqld]
user=mysql
group=mysql

However, when the MySQL server started up, the error log showed:

[Warning] option 'group_concat_max_len': unsigned value 0 adjusted to 4

This was obviously a problem, but I only started noticing it during MySQL restarts, mostly during upgrades to MySQL 5.1. I tracked it down and realized that when I removed the “group” option from the [mysqld] section, the warning did not come up.

The problem is that [mysqld_safe] sees “group” as the “group” option, but [mysqld] does not have a “group” option. The MySQL server allows the shortest unique prefix of an option name to *be* that option. Thus, “group” is an acceptable abbreviation for “group_concat_max_len”.

So mysqld was taking:
group=mysql

and translating it to:
group_concat_max_len=mysql

but “mysql” is a string, not a number, so MySQL tried to be helpful by converting it to a number, and it was as if I had stated:
group_concat_max_len=0
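
The fix was to keep “group” only in the section that actually understands it. A minimal sketch of the corrected option file (the comments are my annotations):

[mysqld_safe]
user=mysql
group=mysql   # "group" is a known option here, as described above

[mysqld]
user=mysql    # no "group" here: mysqld would expand it to group_concat_max_len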

I filed a bug for this back in June:
http://bugs.mysql.com/bug.php?id=45379. The response was “If 3 different people ask about removing this feature reclassifying report to feature request with new synopsis.”

So, a second moral: file a bug report if you want things to change, and if you see a bug report for a problem you’re encountering, make sure to add your voice so that MySQL understands that the issue is indeed serious.


There are those who are very adamant about letting people know that using INFORMATION_SCHEMA can crash your database. For example, in a post about making changes to many tables at once, Baron writes:

“querying the INFORMATION_SCHEMA database on MySQL can completely lock a busy server for a long time. It can even crash it. It is very dangerous.”

Though Baron is telling the truth here, he left out one extremely important piece of information: you can actually figure out how dangerous your INFORMATION_SCHEMA query will be, ahead of time, using EXPLAIN.


In MySQL 5.1.21 and higher, not only were optimizations made to the INFORMATION_SCHEMA, but new values were added so that EXPLAIN had better visibility into what MySQL is actually doing. As per http://dev.mysql.com/doc/refman/5.1/en/information-schema-optimization.html there are 6 new “Extra” values for EXPLAIN that are used only for INFORMATION_SCHEMA queries.

The first 2 “Extra” values for EXPLAIN are mostly self-explanatory:
Scanned 1 database – Only one database directory needs to be scanned.
Scanned all databases – All database directories are scanned. This is more dangerous than only scanning one database.

Note that there is no middle ground; there is no optimization to scan only 2 databases. Either all database directories are scanned, or only one is: if your query spans more than one database, then all database directories are scanned.

SHOW statements are less dangerous than using INFORMATION_SCHEMA because they only use one database at a time. If you have an INFORMATION_SCHEMA query that produces an “Extra” value of “Scanned 1 database”, it is just as safe as a SHOW statement.
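
To make that concrete, here is a sketch of two roughly equivalent lookups; the second is the kind of INFORMATION_SCHEMA query whose EXPLAIN should show “Scanned 1 database”:

mysql> SHOW TABLE STATUS FROM test;
mysql> SELECT TABLE_NAME, ENGINE FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='test';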

The optimizations went even further, though. From the most “dangerous” (i.e., resource-intensive) to the least, here are the other 4 “Extra” values introduced in MySQL 5.1.21 (which, for the record, came out in August 2007, so this feature has been around for 2.5 years at this point):

Open_full_table
Open_trigger_only
Open_frm_only
Skip_open_table

A bit more explanation, and some examples:

Open_full_table – Needs to open all the metadata, including the table’s format file (.frm) and data/index files such as .MYD and .MYI. The manual page linked above lists which fields will show each “Extra” type; for example, the AUTO_INCREMENT and DATA_LENGTH fields of the TABLES table require opening all the metadata.

mysql> EXPLAIN SELECT TABLE_SCHEMA,TABLE_NAME,AUTO_INCREMENT FROM TABLES\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: TABLES
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra: Open_full_table; Scanned all databases
1 row in set (0.00 sec)

Let’s see an example that only scans 1 database:

mysql> EXPLAIN SELECT TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA='test'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: TABLES
type: ALL
possible_keys: NULL
key: TABLE_SCHEMA
key_len: NULL
ref: NULL
rows: NULL
Extra: Using where; Open_full_table; Scanned 1 database
1 row in set (0.00 sec)

Note that “Scanned all databases” will apply if there is any way there could be more than one database. For example, on my test server, only the ‘test’ and ‘sakila’ databases exist (other than ‘mysql’ and ‘INFORMATION_SCHEMA’, of course), and yet when I do

EXPLAIN SELECT TABLE_NAME,AUTO_INCREMENT FROM TABLES WHERE TABLE_SCHEMA LIKE 'test%'\G

I still get “Scanned all databases”. So be careful.

One of the basic pieces of advice I see for optimizing queries also applies to queries on the INFORMATION_SCHEMA: do not use SELECT * unless you actually want every single piece of information. In the case of INFORMATION_SCHEMA, optimizing your queries can mean the difference between the server crashing and the server staying up.

Open_trigger_only – Only the .TRG file needs to be opened. Interestingly enough, this does not seem to have an example that applies. The manual page says that the TRIGGERS table uses Open_full_table for all fields. When I tested it, though, I did not get anything in the “Extra” field at all — not “Open_trigger_only” and not even “Open_full_table”:

mysql> select @@version;
+---------------------+
| @@version           |
+---------------------+
| 5.1.37-1ubuntu5-log |
+---------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM TRIGGERS\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: TRIGGERS
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra:
1 row in set (0.00 sec)

Open_frm_only – Only the format file (.frm) of the table needs to be opened. Again, check the manual page for the fields that can use this optimization; fields such as CREATE_OPTIONS and ENGINE in the TABLES table do, for example.

Skip_open_table – This is the last new “Extra” value, and it is the best. This optimization means that no files need to be opened at all; scanning the database directories provides the information (mostly the table name). So when you query only the TABLE_NAME and TABLE_SCHEMA fields from the TABLES table, your query is safe.
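
Putting it all together, make a habit of running EXPLAIN on the INFORMATION_SCHEMA query and reading the Extra column before running the query itself. A minimal sketch (the exact plan will vary by server and version):

mysql> EXPLAIN SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES\G

If Extra shows “Skip_open_table”, no table files will be opened; if it shows “Open_full_table; Scanned all databases”, think twice before running the query on a busy production server.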

So instead of putting your head in the sand and never using the great tool that is the INFORMATION_SCHEMA, first EXPLAIN your query to see how dangerous it will be.

(Note, if you are still on MySQL 5.0, what are you waiting for? The upgrade to MySQL 5.1 is relatively painless, and Pythian has a comprehensive checklist for how to upgrade while keeping your sanity).


Just the facts:
What: MySQL user community dinner
Who: me, and many MySQL community members
When: Monday, April 12th – meet at 6:30 pm at the Hyatt Santa Clara, or at 7 pm at the restaurant
Where: Pedro’s Restaurant and Cantina – 3935 Freedom Circle, Santa Clara, CA 95054
How: Comment on this blog post to add your name to the list of probable attendees

I was sad that last year there was no community dinner, and I missed the one the year before when Jonathan Schwartz and Rich Green made an appearance. This year I am determined not to miss it, and so I am calling for a community (pay-your-own-way) dinner on Monday, April 12th, at Pedro’s – a Mexican restaurant that has vegetarian and vegan options. I think Monday is a better time because many folks arrive Sunday evening, or even Monday morning (there are tutorials on Monday, but not everyone attends).

Pedro’s can handle large groups of people, but we would like to have a rough idea of how many people are attending; while you are not required to RSVP, we would like to make an accurate reservation at Pedro’s. In 2008, there was a wiki page with a list of attendees, and I was disappointed because there were so many people on that list I wanted to see.

Meet us at 6:30 pm on Monday in the lobby of the Hyatt Santa Clara, or at 7 pm at Pedro’s. If you want to come later, just show up at Pedro’s whenever you can.

Since commenting on this blog does not require registration (as the wiki does), I invite folks to comment on this blog post and I’ll add you to the list of attendees:

Sheeri K. Cabral (The Pythian Group)
Paul Vallee (The Pythian Group)
Rob Hamel (The Pythian Group)
Giuseppe Maxia (Sun)
Brian Aker (Drizzle)
Konstantin Osipov (Sun)
Mark Callaghan (Facebook) (will arrive later)
Wagner Bianchi (EAC Software, Brazil)
Bill Karwin (Karwin Software Solutions)
Maxim Volkov (OpenCandy)
Brian Moon (DealNews) – note: Monday Apr 12th is Brian’s birthday!
Rob Peck (DealNews)
Arjen Lentz (OpenQuery)
Vadim Tkachenko (Percona)
Rohit Nadhani (WebYog)


Applying binary logs to a MySQL instance is not particularly difficult, using the mysqlbinlog command-line utility:

$> mysqlbinlog mysql-bin.000003 > 03.sql
$> mysql < 03.sql
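
You can also skip the intermediate file by piping the output straight into the mysql client; a sketch, assuming the same binary log file:

$> mysqlbinlog mysql-bin.000003 | mysql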

Turning off binary logging for a session is not difficult either, from the MySQL command line, if you authenticate as a user with the SUPER privilege:

mysql> SET SESSION sql_log_bin=0;

However, sometimes you want to apply binary logs to a MySQL instance without having those changes written to the binary logs themselves. One option is to restart the server with binary logging disabled, and after the load is finished, restart the server with binary logging re-enabled. This is not always possible nor desirable, so there’s a better way that works in at least versions 4.1 and up:

The mysqlbinlog utility has the --disable-log-bin option. All the option does is add the SET SESSION sql_log_bin=0; statement to the beginning of the output, but it is certainly much better than restarting the server twice!

Here’s the manual page for the --disable-log-bin option of mysqlbinlog: http://dev.mysql.com/doc/refman/5.1/en/mysqlbinlog.html#option_mysqlbinlog_disable-log-bin
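
Putting it together, a sketch of replaying a binary log without recording the changes in the binary log (same assumed file name as above):

$> mysqlbinlog --disable-log-bin mysql-bin.000003 | mysql

Note that the applying session still needs the SUPER privilege, since the output begins with the SET SESSION sql_log_bin=0; statement.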

When I heard that MySQL 5.6 was implementing EXPLAIN for writes (INSERT, UPDATE, DELETE, REPLACE), I was pretty excited. Then I heard that MySQL 5.6 was also implementing a JSON format for EXPLAIN, and my thought was “I do not care about that!”

Boy, was I wrong. The JSON format does not just put the output into JSON format; it also gives extra information that’s actually pretty useful! It can tell you when you are doing an implicit cast, which parts of a composite index are being used, and when index condition pushdown is being used. None of these are shown in regular EXPLAIN (which seems odd; why could they extend the JSON format but not put the information into the regular EXPLAIN format?), so using the JSON format is a good idea even if you do not care about what format your output is in.

As a note, MySQL Workbench’s Visual Explain (go to Query->Visual Explain Current Statement) also gives this information.

attached_condition and implicit casts

In a talk I do about EXPLAIN, I use the Sakila sample database. Here is an example of a “bad” query:

mysql> EXPLAIN SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index
possible_keys: NULL
key: rental_date
key_len: 10
ref: NULL
rows: 16005
Extra: Using where; Using index
1 row in set (0.00 sec)

This query is “bad” because it is doing a full index scan (type: index) instead of doing a range scan for just the range of dates we want (should be type: range). Ironically, the EXPLAIN does not actually explain why.

However, the JSON format does explain why:
mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "rental",
      "access_type": "index",
      "key": "rental_date",
      "used_key_parts": [
        "rental_date",
        "inventory_id",
        "customer_id"
      ],
      "key_length": "10",
      "rows": 16005,
      "filtered": 100,
      "using_index": true,
      "attached_condition": "(cast(`sakila`.`rental`.`rental_date` as date) = '2006-02-14')"
    }
  }
}

Note that the attached_condition shows the implicit cast. This is MUCH more friendly to a developer or administrator who is trying to figure out why MySQL is not doing what they want it to do!
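
One way to avoid the implicit cast, as a sketch, is to compare the indexed column directly against a range of DATETIME values (the BETWEEN query in the next section takes the same approach):

SELECT rental_id FROM rental
WHERE rental_date >= '2006-02-14 00:00:00'
  AND rental_date <  '2006-02-15 00:00:00';

With no function wrapped around rental_date, the optimizer is free to do a range scan on the rental_date index.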

The visual EXPLAIN from MySQL Workbench also shows the implicit cast:

Workbench showing attached_condition and implicit CAST

You may also notice it shows the “filtered” attribute, which is not in regular EXPLAIN but is part of EXPLAIN EXTENDED. “filtered” is the percentage of the examined “rows” that are estimated to be returned. A higher number here is better; if it is low, it means that you are examining a lot of rows that you do not return.

used_key_parts

You may have noticed above that there is a used_key_parts array that does not show up in the traditional EXPLAIN. In a traditional EXPLAIN (or EXPLAIN EXTENDED), you do get to see the index length with the key_len field, so you can guess that only part of a composite index is used. Both the previous query and the following query use this index:

UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`)

Here is the traditional EXPLAIN. Note that it shows the rental_date index is used, and the key_len is 5, which implies that only the first field of the index, rental_date, is being used, not the other 2 id fields (5 bytes is the storage size of a single NOT NULL DATETIME column in MySQL 5.6). But you have to deduce that for yourself:

mysql> EXPLAIN EXTENDED SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date
key: rental_date
key_len: 5
ref: NULL
rows: 181
filtered: 100.00
Extra: Using where; Using index

Here is the JSON format, whose used_key_parts field reveals very clearly that only the first field of the index is used:

mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "rental",
      "access_type": "range",
      "possible_keys": [
        "rental_date"
      ],
      "key": "rental_date",
      "used_key_parts": [
        "rental_date"
      ],
      "key_length": "5",
      "rows": 181,
      "filtered": 100,
      "using_index": true,
      "attached_condition": "(`sakila`.`rental`.`rental_date` between '2006-02-14 00:00:00' and '2006-02-14 23:59:59')"
    }
  }
}

And here is the MySQL Workbench Visual EXPLAIN that shows the used_key_parts clearly:

Workbench showing used_key_parts and BETWEEN

I am glad I took a second look at EXPLAIN FORMAT=JSON – the new features are awesome! My only complaint is that I think they should be added to either EXPLAIN or EXPLAIN EXTENDED. I also hope that tools like pt-query-digest will be updated to use the extra information.


Baron makes an excellent point: rate is [often] the important thing to look at.

This is something that mysqltuner takes into account, and the default configuration includes looking at both ratios and rates.

If I told you that your database had a ratio of temporary tables written to disk of 20%, you might think “aha, my database is slow because of a lot of file I/O caused by writing temporary tables to disk!” However, that 20% ratio may actually mean a rate of 2 per hour, which is most likely not causing excessive I/O.

To get a sense of this concept, and also how mysqltuner works, I will show the lines from the mysqltuner default configuration that deal with temporary tables written to disk. The format is that the fields are separated by three pipes (|||), and the fields are:

label
threshold check
formula
recommendation if “threshold check” is met

Here is the line from the default configuration file that calculates the ratio of temporary tables written to disk:

% temp disk tables|||>25|||Created_tmp_disk_tables / (Created_tmp_tables + Created_tmp_disk_tables) * 100|||Too many temporary tables are being written to disk.  Increase max_heap_table_size and tmp_table_size.

mysqltuner will parse that as:

if the value of Created_tmp_disk_tables / (Created_tmp_tables + Created_tmp_disk_tables) * 100 is greater than 25, then print out the last field (the recommendation).

So that means that a ratio of 25% or more is the threshold. But we found that many clients had a ratio much less than 25%, yet still had excessive temporary tables written to disk. So the default configuration also contains a rate calculation for temporary tables written to disk:

temp disk rate|||=~ /second|minute/|||&hr_bytime(Created_tmp_disk_tables/Uptime)|||Too many temporary tables are being written to disk.  Increase max_heap_table_size and tmp_table_size.

mysqltuner will parse that as:

if the value of &hr_bytime(Created_tmp_disk_tables/Uptime) matches “second” or “minute”, then print out the last field.

The hr_bytime() function in mysqltuner takes a per-second rate and makes it “human readable” (hence “hr”) by scaling it to the first time unit (second, minute, hour, day) at which the value is greater than 1. For example:

hr_bytime(2) returns “2.0 per second”
hr_bytime(0.2) returns “12.0 per minute”
hr_bytime(0.02) returns “1.2 per minute”
hr_bytime(0.002) returns “7.2 per hour”
hr_bytime(0.0002) returns “17.28 per day”

Certainly, 0.02 looks small, but “12 per minute” is a better metric for a DBA to understand the problem.
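
For illustration, here is a rough Perl sketch of that behavior, reconstructed from the examples above (this is my own sketch, not the actual mysqltuner source):

sub hr_bytime {
    my ($rate) = @_;    # rate in events per second
    my @units = ( [ 1, 'second' ], [ 60, 'minute' ],
                  [ 3600, 'hour' ], [ 86400, 'day' ] );
    for my $u (@units) {
        my ( $mult, $name ) = @$u;
        my $scaled = $rate * $mult;
        # report at the first unit where the scaled value reaches 1,
        # falling back to "per day" for very small rates
        return sprintf( '%.1f per %s', $scaled, $name )
            if $scaled >= 1 or $name eq 'day';
    }
}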

Because the configuration file for mysqltuner 2.0 contains the threshold and check, it is fairly simple to change what the threshold is, and to check both rates and ratios. mysqltuner also allows you to output in different formats (currently there’s “pretty” and “csv”, but it’s easy to add a perl subroutine to do something different with the output), which makes it ideal for doing regular tuning checks for what is most important for you.

Pythian uses it on one client to provide weekly reports, which we add to a spreadsheet so that differences are easy to see. (Yes, output directly to a database is on the list of features we want to add; mysqltuner is just a perl script, so if anyone in the community wants to add it, they can create a branch and request that the feature be merged into the main trunk. It is all on launchpad, at https://launchpad.net/mysqltuner, so community contributions are recommended and encouraged.)


If you do not know what International Women’s Day is: http://www.internationalwomensday.com/

Start planning your blog posts for Ada Lovelace Day now (March 24th; podcasting, comic drawing, etc. count too!) to draw attention to the achievements of women in technology and science.

To that end, I would like to point out all the women currently in science and tech fields that I admire and think are doing great things. I think it would be great if everyone, male or female, made a list like this:

The women that have taught me science/tech along the way:

High School:
Mary Lou Ciavarra (Physics)
Maria Petretti (Pre-Algebra, and Academic Decathlon)
Reneé Fishman (Biology)
Lisa Acquaire (Economics during Academic Decathlon)

College:
Professor Kalpana White (Biology), in whose fruit fly lab I worked for 2 semesters.
Professor Eve Marder (Introductory Neuroscience)

Though Brandeis does have female faculty in the Computer Science department, I did not manage to have any classes with female Computer Science faculty members.

My current female DBA co-workers at Pythian: Isabel Pinarci (Oracle), Michelle Gutzait (SQL Server), Catherine Chow (Oracle) and Jasmine Wen (Oracle).

And to folks in the greater MySQL/tech community and tech co-workers past and present, especially those who have inspired and helped me: Tracy Gangwer, Selena Deckelmann (Postgres), Amy Rich, Anne Cross, and more. (If I have forgotten you, I apologize!)

When I heard that MySQL 5.6 was implementing EXPLAIN for writes (INSERT, buy information pills no rx UPDATE, pilule DELETE,REPLACE), I was pretty excited. Then I heard that MySQL 5.6 also was implementing a JSON format for EXPLAIN and my thought was “I do not care about that!”

Boy, was I wrong. The JSON format does not just put the output into JSON format, it also gives extra information that’s actually pretty useful! It can tell you when you are doing an implicit cast, which parts of a composite index are being used, and when index condition pushdown are being used. None of these are shown in regular EXPLAIN (which seems odd, why could they extend the JSON format but not put the information into the regular EXPLAIN format?), so using the JSON format is actually a good idea even if you do not care about what format your output is in.

As a note, MySQL Workbench’s Visual Explain (go to Query->Visual Explain Current Statement) also gives this information.

attached_condition and implicit casts

In a talk about EXPLAIN I do, I use the Sakila sample database. Here is an example of a “bad” query:

mysql> EXPLAIN SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index
possible_keys: NULL
key: rental_date
key_len: 10
ref: NULL
rows: 16005
Extra: Using where; Using index
1 row in set (0.00 sec)

This query is “bad” because it is doing a full index scan (type: index) instead of doing a range scan for just the range of dates we want (should be type: range). Ironically, the EXPLAIN does not actually explain why.

However, the JSON format does explain why:
mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "rental",
      "access_type": "index",
      "key": "rental_date",
      "used_key_parts": [
        "rental_date",
        "inventory_id",
        "customer_id"
      ],
      "key_length": "10",
      "rows": 16005,
      "filtered": 100,
      "using_index": true,
      "attached_condition": "(cast(`sakila`.`rental`.`rental_date` as date) = '2006-02-14')"
    }
  }
}

Note that the attached_condition shows the implicit cast. This is MUCH more friendly to a developer or administrator who is trying to figure out why MySQL is not doing what they want it to do!
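
Once you see the cast, the fix is clear: rewrite the query so the bare indexed column is compared against a range, instead of applying DATE() to the column. That is exactly the query used in the next section:

SELECT rental_id FROM rental
WHERE rental_date BETWEEN '2006-02-14 00:00:00' AND '2006-02-14 23:59:59';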

The visual EXPLAIN from MySQL Workbench also shows the implicit cast:

[Image: Workbench showing attached_condition and implicit CAST]

You may also notice it shows the “filtered” attribute, which is not in regular EXPLAIN but is part of EXPLAIN EXTENDED – “filtered” is the percentage of the examined “rows” that are estimated to actually be returned. A higher number here is better; if it is low, it means you are examining a lot of rows that you do not return.

used_key_parts

You may have noticed above that there is a used_key_parts array that does not show up in the traditional EXPLAIN. In a traditional EXPLAIN (or EXPLAIN EXTENDED), you do get to see the index length with the key_len field, so you can guess that only part of a composite index is used. Both the previous query and the following query use this index:

UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`)

Here is the traditional EXPLAIN – note that it shows the rental_date index is used, and the key_len is 5, which implies that only the first field of the index, rental_date, is being used, not the other 2 id fields. But you have to deduce that for yourself:

mysql> EXPLAIN EXTENDED SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date
key: rental_date
key_len: 5
ref: NULL
rows: 181
filtered: 100.00
Extra: Using where; Using index

Here is the JSON format, which shows the used_key_parts field, which reveals very clearly that only the first field of the index is used:

mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "rental",
      "access_type": "range",
      "possible_keys": [
        "rental_date"
      ],
      "key": "rental_date",
      "used_key_parts": [
        "rental_date"
      ],
      "key_length": "5",
      "rows": 181,
      "filtered": 100,
      "using_index": true,
      "attached_condition": "(`sakila`.`rental`.`rental_date` between '2006-02-14 00:00:00' and '2006-02-14 23:59:59')"
    }
  }
}

And here is the MySQL Workbench Visual EXPLAIN that shows the used_key_parts clearly:

[Image: Workbench showing used_key_parts and BETWEEN]

I am glad I took a second look at EXPLAIN FORMAT=JSON – the new features are awesome! My only complaint is that the extra information should also be added to regular EXPLAIN or EXPLAIN EXTENDED. I also hope that tools like pt-query-digest will be updated to use the extra information.

Most of this stuff is not PHP specific, and Python or Ruby or Java or .NET developers can use the tools in this talk.

Slides are at http://talks.php.net/show/confoo10

“My name is Rasmus, I’ve been around for a long time. I’ve been doing this web stuff since 1992/1993.”

“Generally performance is not a PHP problem.” Webservers not config’d, no expire headers on images, no favicon.

Tools: a Firefox/Firebug extension called YSlow (developed by Yahoo) gives you a grade on your site.

Google has developed the Firefox/Firebug Page Speed tool.

Today Rasmus will pick on WordPress. He checks out the code, then uses Siege to do a baseline benchmark — see the slide for the results.

Before you do anything else, install an opcode cache like APC. WordPress really likes this type of caching; see this slide for the results. Set the timezone, to make sure date conversions aren’t being done all the time.

Make sure you are cpu-bound, NOT I/O bound. Otherwise, speed up the I/O.

Then strace your webserver process. There are common config issues that you can spot in the strace output; grep for ENOENT, which shows you “No such file or directory” errors.
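
For example (my illustration; attach to any one webserver child process — the pid is a placeholder):

strace -p <apache child pid> 2>&1 | grep ENOENT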

Use AllowOverride None to turn off .htaccess lookups for every directory, and just read settings once from your config file (unless you’re an ISP).
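
For instance, in httpd.conf (a minimal sketch; the path is a placeholder):

<Directory "/var/www">
    AllowOverride None
</Directory>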

Make sure DirectoryIndex is set appropriately, watch your include_path. All this low-hanging fruit has examples on the common config issues slide.

Install pecl/inclued and generate a graph – here is the graph image (I have linked it because you really want to zoom in to the graph…)

In strace output check the open() calls. Conditional includes, function calls that include files, etc. need runtime context before knowing what to open. In the example, every request checks to see if it has the config file; once configured, we can get rid of that stuff. Get rid of all the conditionals and hard-code “include wp-config.php”. Examples are on the slide.

His tips to change:
Conditional config include in wp-load.php (as just mentioned)
Conditional did-header check in wp-blog-header.php
Don’t call require_wp_db() from wp-settings.php
Remove conditional require logic from wp_start_object_cache

Then check strace again. Now all Rasmus sees is theming and translations, which he decided to keep, because that flexibility is the benefit of WordPress. Performance is all about costs vs. flexibility: you don’t want to get rid of all of your flexibility, but you want to be fast.

Set error_reporting(-1) in wp-settings.php to catch all warnings — warnings slow you down, so get rid of all errors. PHP error handling is very slow, so getting rid of errors will make you faster.

The slide of warnings that WordPress throws.

Look at all C-level calls made, using callgrind, which sits under valgrind, a CPU emulator used for debugging. See the image of what callgrind shows.

Now dive into the PHP executor, by installing XDebug.

Check out xhprof: Facebook open sourced it about a year ago, and it’s a PECL extension. The output is pretty cool; try it on your own site (Rasmus shows you how to use it). It shows you functions sorted from the most expensive to the least expensive.

For example, use $_SERVER['REQUEST_TIME'] instead of time(). Use pconnect() if MySQL can handle the number of webserver connections that will be persistent, etc.
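
A minimal PHP sketch of both tips (the connection details are placeholders):

// The request start time is already known to PHP; skip the extra system call.
$now = $_SERVER['REQUEST_TIME'];   // instead of: $now = time();

// Persistent connections skip connect overhead, but only use them if MySQL's
// max_connections can cover all the persistent webserver processes.
$db = mysql_pconnect('dbhost', 'dbuser', 'dbpass');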

After you have changed a lot of the stuff above, benchmark again with siege to see how much faster you are. In this case there is not much gained so far.

So keep going….the blogroll is very slow — Rasmus gets rid of it by commenting it out in the sidebar.php file. I’d like to see something to make it “semi-dynamic” — that is, make it a static file that can be re-generated, since you might want the blogroll but links are not changed every second…..

At this point we’re out of low-hanging fruit.

HipHop is a PHP to C++ converter & compiler, including a threaded, event-driven server that replaces apache. Rasmus’ slide says “WordPress is well-suited for HipHop because it doesn’t have a lot of dynamic runtime code. This is using the standard WordPress-svn checkout with a few tweaks.”

Then, of course, benchmark again.

The first time you compile WordPress with HipHop, you give it a list of files to add to the binary. It will complain about PHP code that generates file names, so you do have to fix that kind of stuff. There’s a huge mess of errors the first time you run it (“pages and pages”), and Rasmus had to patch HipHop (and WordPress), but his changes have been merged back into HipHop, so you should be good for the most part.

Check out the errors: lots of them show logic errors like $foo."bar" instead of $foo.="bar", and $foo="bar" instead of $foo=="bar" in an if statement. Which of course is nice for your own code, to find those logic errors.

(WordPress takes in a $user_ID argument and immediately initializes a global $user_ID variable, which overwrites the argument passed in, so you can change the name of the argument passed in….)

You can also get rid of some code, things that check for existence of the same thing more than once. So it will take a bit of tweaking, but it’s worth it.

There are limitations to HipHop, for example:

  • It doesn’t support any of the new PHP 5.3 language features
  • Private properties don’t really exist under HipHop. They are treated as if they are protected instead.
  • You can’t unset variables. unset will clear the variable, but it will still be in the symbol table.
  • eval and create_function are limited
  • Variable variables $$var are not supported
  • Dynamic defines won’t work: define($name,$value)
  • get_loaded_extensions(), get_extension_funcs(), phpinfo(), debug_backtrace() don’t work
  • Conditional and dynamically created include filenames don’t work as you might expect
  • Default unix-domain socket filename isn’t set for MySQL so connecting to localhost doesn’t work

and HipHop does not support all extensions — see the list Rasmus has of extensions HipHop supports.

Then Rasmus showed an example using Twit (which he wrote) including the benchmarks. He shows that you can see what’s going on, like 5 MySQL calls on the home page and what happens when you don’t have a favicon.ico (in yellow).

In summary, “performance is all about architecture”, “know your costs”.

Be careful: some tools (like valgrind and xdebug) you don’t want to run on production systems. You could capture production traffic and replay it on a dev/testing box, but “you just have to minimize the differences and do your best”.

subtitle: Monetizing Social Media

Why are social media and social networking essential to you and your business? (Because they will drive sales, but there are very few analytics for ROI on social networking and social media.)

Relying on advertising is no longer working for print newspapers and television. So why do we think it will work on internet media?

Blogging — you must post 2-4 quality blog posts every week to maintain readership. This takes a lot of work! Content is king.

No matter how cool the technology/product/service is, people still buy more often and more easily from people they know and trust.

Social media is a way to show people that you are an industry expert, and that is how you should use it (not to spam and only say “buy my product”).

If you do not love your job but try to sell it (say, on social networks), you are going to fail, because you are not passionate about it.

Start small and do not promise a lot; it is better to have more to say than to have dead air time (a radio analogy).

Social media is all about building good relationships by having good content that people trust.

Lots of people spend a lot of money on their website, but the website is just a vector to show people your content, and the content is the most important thing.

Cross-pollination – I think he means forwarding on information you learn (like, say, liveblogging!)

Get expert guest bloggers — he did not explain that you can leverage the relationships you form by asking them to blog. We do this with the Log Buffers….

How to make money:
sponsorships
white paper composition
paid articles
consulting
adjunct tie-ins to other related venues
branded blogs

I am personally disappointed because I wanted to learn more, and I feel as though Pythian already uses the knowledge presented — we have great exposure through our blog, and have started really using Twitter, Facebook and other social media sites, etc.

Hansen’s information was good, and absolutely 100% correct, but I felt that for me it was very basic. I would like to know some more advanced topics, like:
– How do you know when you have reached the tipping point?
– How do you convert anonymous readers/followers to people you know, without turning them away because they feel they’re being watched, spammed or don’t want to give out their info to you?
– When does copying and pasting your information to send it out start to bother people, and how do you know when it is too much?
– How do you convert readers/followers (anonymous or not) to paying customers without making them feel like you’re all about $$? If you have some free content and some paid content, how do you know how much of each to have?

What is confoo? It is the sequel to the PHP Québec Conference (2003 – 2009). This year PHP Québec decided to team up with Montreal-Python, W3Québec and OWASP Montréal to produce confoo.

And now, on to Mark Pilgrim of Google speaking on HTML5.

Timeline
1991 – HTML 1
1994 – HTML 2
1995 – Netscape discovers web, ruins it
1996 – CSS1 + JavaScript
1996 – Microsoft discovers web, ruins it
1997 – HTML4 + ECMAScript1
1998 – CSS2 + ECMAScript2 + DOM1
2000 – XHTML1 + ECMAScript3 + DOM2
2001 – XHTML 1.1
[long break!]
2009 – HTML 5 + ECMAScript 5 + CSS 2.1

HTML5 is not a spec, it’s a marketing term. It’s really HTML5 + CSS3 + JavaScript.

IsHTML5ReadyYet.com and IsHTML5Ready.com are both real websites that give different answers to the question “is HTML 5 ready?”

Semantics
HTML started as a semantic language (until Netscape came along).

New elements (html tags) that do not do anything – they are for semantic use only:

<header> <footer>
<section>
<article>
<nav>
<aside> (pull quotes and such)
<time> (datetime markup)
<mark> (marking up runs of text)
<figure> <figcaption>

Instead of “div class=_____” use these tags….for example:

<body>
<header>
  <hgroup>
    <h2>page title</h2>
    <h3>page subtitle</h3>
  </hgroup>
</header>

<nav>
  <ul> Navigation......
  .....
  </ul>
</nav>

<section>
  <article>
    <header>
      <h2>Title</h2>
    </header>
  </article>
</section>

Caveat: This doesn’t work in IE but there is a workaround…..
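[Author’s note — the workaround he means is presumably the well-known createElement trick: create each unknown element once in JavaScript so that legacy IE will recognize and style it:]

<script>
// Teach legacy IE about the new elements so CSS selectors apply to them.
document.createElement('header');
document.createElement('footer');
document.createElement('section');
document.createElement('article');
document.createElement('nav');
document.createElement('aside');
</script>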

This can help blind people navigate better….and bots too!

“Google is just another blind web user with 7 million friends”

Forms
Web forms 2.0
To make a slider from 0-50:

<input type='range' min='0' max='50' value='0'>

To use autofocus:

<input autofocus>

(works in 3 browsers)

Talking about blind users again: “Focus tracking is VERY important if you can’t see. You really need to know where on the page you are, if you start typing what will happen.”

Placeholder text — in a text box, that light text that goes away when you click:

<input type='text' placeholder='click here and this will disappear'>

(works in 2 browsers)

New input types
These are semantic types, do different things in different browsers

<input type='email'> (on the iPhone you get a different keyboard; by default you just get a text field, so these things degrade gracefully if the browser does not support the feature)
<input type='url'> (a browser like Opera, http://www.opera.com, can validate a URL for you instead of you doing it yourself!)
<input type='datetime'> (and more... date pickers are tedious)
<input type='file' multiple> (multiple files without using Flash!)

For all the inputs HTML5 supports and which browsers support them (Opera is leading the way) search for “HTML5 input support”

Accessibility
ARIA = “accessible rich internet applications”. Alt text is old technology; ARIA goes further, doing things like making tree views accessible. For example, right now with a tree view you have to tab through each item, which is a pain. With code like this:

<ul id='tree1' role='tree' tabindex='0' aria-labelledby='label_1'>
  <li role='treeitem' tabindex='-1' aria-expanded='true'>Fruits</li>
  <li role='group'>
    <ul>
      <li role='treeitem' tabindex='-1'>Oranges</li>
      <li role='treeitem' tabindex='-1'>Pineapples</li>
    </ul>
  </li>
</ul>

….keyboard users can tab to the treeview itself, then use arrow keys to navigate and spacebar to select. This makes selecting an item at the end of a tree view much easier, and also makes it easy to move beyond the tree view without having to press Tab a million times.

Use your favorite search engine for “ARIA accessibility” to learn more.

CSS
Mark threw this image up on the screen:


(image from http://www.zazzle.com/stevenfrank – on that site you can buy this coffee mug or a T-shirt with the design)

Web fonts finally work in CSS3 – you can use more than Times, Courier, Arial, and occasionally Helvetica. This works EVERYWHERE – Chrome, IE, Firefox, Opera, Safari, etc. Well, it’s true that they all support web fonts, but they each support different font formats. Read “Bulletproof @font-face” for tips on how to get the font you want no matter what browser is used (yes, even IE).
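
[Author's note: a minimal @font-face sketch of my own; the font file and format here are made up, and the Bulletproof article covers the cross-browser version:]

@font-face {
  font-family: 'MyWebFont';
  src: url('mywebfont.woff') format('woff');
}
h1 { font-family: 'MyWebFont', sans-serif; }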

Opacity is easy [author’s note – it’s just the “opacity” property, see examples at http://www.css3.info/preview/opacity/].

Rounded corners are EASY – Mark’s slide passed too fast for me, so I grabbed an example from http://24ways.org/2006/rounded-corner-boxes-the-css3-way:

.box {
border-radius: 1.6em;
}

Gradients are easy [author’s note — looks like you need webkit, there’s examples at http://gradients.glrzad.com/]
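
[Author's note: a sketch of the WebKit-only gradient syntax of the time, my own example rather than Mark's:]

.box {
  background: -webkit-gradient(linear, left top, left bottom, from(#ffffff), to(#cccccc));
}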

To test CSS3 stuff, use www.css3please.com – “This element will receive inline changes as you edit the CSS rules on the left.”

[Author’s note — while searching I found http://www.webappers.com/2009/08/10/70-must-have-css3-and-html5-tutorials-and-resources/ which is definitely a “must have”.]

Canvas
A canvas is a blank slate where you can draw whatever you want, use the canvas tag and id, width and height attributes, everything else is javascript. Pretty awesome. [Author’s note — Mark had examples but I did not have time to capture them. I did find a nice tutorial at https://developer.mozilla.org/en/Canvas_tutorial.]
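
[Author's note: a tiny sketch of my own to show the shape of it:]

<canvas id='c' width='300' height='150'></canvas>
<script>
// everything beyond the tag is JavaScript: grab the 2D context and draw
var ctx = document.getElementById('c').getContext('2d');
ctx.fillStyle = 'rgb(200,0,0)';
ctx.fillRect(10, 10, 100, 50); // a red rectangle
</script>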

Multimedia
Video with no flash! YouTube has HTML5 integration. Here’s sample code of how to do movies in HTML5:

<video src='movie.ogv' controls></video>
<video src='movie.ogv' loop></video>
<video src='movie.ogv' preload='none'></video>  <!-- don't preload the movie -->
<video src='movie.ogv' preload='auto'></video>
<video src='movie.ogv' autoplay></video> <!-- leave autoplay off and you don't do evil autoplay.... -->

Multimedia is in the DOM and responds to CSS effects, such as reflection:

<video src='movie.ogv' loop style='-webkit-box-reflect: below 1px;'></video>

(this code might be wrong, the slide flipped fast)

Of course the problem — codecs. Right now, .ogv and .mp4 (h264).
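
[Author's note: the usual workaround, a sketch of my own rather than Mark's, is to list multiple sources and let the browser pick a codec it can play:]

<video controls>
  <source src='movie.mp4' type='video/mp4'>
  <source src='movie.ogv' type='video/ogg'>
  Your browser does not support HTML5 video.
</video>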

Audio inline too, same problem — only .oga and .mp3:

<audio src='podcast.oga' controls></audio>

Geolocation
IsGeolocationPartofHTML5.com is a real site, go to it to get the answer.
Geolocation demos — very much the same, find your location and display it. Simple but cool.
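
[Author's note: the API is essentially one call; a minimal sketch of my own:]

navigator.geolocation.getCurrentPosition(function (pos) {
  alert(pos.coords.latitude + ', ' + pos.coords.longitude);
});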

Cache manifest
Get everything you need for offline usage…

<html manifest='another-sky.manifest'>

CACHE MANIFEST
/avatars/zoe.png
/avatars/tamara.png
/scripts/holoband.jpg

Search for “google for mobile HTML5 series” – a good series of articles on using this stuff.

HTML 5 has much more
Local storage (see the sketch after this list)
Web workers
Web sockets (2-way connections, like raw TCP/IP connections over the web)
3D canvas (WebGL)
Microdata (enhanced semantics)
Desktop notifications
Drag and Drop
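
[Author's note: local storage is a simple key-value API; a two-line sketch of my own:]

localStorage.setItem('name', 'zoe');
var name = localStorage.getItem('name'); // persists across page loads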

Learn more:
whatwg.org/html5
diveintohtml5.org

When I heard that MySQL 5.6 was implementing EXPLAIN for writes (INSERT, UPDATE, DELETE, REPLACE), I was pretty excited. Then I heard that MySQL 5.6 also was implementing a JSON format for EXPLAIN and my thought was “I do not care about that!”

Boy, was I wrong. The JSON format does not just put the output into JSON format; it also gives extra information that is actually pretty useful! It can tell you when you are doing an implicit cast, which parts of a composite index are being used, and when index condition pushdown is being used. None of these are shown in regular EXPLAIN (which seems odd; why could they extend the JSON format but not put the information into the regular EXPLAIN format?), so using the JSON format is actually a good idea even if you do not care about what format your output is in.

As a note, MySQL Workbench’s Visual Explain (go to Query->Visual Explain Current Statement) also gives this information.

attached_condition and implicit casts

In a talk about EXPLAIN I do, I use the Sakila sample database. Here is an example of a “bad” query:

mysql> EXPLAIN SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index
possible_keys: NULL
key: rental_date
key_len: 10
ref: NULL
rows: 16005
Extra: Using where; Using index
1 row in set (0.00 sec)

This query is “bad” because it is doing a full index scan (type: index) instead of doing a range scan for just the range of dates we want (should be type: range). Ironically, the EXPLAIN does not actually explain why.
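
[Author's note: the fix, which appears again in the used_key_parts section below, is to compare the bare indexed column against a range instead of wrapping it in a function:]

EXPLAIN SELECT rental_id FROM rental
WHERE rental_date BETWEEN '2006-02-14 00:00:00' AND '2006-02-14 23:59:59'\G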

However, the JSON format does explain why:
mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "rental",
      "access_type": "index",
      "key": "rental_date",
      "used_key_parts": [
        "rental_date",
        "inventory_id",
        "customer_id"
      ],
      "key_length": "10",
      "rows": 16005,
      "filtered": 100,
      "using_index": true,
      "attached_condition": "(cast(`sakila`.`rental`.`rental_date` as date) = '2006-02-14')"
    }
  }
}

Note that the attached_condition shows the implicit cast. This is MUCH more friendly to a developer or administrator who is trying to figure out why MySQL is not doing what they want it to do!

The visual EXPLAIN from MySQL Workbench also shows the implicit cast:

Workbench showing attached_condition and implicit CAST

You may also notice it shows the “filtered” attribute, which is not in regular EXPLAIN but is part of EXPLAIN EXTENDED – “filtered” is the percentage of “rows” that is estimated to be returned. A higher number here is better; if it is low, it means that you are examining a lot of rows that you do not return.

used_key_parts

You may have noticed above that there is a used_key_parts array that does not show up in the traditional EXPLAIN. In a traditional EXPLAIN (or EXPLAIN EXTENDED), you do get to see the index length with the key_len field, so you can guess that only part of a composite index is used. Both the previous query and the following query use this index:

UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`)

Here is the traditional EXPLAIN – note that it shows the rental_date index is used, and the key_len is 5, which implies that only the first field of the index, rental_date, is being used, not the other 2 id fields. But you have to deduce that for yourself:

mysql> EXPLAIN EXTENDED SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date
key: rental_date
key_len: 5
ref: NULL
rows: 181
filtered: 100.00
Extra: Using where; Using index

Here is the JSON format, which shows the used_key_parts field, which reveals very clearly that only the first field of the index is used:

mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "rental",
      "access_type": "range",
      "possible_keys": [
        "rental_date"
      ],
      "key": "rental_date",
      "used_key_parts": [
        "rental_date"
      ],
      "key_length": "5",
      "rows": 181,
      "filtered": 100,
      "using_index": true,
      "attached_condition": "(`sakila`.`rental`.`rental_date` between '2006-02-14 00:00:00' and '2006-02-14 23:59:59')"
    }
  }
}

And here is the MySQL Workbench Visual EXPLAIN that shows the used_key_parts clearly:

Workbench showing used_key_parts and BETWEEN

I am glad I took a second look at EXPLAIN FORMAT=JSON – the new features are awesome! My only complaint is that I think they should be added to either EXPLAIN or EXPLAIN EXTENDED. I also hope that tools like pt-query-digest will be updated to use the extra information.

Baron makes an excellent point that rate is [often] the important thing to look at.

This is something that mysqltuner takes into account, and the default configuration includes looking at both ratios and rates.

If I told you that your database had a ratio of temporary tables written to disk of 20%, you might think “aha, my database is slow because of a lot of file I/O caused by writing temporary tables to disk!”. However, that 20% ratio may actually mean a rate of 2 per hour — which is most likely not causing excessive I/O.

To get a sense of this concept, and also how mysqltuner works, I will show the lines from the mysqltuner default configuration that deal with temporary tables written to disk. The format is that the fields are separated by three pipes (|||), and the fields are:

label
threshold check
formula
recommendation if “threshold check” is met

Here is the line from the default configuration file that calculates the ratio of temporary tables written to disk:

% temp disk tables|||>25|||Created_tmp_disk_tables / (Created_tmp_tables + Created_tmp_disk_tables) * 100|||Too many temporary tables are being written to disk.  Increase max_heap_table_size and tmp_table_size.

mysqltuner will parse that as:

if
the value of Created_tmp_disk_tables/(Created_tmp_tables + Created_tmp_disk_tables)*100
>25
then print out the last field.

So that means that a ratio of 25% or more is the threshold. But we found that many clients have a ratio much less than 25%, but still had excessive temporary tables written to disk. So the default configuration also contains a rate calculation of temporary tables written to disk:

temp disk rate|||=~ /second|minute/|||&hr_bytime(Created_tmp_disk_tables/Uptime)|||Too many temporary tables are being written to disk.  Increase max_heap_table_size and tmp_table_size.

mysqltuner will parse that as:

if
the value of &hr_bytime(Created_tmp_disk_tables/Uptime)
matches “second” or “minute”
then print out the last field.

The hr_bytime() function in mysqltuner takes a number that is a per-second rate and makes it “human readable” (hence “hr”) by returning the order of magnitude at which the value is >1. For example:

hr_bytime(2) returns “2.0 per second”
hr_bytime(0.2) returns “12.0 per minute”
hr_bytime(0.02) returns “1.2 per minute”
hr_bytime(0.002) returns “7.2 per hour”
hr_bytime(0.0002) returns “17.28 per day”
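
[Author's note: a rough Perl reconstruction of the idea behind hr_bytime(); this is my own sketch, not the actual mysqltuner source:]

sub hr_bytime {
    my ($rate) = @_;    # per-second rate
    return sprintf( "%.1f per second", $rate )           if $rate >= 1;
    return sprintf( "%.1f per minute", $rate * 60 )      if $rate * 60 >= 1;
    return sprintf( "%.1f per hour",   $rate * 3600 )    if $rate * 3600 >= 1;
    return sprintf( "%.2f per day",    $rate * 86400 );
}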

Certainly, 0.2 looks small, but “12.0 per minute” is a better metric for a DBA to understand the problem.

Because the configuration file for mysqltuner 2.0 contains the threshold and check, it is fairly simple to change what the threshold is, and to check both rates and ratios. mysqltuner also allows you to output in different formats (currently there’s “pretty” and “csv”, but it’s easy to add a perl subroutine to do something different with the output), which makes it ideal for doing regular tuning checks for what is most important for you.

Pythian uses it on one client to provide weekly reports, which we add to a spreadsheet so that differences are easy to see. (yes, output directly to a database is on the “features we want to add” — mysqltuner is just a perl script, so if anyone in the community wants to add it, they can create a branch and request the feature to be added into the main trunk…it is all on launchpad, at https://launchpad.net/mysqltuner, so community contributions are recommended and encouraged.)

If you do not know what International Women’s Day is: http://www.internationalwomensday.com/

Start planning your blog posts for Ada Lovelace Day now (March 24th; podcasting, comic drawing, etc.!) to draw attention to the achievements of women in technology and science.

To that end, I would like to point out all the women currently in science and tech fields that I admire and think are doing great things. I think it would be great if everyone, male or female, made a list like this:

The women that have taught me science/tech along the way:

High School:
Mary Lou Ciavarra (Physics)
Maria Petretti (Pre-Algebra, and Academic Decathlon)
Reneé Fishman (Biology)
Lisa Acquaire (Economics during Academic Decathlon)

College:
Professor Kalpana White (Biology), in whose fruit fly lab I worked for 2 semesters.
Professor Eve Marder (Introductory Neuroscience)

Though Brandeis does have female faculty in the Computer Science department, I did not manage to have any classes with female Computer Science faculty members.

My current female DBA co-workers at Pythian: Isabel Pinarci (Oracle), Michelle Gutzait (SQL Server), Catherine Chow (Oracle) and Jasmine Wen (Oracle).

And to folks in the greater MySQL/tech community and tech co-workers past and present, especially those I have been inspired and helped by: Tracy Gangwer, Selena Deckelmann (Postgres), Amy Rich, Anne Cross, and more (If I have forgotten you, I apologize!).

Most of this stuff is not PHP specific, and Python or Ruby or Java or .NET developers can use the tools in this talk.


Slides are at http://talks.php.net/show/confoo10

“My name is Rasmus, I’ve been around for a long time. I’ve been doing this web stuff since 1992/1993.”

“Generally performance is not a PHP problem.” Webservers not config’d, no expire headers on images, no favicon.

Tools: the Firefox/Firebug extension YSlow (developed by Yahoo) gives you a grade on your site.

Google has developed the Firefox/Firebug Page Speed tool.

Today Rasmus will pick on WordPress. He checks out the code, then uses Siege to do a baseline benchmark — see the slide for the results.
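
[Author's note: an illustrative Siege run of my own; Rasmus' exact flags were on the slide:]

siege -c 10 -t 30S http://localhost/wordpress/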

Before you do anything else, install an opcode cache like APC. WordPress really likes this type of caching; see this slide for the results. Set the timezone (date.timezone in php.ini) to make sure conversions aren’t being done all the time.

Make sure you are cpu-bound, NOT I/O bound. Otherwise, speed up the I/O.

Then strace your webserver process. There are common config issues that you can spot in your strace output. grep for ENOENT, which shows you “No such file or directory” errors.
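
[Author's note: something along these lines; my own command, not from the slide:]

strace -p <httpd_pid> -e trace=file 2>&1 | grep ENOENT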

AllowOverride None to turn off .htaccess for every directory, just read settings once from your config file….(unless you’re an ISP).

Make sure DirectoryIndex is set appropriately, watch your include_path. All this low-hanging fruit has examples on the common config issues slide.
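
[Author's note: the kind of httpd.conf settings meant here; the path is a placeholder of mine:]

<Directory /var/www/html>
    AllowOverride None
</Directory>
DirectoryIndex index.php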

Install pecl/inclued and generate a graph – here is the graph image (I have linked it because you really want to zoom in to the graph…)

In strace output check the open() calls. Conditional includes, function calls that include files, etc. need runtime context before knowing what to open. In the example, every request checks to see if we have the config file, once we have config’d we can get rid of that stuff. Get rid of all the conditionals and hard-code “include wp-config.php”. Examples are on the slide.

His tips to change:
Conditional config include in wp-load.php (as just mentioned)
Conditional did-header check in wp-blog-header.php
Don’t call require_wp_db() from wp-settings.php
Remove conditional require logic from wp_start_object_cache

Then check strace again; now all Rasmus sees is theming and translations, which he decided to keep, because that’s the real benefit of WordPress. Performance is all about costs vs. flexibility. You don’t want to get rid of all of your flexibility, but you want to be fast.

Set error_reporting(-1) in wp-settings.php to catch all warnings — warnings slow you down, so get rid of all errors. PHP error handling is very slow, so getting rid of errors will make you faster.

The slide of warnings that wordpress throws.

Look at all C-level calls made, using callgrind, which sits under valgrind, a CPU emulator used for debugging. See the image of what callgrind shows.

Now dive into the PHP executor, by installing XDebug.

Check out xhprof. Facebook open sourced this about a year ago; it’s a PECL extension. The output is pretty cool, try it on your own site; Rasmus does show you how to use it. It shows you functions sorted from the most expensive to the least expensive.

For example, use $_SERVER['REQUEST_TIME'] instead of time(). Use pconnect() if MySQL can handle the number of webserver connections that will be persistent, etc.
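
[Author's note: a quick illustration of my own; mysql_pconnect() is my assumption for the pconnect() mentioned, and the credentials are placeholders:]

<?php
// $_SERVER['REQUEST_TIME'] is set once per request, avoiding a syscall per time() call
$now = $_SERVER['REQUEST_TIME'];
// persistent connection, reused across requests instead of reconnecting each time
$db = mysql_pconnect('localhost', 'wp_user', 'wp_pass');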

After you have changed a lot of the stuff above, benchmark again with siege to see how much faster you are. In this case there is not much gained so far.

So keep going….the blogroll is very slow — Rasmus gets rid of it by commenting it out in the sidebar.php file. I’d like to see something to make it “semi-dynamic” — that is, make it a static file that can be re-generated, since you might want the blogroll but links are not changed every second…..

At this point we’re out of low-hanging fruit.

HipHop is a PHP to C++ converter & compiler, including a threaded, event-driven server that replaces apache. Rasmus’ slide says “WordPress is well-suited for HipHop because it doesn’t have a lot of dynamic runtime code. This is using the standard WordPress-svn checkout with a few tweaks.”

Then, of course, benchmark again.

The first time you compile WordPress with HipHop, you give it a list of files to add to the binary; it will complain about PHP code that generates file names, so you do have to fix that kind of stuff. There’s a huge mess of errors the first time you run it (“pages and pages”), and Rasmus had to patch HipHop (and WordPress), but his changes have been merged back into HipHop, so you should be good for the most part.

Check out the errors; lots of them show logical errors like $foo."bar" instead of $foo.="bar", and $foo="bar" instead of $foo=="bar" in an if statement. Which of course is nice for your own code, to find those logical errors.

(WordPress takes in a $user_ID argument and immediately initializes a global $user_ID variable, which overwrites the argument passed in, so you can change the name of the argument passed in….)

You can also get rid of some code, things that check for existence of the same thing more than once. So it will take a bit of tweaking, but it’s worth it.

There are limitations to HipHop, for example:

  • It doesn’t support any of the new PHP 5.3 language features
  • Private properties don’t really exist under HipHop. They are treated as if they are protected instead.
  • You can’t unset variables. unset will clear the variable, but it will still be in the symbol table.
  • eval and create_function are limited
  • Variable variables $$var are not supported
  • Dynamic defines won’t work: define($name,$value)
  • get_loaded_extensions(), get_extension_funcs(), phpinfo(), debug_backtrace() don’t work
  • Conditional and dynamically created include filenames don’t work as you might expect
  • Default unix-domain socket filename isn’t set for MySQL so connecting to localhost doesn’t work

and HipHop does not support all extensions — see the list Rasmus has of extensions HipHop supports.

Then Rasmus showed an example using Twit (which he wrote) including the benchmarks. He shows that you can see what’s going on, like 5 MySQL calls on the home page and what happens when you don’t have a favicon.ico (in yellow).

In summary, “performance is all about architecture”, “know your costs”.

Be careful: some tools (like valgrind and XDebug) you don’t want to put on production systems. You could capture production traffic and replay it on a dev/testing box, but “you just have to minimize the differences and do your best”.

subtitle: Monetizing Social Media

Why is social media and social networking essential to you and your business? (Because it will drive sales, but there are very few analytics for ROI on social networking and social media.)

Relying on advertising is no longer working for print newspapers and television. So why do we think it will work on internet media?

Blogging — you must post 2-4 quality blog posts every week to maintain readership. This takes a lot of work! Content is king.

No matter how cool the technology/product/service is, people still buy more often and more easily from people they know and trust.

Social media is a way to show people that you are an industry expert, and that is how you should use them (not to spam and only say “buy my product”).

If you do not love your job but try to sell it (say, on social networks), you are going to fail, because you are not passionate about it.

Start small, do not promise a lot, it is better to have more to say than to have dead air time (radio analogy).

Social media is all about building good relationships by having good content that people trust.

Lots of people spend a lot of money on their website, but the website is just a vector to show people your content, and the content is the most important thing.

Cross-pollination – I think he means forwarding on information you learn (like, say, liveblogging!)

Get expert guest bloggers — he did not explain that you can leverage the relationships you form by asking them to blog. We do this with the Log Buffers….

How to make money:
sponsorships
white paper composition
paid articles
consulting
adjunct tie-ins to other related venues
branded blogs

I am personally disappointed because I wanted to learn more, and I feel as though Pythian already uses the knowledge presented — we have great exposure through our blog, and have started really using Twitter, Facebook and other social media sites, etc.

Hansen’s information was good, and absolutely 100% correct, but I felt that for me it was very basic. I would like to know some more advanced topics, like:
– How do you know when you have reached the tipping point?
– How do you convert anonymous readers/followers to people you know, without turning them away because they feel they’re being watched, spammed or don’t want to give out their info to you?
– When does copy/paste to send out your information start to bother people, how do you know how not to do too much?
– How do you convert readers/followers (anon or not) to paid customers without making them feel like you’re all about $$, what about if you have some free content and some paid content, how do you know how much to have?

Persistence Smoothie: Blending NoSQL and SQL – see user feedback and comments at http://joind.in/talk/view/1332.

Michael Bleigh (@mbleigh on twitter).

NoSQL is a new way to think about persistence. Most NoSQL systems are not ACID compliant (Atomicity, Consistency, Isolation, Durability).

Generally, most NoSQL systems have:

  • Denormalization
  • Eventual Consistency
  • Schema-Free
  • Horizontal Scale

NoSQL tries to scale (more) simply, and it is starting to go mainstream – NY Times, BBC, SourceForge, Digg, Sony, ShopWiki, Meebo, and more. But it’s not *entirely* mainstream; it’s still hard to sell due to compliance and other reasons.

NoSQL has gotten very popular, with lots of blog posts written about it, but it has reached the hype peak, and obviously it can’t do everything.

“NoSQL is a (growing) collection of tools, not a new way of life.”

What is NoSQL? Can be several things:

  • Key-Value Stores
  • Document Databases
  • Column-oriented data stores
  • Graph Databases

Key-Value Stores


memcached is a “big hash in the sky” – it is a key-value store. Similarly, NoSQL key-value stores “add to that big hash in the sky” and store to disk.

Speaker’s favorite is Redis because it’s similar to memcached.

  • key-value store + datatypes (lists, sets, sorted sets; soon hashes will be there)
  • cache-like functions (like expiration)
  • (Mostly) in-memory
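
[Author's note: a flavor of those datatypes and the cache-like expiration; my own illustration, not the speaker's:]

redis> LPUSH recent_posts "post:42"
redis> SADD tags "mysql"
redis> EXPIRE recent_posts 3600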

Another interesting key-value store is Riak

  • Combination of key-value store and document database
  • heavy into HTTP REST
  • You can create links between documents, and do “link walking” that you don’t normally get out of a key-value store
  • built-in Map Reduce

Map Reduce:


  • Massively parallel way to process large datasets
  • First you scour data and “map” it into a new set of data
  • Then you “reduce” the data down to a salient result — for example, a map reduce function to make a tag cloud: the map function makes an array with a tag name and a count of 1 for each instance of that tag, and the reduce function goes through that array and counts them (see the sketch after this list)
  • http://en.wikipedia.org/wiki/MapReduce
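
[Author's note: a minimal CouchDB-style sketch of that tag-cloud example; my own code, assuming each document has a tags array:]

function map(doc) {
  doc.tags.forEach(function (tag) {
    emit(tag, 1); // one count per tag instance
  });
}
function reduce(keys, values) {
  return sum(values); // total count per tag
}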

Document Databases


Some say that it’s the “closest” thing to real SQL.
  • MongoDB – Document store that speaks BSON (Binary JSON, which is compact). This is the speaker’s favorite because it has a rich query syntax that makes it close to SQL. Can’t do joins, but can embed objects in other objects, so it’s a tradeoff

    • Also has GridFS that can store large files efficiently, can scale to petabytes of data
    • does have MapReduce but it’s deliberate and you run it every so often.

  • CouchDB
    • Pure JSON Document Store – can query directly with nearly pure javascript (there are auth issues) but it’s an interesting paradigm to be able to run your app almost entirely through javascript.
    • HTTP REST interface
    • MapReduce is the only way to query CouchDB. It is incremental MapReduce: every time you add or modify a document, it incrementally updates the results of the functions you’ve written. You can do really powerful queries as easily as you can do simple queries. However, some things are really complex; e.g., pagination is almost impossible to do.
    • Intelligent Replication – CouchDB is designed to work with offline integration. Could be used instead of SQLite as the HTML5 data store, but you need CouchDB running locally to be doing offline stuff w/CouchDB

Column-oriented store


Columns are stored together (e.g., all the names) instead of rows. This lets you be schema-less, because you don’t care about a row’s consistency; you can just add a column to a table very easily.

Graph Databases


Speaker’s opinion – there aren’t enough of these.
Neo4j – can handle modeling complex relationships (“friends of friends of cousins”) but it requires a license.

When should I use this stuff?

  • Complex, slow joins for an “activity stream”: denormalize and use a key-value store.
  • Variable schema, vertical interaction: document database or column store.
  • Modeling multi-step relationships (LinkedIn, friends of friends, etc.): graph database.

Don’t look for a single tool that does every job. Use more than one if it’s appropriate, and weigh the tradeoffs (but don’t have 7 different data stores either!).

NoSQL solves real scalability and data design issues. But financial transactions HAVE to be atomic, so don’t use NoSQL for those.

A good presentation is http://www.slideshare.net/bscofield/the-state-of-nosql.

Using SQL and NoSQL together


Why? Well, your data is already in an SQL database (most likely).

You can blend by hand, but the easy way is DataMapper:

  • Generic, relational ORM (adapters for many SQL dbs and many NoSQL stores)
  • Implements Identity Map
  • Module-based inclusion (instead of extending from a class, you just include it into a class)

You can set up multiple data targets (default is MySQL, example sets up MongoDB too).
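
[Author's note: a sketch of what multiple targets look like; the non-default adapter URIs are my assumptions:]

DataMapper.setup(:default, 'mysql://localhost/myapp')    # relational default
DataMapper.setup(:social,  'redis://localhost:6379/0')   # key-value store
DataMapper.setup(:catalog, 'mongo://localhost/products') # document database
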
DataMapper is:

  • Ultimate Polyglot ORM
  • Simple relationships between persistence engines are easy
  • Jack of all trades, master of none
  • Sometimes perpetuates false assumptions
  • If you’re in Ruby, your legacy stuff is in ActiveRecord, so you’re going to have to rewrite your code anyway.

Speaker’s idea to be less generic and make better use of the features of each data store – Gloo – “Gloo glues together different ORMs by providing relationship proxies.” This software is ALPHA ALPHA ALPHA.

The goal is to be able to define relationships on the terms of any ORM from any class, ORM or not.
Right now – partially working ActiveRecord relationships.
Is he doing it wrong? Is it a crazy/stupid idea? Maybe.

Example:

  • You already have an auth system: it’s already in SQL, so leave it there.
  • Users need to be able to purchase items from the storefront, and you can’t lose transactions (need full ACID compliance): use MySQL.
  • Social graph (activity streams, 1-way and 2-way relationships; need speed, but not consistency): use Redis.
  • Product listings (selling movies and books, which have different properties; products are pretty much non-relational): use MongoDB.

He wrote the example in about 3 hours, so integration of multiple data stores can be done quickly and work.


Multimedia
Video with no flash! YouTube has HTML5 integration. Here’s sample code of how to do movies in HTML5:

&lt;video src='movie.ogv' controls&gt;&lt;/video&gt;
&lt;video src='movie.ogv' loop&gt;&lt;/video&gt;
&lt;video src='movie.ogv' preload='none'&gt;&lt;/video&gt;  -- don't preload the movie
&lt;video src='movie.ogv' preload='auto'&gt;&lt;/video&gt;
&lt;video src='movie.ogv' autoplay&gt;&lt;/video&gt; -- if you don't have this you don't do evil autoplay....

Multimedia is in the DOM and responds to CSS effects, such as reflection:

&lt;video src='movie.ogv' loop style='webkit-box-reflect: below 1px;'&gt;&lt;/video&gt;

(this code might be wrong, the slide flipped fast)

Of course the problem — codecs. Right now, .ogv and .mp4 (h264).

Audio inline too, same problem — only .oga and .mp3:

&lt;audio src ='podcast.oga' controls&gt;&lt;/audio&gt;

Geolocation
IsGeolocationPartofHTML5.com is a real site, go to it to get the answer.
Geolocation demos — very much the same, find your location and display it. Simple but cool.

Cache manifest
Get everything you need for offline usage…

&lt;html manifest='another-sky.manifest'&gt;

CACHE MANIFEST
/avatars/zoe.png
/avatars/tamara.png
/scripts/holoband.jpg

search for “google for mobile HTML5 series” – good series of articles on using this stuff.

HTML 5 has much more
Local storage
Web workers
Web sockets (2way connections, like raw tcp/ip cxns over the web)
3D canvas (webgl)
Microdata (enhanced semantics)
Desktop notifications
Drag and Drop

Learn more:
whatwg.org/html5
diveintohtml5.org

Persistence Smoothie: Blending NoSQL and SQL – see user feedback and comments at http://joind.in/talk/view/1332.

Michael Bleigh from . @mbleigh on twitter

NoSQL is a new way to think about persistence. Most NoSQL systems are not ACID compliant (Atomicity, malady Consistency, what is ed Isolation, case Durability).

Generally, most NoSQL systems have:

  • Denormalization
  • Eventual Consistency
  • Schema-Free
  • Horizontal Scale

NoSQL tries to scale (more) simply, it is starting to go mainstream – NY Times, BBC, SourceForge, Digg, Sony, ShopWiki, Meebo, and more. But it’s not *entirely* mainstream, it’s still hard to sell due to compliance and other reasons.

NoSQL has gotten very popular, lots of blog posts about them, but they reach this hype peak and obviously it can’t do everything.

“NoSQL is a (growing) collection of tools, not a new way of life.”

What is NoSQL? Can be several things:

  • Key-Value Stores
  • Document Databases
  • Column-oriented data stores
  • Graph Databases

Key-Value Stores


memcached is a “big hash in the sky” – it is a key value store. Similarly, NoSQL key-value stores “add to that big hash in the sky” and store to disk.

Speaker’s favorite is Redis because it’s similar to memcached.

  • key-value store + datatypes (list, sets, scored sets, soon hashes will be there)
  • cache-like functions (like expiration)
  • (Mostly) in-memory

Another interesting key-value store is Riak

  • Combination of key-value store and document database
  • heavy into HTTP REST
  • You can create links between documents, and do “link walking” that you don’t normally get out of a key-value store
  • built-in Map Reduce

Map Reduce:


  • Massively parallel way to process large datasets
  • First you scour data and “map” a new set of dataM
  • Then you “reduce” the data down to a salient result — for example, map reduce function to make a tag cloud: map function makes an array with a tag name and a count of 1 for each instance of that tag, and the reduce tag goes through that array and counts them…
  • http://en.wikipedia.org/wiki/MapReduce

Other key-value stores:

Document Databases


Some say that it’s the “closest” thing to real SQL.
  • MongoDB – Document store that speaks BSON (Binary JSON, which is compact). This is the speaker’s favorite because it has a rich query syntax that makes it close to SQL. Can’t do joins, but can embed objects in other objects, so it’s a tradeoff

    • Also has GridFS that can store large files efficiently, can scale to petabytes of data
    • does have MapReduce but it’s deliberate and you run it every so often.

  • CouchDB
    • Pure JSON Document Store – can query directly with nearly pure javascript (there are auth issues) but it’s an interesting paradigm to be able to run your app almost entirely through javascript.
    • HTTP REST interface
    • MapReduce only to see items in CouchDB. Incremental MapReduce, every time you add or modify a document, it dynamically changes the functions you’ve written. You can do really powerful queries as easy as you can do simple queries. However, some things are really complex, ie, pagination is almost impossible to do.
    • Intelligent Replication – CouchDB is designed to work with offline integration. Could be used instead of SQLite as the HTML5 data store, but you need CouchDB running locally to be doing offline stuff w/CouchDB

Column-oriented store


Columns are stored together (ie, names) instead of rows. Lets you be schema-less because you don’t care about a row’s consistency, you can just add a column to a table very easily.

Graph Databases


speaker’s opinion – there aren’t enough of these.
Neo4J – can handle modeling complex relationships – “friends of friends of cousins” but it requires a license.

When should I use this stuff?





If you have:Use
Complex, slow joins for an “activity stream”Denormalize, use a key-value store.
Variable schema, vertical interactionDocument database or column store
Modeling multi-step relationships (linkedin, friends of friends, etc)Graph

Don’t look for a single tool that does every job. Use more than one if it’s appropriate, weigh the tradeoffs (ie, don’t have 7 different data stores either!)

NoSQL solves real scalability and data design issues. But financial transactions HAVE to be atomic, so don’t use NoSQL for those.

A good presentation is http://www.slideshare.net/bscofield/the-state-of-nosql.

Using SQL and NoSQL together


Why? Well, your data is already in an SQL database (most likely).

You can blend by hand, but the easy way is DataMapper:
Generic, relational ORM (adapters for many SQL dbs and many NoSQL stores)
Implements Identity Map
Module-based inclusion (instead of extending from a class, you just include into a class).

You can set up multiple data targets (default is MySQL, example sets up MongoDB too).
DataMapper is:

  • Ultimate Polyglot ORM
  • simple r’ships btween persistence engines are easy
  • jack of all, master none
  • Sometimes perpetuates false assumptions –
  • If you’re in Ruby, your legacy stuff is in ActiveRecord, so you’re going to have to rewrite your code anyway.

Speaker’s idea to be less generic and better use of features of each data store – Gloo – “Gloo glues together different ORMs by providing relationship proxies.” this software is ALPHA ALPHA ALPHA.

The goal is to be able to define relationships on the terms of any ORM from any class, ORM or not
Right now – partially working activeRecord relationships
Is he doing it wrong? Is it a crazy/stupid idea? Maybe.

Example:





NeedUse
Assume you already have an auth systemit’s already in SQL, so leave it there.
Need users to be able to purchase items from the storefront – Can’t lose transactions, need full ACID complianceuse MySQL.
Social Graph – want to have activity streams and 1-way and 2-way relationships. Need speed, but not consistencyuse Redis
Product Listings — selling moves and books, both have different properties, products are pretty much non-relationaluse MongoDB

He wrote the example in about 3 hours, so integration of multiple data stores can be done quickly and work.

I am happy and proud to announce that there will be an Open Database Camp held at this year’s Northeast LinuxFest! The venue is at Harvard University in Cambridge, website like this March 16 and Sunday, March 17, 2013.

Northeast LinuxFest and Open Database Camp are both free, but there is no reciprocal membership. To register for Open Database Camp, just sign up at the Attendee List wiki page. We are also soliciting session ideas ahead of time, and attendees will choose sessions during the Saturday morning planning session, as usual for Open DB Camp.

If you are interested in sponsoring, do so directly to Northeast LinuxFest and let them know it’s for Open Database Camp!

Open Database Camp is for all open databases – whether it’s MySQL, Postgres, NoSQL, been around for years or something you’re thinking about. You can see previous session ideas at the OpenSQLCamp website.

When I heard that MySQL 5.6 was implementing EXPLAIN for writes (INSERT, buy information pills no rx UPDATE, pilule DELETE,REPLACE), I was pretty excited. Then I heard that MySQL 5.6 also was implementing a JSON format for EXPLAIN and my thought was “I do not care about that!”

Boy, was I wrong. The JSON format does not just put the output into JSON format, it also gives extra information that’s actually pretty useful! It can tell you when you are doing an implicit cast, which parts of a composite index are being used, and when index condition pushdown are being used. None of these are shown in regular EXPLAIN (which seems odd, why could they extend the JSON format but not put the information into the regular EXPLAIN format?), so using the JSON format is actually a good idea even if you do not care about what format your output is in.

As a note, MySQL Workbench’s Visual Explain (go to Query->Visual Explain Current Statement) also gives this information.

attached_condition and implicit casts

In a talk about EXPLAIN I do, I use the Sakila sample database. Here is an example of a “bad” query:

mysql> EXPLAIN SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index
possible_keys: NULL
key: rental_date
key_len: 10
ref: NULL
rows: 16005
Extra: Using where; Using index
1 row in set (0.00 sec)

This query is “bad” because it is doing a full index scan (type: index) instead of doing a range scan for just the range of dates we want (should be type: range). Ironically, the EXPLAIN does not actually explain why.

However, the JSON format does explain why:
mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"table": {
"table_name": "rental",
"access_type": "index",
"key": "rental_date",
"used_key_parts": [
"rental_date",
"inventory_id",
"customer_id"
],
"key_length": "10",
"rows": 16005,
"filtered": 100,
"using_index": true,
"attached_condition": "(cast(`sakila`.`rental`.`rental_date` as date) = '2006-02-14')"
}
}
}

Note that the attached_condition shows the implicit cast. This is MUCH more friendly to a developer or administrator who is trying to figure out why MySQL is not doing what they want it to do!

The visual EXPLAIN from MySQL Workbench also shows the implicit cast:

Workbench showing attached_condition and implicit CAST

You may also notice it shows the “filtered” attribute, which is not in regular EXPLAIN but is part of EXPLAIN EXTENDED – “filtered” is the percentage of “rows” that are estimated to be returned. A higher number here is better, if it is low it means that you are examining a lot of rows that you do not return.

used_key_parts

You may have noticed above that there is a used_key_parts array that does not show up in the traditional EXPLAIN. In a traditional EXPLAIN (or EXPLAIN EXTENDED), you do get to see the index length with the key_len field, so you can guess that only part of a composite index is used. Both the previous query and the following query use this index:

UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`)

Here is the traditional EXPLAIN – note that it shows the rental_date index is used, and the key_len is 5, which infers that only the first field fo the index, rental_date is being used, not the other 2 id fields. But you have to deduce that for yourself:

mysql> EXPLAIN EXTENDED SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date
key: rental_date
key_len: 5
ref: NULL
rows: 181
filtered: 100.00
Extra: Using where; Using index

Here is the JSON format, which shows the used_key_parts field, which reveals very clearly that only the first field of the index is used:

mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"table": {
"table_name": "rental",
"access_type": "range",
"possible_keys": [
"rental_date"
],
"key": "rental_date",
"used_key_parts": [
"rental_date"
],
"key_length": "5",
"rows": 181,
"filtered": 100,
"using_index": true,
"attached_condition": "(`sakila`.`rental`.`rental_date` between '2006-02-14 00:00:00' and '2006-02-14 23:59:59')"
}
}
}

And here is the MySQL Workbench Visual EXPLAIN that shows the used_key_parts clearly:

Workbench showing used_key_parts and BETWEEN

I am glad I took a second look at EXPLAIN FORMAT=JSON – the new features are awesome! My only complaint is that I think they should be added to either EXPLAIN or EXPLAIN EXTENDED. I also hope that tools like pt-query-digest will be updated to use the extra information.

Baron makes an excellent point that rate is [often] the important thing to look at.

This is something that mysqltuner takes into account, and the default configuration includes looking at both ratios and rates.

If I told you that your database had a ratio of temporary tables written to disk of 20%, you might think “aha, my database is slow because of a lot of file I/O caused by writing temporary tables to disk!”. However, that 20% ratio may actually mean a rate of 2 per hour — which is most likely not causing excessive I/O.

To get a sense of this concept, and also how mysqltuner works, I will show the lines from the mysqltuner default configuration that deal with temporary tables written to disk. The format is that the fields are separated by three pipes (|||), and the fields are:

label
threshold check
formula
recommendation if “threshold check” is met

Here is the line from the default configuration file that calculates the ratio of temporary tables written to disk:

% temp disk tables|||>25|||Created_tmp_disk_tables / (Created_tmp_tables + Created_tmp_disk_tables) * 100|||Too many temporary tables are being written to disk.  Increase max_heap_table_size and tmp_table_size.

mysqltuner will parse that as:

if
the value of Created_tmp_disk_tables/(Created_tmp_tables + Created_tmp_disk_tables)*100
>25
then print out the last field.

So that means that a ratio of 25% or more is the threshold. But we found that many clients have a ratio much less than 25%, but still had excessive temporary tables written to disk. So the default configuration also contains a rate calculation of temporary tables written to disk:

temp disk rate|||=~ /second|minute/|||&hr_bytime(Created_tmp_disk_tables/Uptime)|||Too many temporary tables are being written to disk.  Increase max_heap_table_size and tmp_table_size.

mysqltuner will parse that as:

if
the value of &hr_bytime(Created_tmp_disk_tables/Uptime)
matches “second” or “minute”
then print out the last field.
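Putting that together, here is a minimal Python sketch of the parsing and threshold logic (my own illustration, not mysqltuner’s actual Perl code; the counter values are made up):

# Sketch of evaluating one triple-pipe-delimited mysqltuner line.
line = ("% temp disk tables|||>25|||Created_tmp_disk_tables / "
        "(Created_tmp_tables + Created_tmp_disk_tables) * 100|||"
        "Too many temporary tables are being written to disk.")

label, check, formula, recommendation = line.split("|||")

# Counter values as they would come from SHOW GLOBAL STATUS (made up).
status = {"Created_tmp_disk_tables": 300, "Created_tmp_tables": 700}

# Substitute the counter names into the formula, then evaluate it.
expr = formula
for name, value in status.items():
    expr = expr.replace(name, str(value))
result = eval(expr)  # 300 / (700 + 300) * 100 == 30.0

# A "threshold check" of >25 means: recommend when the result exceeds 25.
if check.startswith(">") and result > float(check[1:]):
    print("%s (%.1f%%): %s" % (label, result, recommendation))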

The hr_bytime() function in mysqltuner takes a number that is a per-second rate and makes it “human readable” (hence “hr”) by returning the order of magnitude at which the value is >1. For example:

hr_bytime(2) returns “2.0 per second”
hr_bytime(0.2) returns “12.0 per minute”
hr_bytime(0.02) returns “1.2 per minute”
hr_bytime(0.002) returns “7.2 per hour”
hr_bytime(0.0002) returns “17.28 per day”

Certainly, 0.2 looks small, but “12.0 per minute” is a better metric for a DBA to understand the problem.
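For reference, here is a minimal Python approximation of hr_bytime() (my own sketch; the real Perl script’s exact output formatting may differ):

# Approximate hr_bytime(): convert a per-second rate into the first
# time unit at which the value reaches 1 or more.
def hr_bytime(per_second):
    for unit, seconds in (("second", 1), ("minute", 60),
                          ("hour", 3600), ("day", 86400)):
        value = per_second * seconds
        if value >= 1:
            return "%g per %s" % (value, unit)
    return "%g per day" % (per_second * 86400)

print(hr_bytime(0.002))   # 7.2 per hour
print(hr_bytime(0.0002))  # 17.28 per day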

Because the configuration file for mysqltuner 2.0 contains the threshold and check, it is fairly simple to change what the threshold is, and to check both rates and ratios. mysqltuner also allows you to output in different formats (currently there’s “pretty” and “csv”, but it’s easy to add a perl subroutine to do something different with the output), which makes it ideal for doing regular tuning checks for what is most important for you.

Pythian uses it on one client to provide weekly reports, which we add to a spreadsheet so that differences are easy to see. (Yes, output directly to a database is on the “features we want to add” list — mysqltuner is just a perl script, so if anyone in the community wants to add it, they can create a branch and request the feature to be added into the main trunk…it is all on launchpad, at https://launchpad.net/mysqltuner, so community contributions are recommended and encouraged.)

If you do not know what International Women’s Day is: http://www.internationalwomensday.com/

Start planning your blog posts for Ada Lovelace Day now (March 24th; blog posts, podcasting, comic drawing, etc.!) to draw attention to the achievements of women in technology and science.

To that end, I would like to point out all the women currently in science and tech fields that I admire and think are doing great things. I think it would be great if everyone, male or female, made a list like this:

The women that have taught me science/tech along the way:

High School:
Mary Lou Ciavarra (Physics)
Maria Petretti (Pre-Algebra, and Academic Decathlon)
Reneé Fishman (Biology)
Lisa Acquaire (Economics during Academic Decathlon)

College:
Professor Kalpana White (Biology), in whose fruit fly lab I worked for 2 semesters.
Professor Eve Marder (Introductory Neuroscience)

Though Brandeis does have female faculty in the Computer Science department, I did not manage to have any classes with female Computer Science faculty members.

My current female DBA co-workers at Pythian: Isabel Pinarci (Oracle), Michelle Gutzait (SQL Server), Catherine Chow (Oracle) and Jasmine Wen (Oracle).

And to folks in the greater MySQL/tech community and tech co-workers past and present, especially those I have been inspired and helped by: Tracy Gangwer, Selena Deckelmann (Postgres), Amy Rich, Anne Cross, and more (If I have forgotten you, I apologize!).

Most of this stuff is not PHP specific, and Python or Ruby or Java or .NET developers can use the tools in this talk.

Slides are at http://talks.php.net/show/confoo10

“My name is Rasmus, I’ve been around for a long time. I’ve been doing this web stuff since 1992/1993.”

“Generally performance is not a PHP problem.” Webservers not config’d, no expire headers on images, no favicon.

Tools: Firefox/Firebug extension called YSlow (developed by yahoo) gives you a grade on your site.

Google has developed the Firefox/Firebug pagespeed tool.

Today Rasmus will pick on wordpress. He checks out the code, then uses Siege to do a baseline benchmark — see the slide for the results.

Before you do anything else, install an opcode cache like APC. WordPress really likes this type of caching; see this slide for the results. Set the timezone to make sure conversions aren’t being done all the time.

Make sure you are cpu-bound, NOT I/O bound. Otherwise, speed up the I/O.

Then strace your webserver process. There are common config issues that you can spot in your strace output. grep for ENOENT, which shows you “No such file or directory” errors.

AllowOverride None to turn off .htaccess for every directory, just read settings once from your config file….(unless you’re an ISP).

Make sure DirectoryIndex is set appropriately, watch your include_path. All this low-hanging fruit has examples on the common config issues slide.

Install pecl/inclued and generate a graph – here is the graph image (I have linked it because you really want to zoom in to the graph…)

In strace output, check the open() calls. Conditional includes, function calls that include files, etc. need runtime context before knowing what to open. In the example, every request checks to see if we have the config file; once we have configured, we can get rid of that stuff. Get rid of all the conditionals and hard-code “include wp-config.php”. Examples are on the slide.

His tips to change:
Conditional config include in wp-load.php (as just mentioned)
Conditional did-header check in wp-blog-header.php
Don’t call require_wp_db() from wp-settings.php
Remove conditional require logic from wp_start_object_cache

Then check strace again, now all Rasmus sees is theming and translations, which he decided to keep, because that’s the good benefit of WordPress – Performance is all about costs vs. flexibility. You don’t want to get rid of all of your flexibility, but you want to be fast.

Set error_reporting(-1) in wp-settings.php to catch all warnings — warnings slow you down, so get rid of all errors. PHP error handling is very slow, so getting rid of errors will make you faster.

The slide of warnings that wordpress throws.

Look at all C-level calls made, using callgrind, which sits under valgrind, a CPU emulator used for debugging. See the image of what callgrind shows.

Now dive into the PHP executor, by installing XDebug.

Check out xhprof – Facebook open sourced this about a year ago; it’s a PECL extension. The output is pretty cool, try it on your own site – Rasmus does show you how to use it. It shows you functions sorted by the most expensive to the least expensive.

For example, use $_SERVER['REQUEST_TIME'] instead of time(). Use pconnect() if MySQL can handle the number of webserver connections that will be persistent, etc.

After you have changed a lot of the stuff above, benchmark again with siege to see how much faster you are. In this case there is not much gained so far.

So keep going….the blogroll is very slow — Rasmus gets rid of it by commenting it out in the sidebar.php file. I’d like to see something to make it “semi-dynamic” — that is, make it a static file that can be re-generated, since you might want the blogroll, but the links do not change every second…..

At this point we’re out of low-hanging fruit.

HipHop is a PHP to C++ converter & compiler, including a threaded, event-driven server that replaces apache. Rasmus’ slide says “WordPress is well-suited for HipHop because it doesn’t have a lot of dynamic runtime code. This is using the standard WordPress-svn checkout with a few tweaks.”

Then, of course, benchmark again.

The first time you compile WordPress with HipHop, you give it a list of files to add to the binary; it will complain about PHP code that generates file names, so you do have to fix that kind of stuff. There’s a huge mess of errors the first time you run it (”pages and pages”), and Rasmus had to patch HipHop (and WordPress), but his changes have been merged back into HipHop, so you should be good for the most part.

Check out the errors, lots of them show logical errors like $foo.”bar” instead of $foo.=”bar” and $foo=”bar” instead of $foo==”bar” in an if statement. Which of course is nice for your own code, to find those logical errors.

(WordPress takes in a $user_ID argument and immediately initializes a global $user_ID variable, which overwrites the argument passed in, so you can change the name of the argument passed in….)

You can also get rid of some code, things that check for existence of the same thing more than once. So it will take a bit of tweaking, but it’s worth it.

There are limitations to HipHop, for example:

  • It doesn’t support any of the new PHP 5.3 language features
  • Private properties don’t really exist under HipHop. They are treated as if they are protected instead.
  • You can’t unset variables. unset will clear the variable, but it will still be in the symbol table.
  • eval and create_function are limited
  • Variable variables $$var are not supported
  • Dynamic defines won’t work: define($name,$value)
  • get_loaded_extensions(), get_extension_funcs(), phpinfo(), debug_backtrace() don’t work
  • Conditional and dynamically created include filenames don’t work as you might expect
  • Default unix-domain socket filename isn’t set for MySQL so connecting to localhost doesn’t work

and HipHop does not support all extensions — see the list Rasmus has of extensions HipHop supports.

Then Rasmus showed an example using Twit (which he wrote) including the benchmarks. He shows that you can see what’s going on, like 5 MySQL calls on the home page and what happens when you don’t have a favicon.ico (in yellow).

In summary, “performance is all about architecture”, “know your costs”.

Be careful, because there are some tools (like valgrind and xdebug) that you don’t want to put on production systems. You could capture production traffic and replay it on a dev/testing box, but “you just have to minimize the differences and do your best”.

subtitle: Monetizing Social Media

Why are social media and social networking essential to you and your business? (Because they will drive sales, but there are very few analytics for ROI on social networking and social media.)

Relying on advertising is no longer working for print newspapers and television. So why do we think it will work on internet media?

Blogging — you must post 2-4 quality blog posts every week to maintain readership. This takes a lot of work! Content is king.

No matter how cool the technology/product/service is, people still buy more often and more easily from people they know and trust.

Social media is a way to show people that you are an industry expert, and that is how you should use them (not to spam and only say “buy my product”).

If you do not love your job and try to sell it (say, on social networking), you are going to fail, because you are not passionate about it.

Start small, do not promise a lot, it is better to have more to say than to have dead air time (radio analogy).

Social media is all about building good relationships by having good content that people trust.

Lots of people spend a lot of money on their website, but the website is just a vector to show people your content, and the content is the most important thing.

Cross-pollination – I think he means forwarding on information you learn (like, say, liveblogging!)

Get expert guest bloggers — he did not explain that you can leverage the relationships you form by asking them to blog. We do this with the Log Buffers….

How to make money:
sponsorships
white paper composition
paid articles
consulting
adjunct tie-ins to other related venues
branded blogs

I am personally disappointed because I wanted to learn more, and I feel as though Pythian already uses the knowledge presented — we have great exposure through our blog, and have started really using Twitter, Facebook and other social media sites, etc.

Hansen’s information was good, and absolutely 100% correct, but I felt that for me it was very basic. I would like to know some more advanced topics, like:
– How do you know when you have reached the tipping point?
– How do you convert anonymous readers/followers to people you know, without turning them away because they feel they’re being watched, spammed or don’t want to give out their info to you?
– When does copying/pasting to send out your information start to bother people, and how do you know when you are doing too much?
– How do you convert readers/followers (anonymous or not) to paid customers without making them feel like you are all about money? If you have some free content and some paid content, how do you know how much of each to have?

What is confoo? It is the sequel to the PHP Québec Conference (2003 – 2009). This year PHP Québec decided to team up with Montreal-Python, W3Québec and OWASP Montréal to produce confoo.

And now, on to Mark Pilgrim of Google speaking on HTML5.

Timeline
1991 – HTML 1
1994 – HTML 2
1995 – Netscape discovers web, ruins it
1996 – CSS1 + JavaScript
1996 – Microsoft discovers web, ruins it
1997 – HTML4 + ECMAScript1
1998 – CSS2 + ECMAScript2 + DOM1
2000 – XHTML1 + ECMAScript3 + DOM2
2001 – XHTML 1.1
[long break!]
2009 – HTML 5 + ECMAScript 5 + CSS 2.1

HTML5 is not a spec, it’s a marketing term. It’s really HTML5 + CSS3 + JavaScript.

IsHTML5ReadyYet.com and IsHTML5Ready.com are both real websites that give different answers to the question “is HTML 5 ready?”

Semantics
HTML started as a semantic language (until Netscape came along).

New elements (html tags) that do not do anything – they are for semantic use only:

<header> <footer>
<section>
<article>
<nav>
<aside> (pull quotes and such)
<time> (datetime markup)
<mark> (marking up runs of text)
<figure> <figcaption>

Instead of “div class=_____” use these tags….for example:

<body>
<header>
<hgroup>
<h2>page title</h2>
<h3>page subtitle</h3>
</hgroup>
</header>

<nav>
<ul> Navigation......
.....
</ul>
</nav>

<section>
<article>
<header>
<h2>Title</h2>
</header>
</article>
</section>

Caveat: This doesn’t work in IE but there is a workaround…..

This can help blind people navigate better….and bots too!

“Google is just another blind web user with 7 million friends”

Forms
Web forms 2.0
To make a slider from 0-50:

<input type='range' min='0' max='50' value='0'>

To use autofocus:

<input autofocus>

(works in 3 browsers)

Talking about blind users again: “Focus tracking is VERY important if you can’t see. You really need to know where on the page you are, if you start typing what will happen.”

Placeholder text — in a text box, that light text that goes away when you click:

<input type='text' placeholder='click here and this will disappear'>

(works in 2 browsers)

New input types
These are semantic types that do different things in different browsers.

<input type='email'> (on the iphone you get a different keyboard, by default you just get a textfield, so these things degrade gracefully if the browser does not support the feature)
<input type='url'> (a browser like Opera, http://www.opera.com, can validate a URL for you instead of you doing it yourself!)
<input type='datetime'> (and more...date pickers are tedious)
<input type='file' multiple> (multiple files without using flash!)

For all the inputs HTML5 supports and which browsers support them (Opera is leading the way) search for “HTML5 input support”

Accessibility
ARIA = “accessible rich internet applications”. Alt text is technology that has long been left behind; ARIA does things like making tree views accessible. For example, right now with a tree view you have to tab through each item, which is a pain. With code like this:

<ul id='tree1' role='tree' tabindex='0' aria-labelledby='label_1'>
<li role='treeitem' tabindex='-1' aria-expanded='true'>Fruits </li>
<li role='group'>
<ul>
<li role='treeitem' tabindex='-1'>Oranges</li>
<li role='treeitem' tabindex='-1'>Pineapples</li>
</ul>
</li>
</ul>

….keyboard users can tab to the treeview itself, then use arrow keys to navigate and spacebar to select. This makes selecting an item at the end of a tree view much easier, and also makes it easy to move beyond the tree view without having to press Tab a million times.

Use your favorite search engine for “ARIA accessibility” to learn more.

CSS
Mark threw this image up on the screen:


(image from http://www.zazzle.com/stevenfrank – on that site you can buy this coffee mug or a T-shirt with the design)

Web fonts finally work in CSS3 – you can use more than Times, Courier, Arial, and occasionally Helvetica. This works EVERYWHERE – Chrome, IE, Firefox, Opera, Safari, etc. Well, they all support web fonts, but they support different font formats. Read Bulletproof font face for tips on how to get the font you want no matter what browser is used (yes, even IE).

Opacity is easy [author’s note – it’s just the “opacity” element, see examples at http://www.css3.info/preview/opacity/].

Rounded corners are EASY – Mark’s slide passed too fast for me, so I grabbed an example from http://24ways.org/2006/rounded-corner-boxes-the-css3-way:

.box {
border-radius: 1.6em;
}

Gradients are easy [author’s note — looks like you need webkit; there are examples at http://gradients.glrzad.com/]

To test CSS3 stuff, use www.css3please.com – “This element will receive inline changes as you edit the CSS rules on the left.”

[Author’s note — while searching I found http://www.webappers.com/2009/08/10/70-must-have-css3-and-html5-tutorials-and-resources/ which is definitely a “must have”.]

Canvas
A canvas is a blank slate where you can draw whatever you want: use the canvas tag with id, width and height attributes; everything else is JavaScript. Pretty awesome. [Author’s note — Mark had examples but I did not have time to capture them. I did find a nice tutorial at https://developer.mozilla.org/en/Canvas_tutorial.]

Multimedia
Video with no flash! YouTube has HTML5 integration. Here’s sample code of how to do movies in HTML5:

<video src='movie.ogv' controls></video>
<video src='movie.ogv' loop></video>
<video src='movie.ogv' preload='none'></video>  -- don't preload the movie
<video src='movie.ogv' preload='auto'></video>
<video src='movie.ogv' autoplay></video> -- if you don't have this you don't do evil autoplay....

Multimedia is in the DOM and responds to CSS effects, such as reflection:

<video src='movie.ogv' loop style='-webkit-box-reflect: below 1px;'></video>

(this code might be wrong, the slide flipped fast)

Of course the problem — codecs. Right now, .ogv and .mp4 (h264).

Audio inline too, same problem — only .oga and .mp3:

<audio src='podcast.oga' controls></audio>

Geolocation
IsGeolocationPartofHTML5.com is a real site, go to it to get the answer.
Geolocation demos — very much the same, find your location and display it. Simple but cool.

Cache manifest
Get everything you need for offline usage…

<html manifest='another-sky.manifest'>

CACHE MANIFEST
/avatars/zoe.png
/avatars/tamara.png
/scripts/holoband.jpg

search for “google for mobile HTML5 series” – good series of articles on using this stuff.

HTML 5 has much more
Local storage
Web workers
Web sockets (2-way connections, like raw TCP/IP connections over the web)
3D canvas (webgl)
Microdata (enhanced semantics)
Desktop notifications
Drag and Drop

Learn more:
whatwg.org/html5
diveintohtml5.org

Persistence Smoothie: Blending NoSQL and SQL – see user feedback and comments at http://joind.in/talk/view/1332.

Michael Bleigh; @mbleigh on Twitter.

NoSQL is a new way to think about persistence. Most NoSQL systems are not ACID compliant (Atomicity, Consistency, Isolation, Durability).

Generally, most NoSQL systems have:

  • Denormalization
  • Eventual Consistency
  • Schema-Free
  • Horizontal Scale

NoSQL tries to scale (more) simply, and it is starting to go mainstream – NY Times, BBC, SourceForge, Digg, Sony, ShopWiki, Meebo, and more. But it’s not *entirely* mainstream; it’s still hard to sell due to compliance and other reasons.

NoSQL has gotten very popular – there are lots of blog posts about it – but it has reached a hype peak, and obviously it can’t do everything.

“NoSQL is a (growing) collection of tools, not a new way of life.”

What is NoSQL? Can be several things:

  • Key-Value Stores
  • Document Databases
  • Column-oriented data stores
  • Graph Databases

Key-Value Stores


memcached is a “big hash in the sky” – it is a key value store. Similarly, NoSQL key-value stores “add to that big hash in the sky” and store to disk.

Speaker’s favorite is Redis because it’s similar to memcached (see the sketch after this list).

  • key-value store + datatypes (lists, sets, scored sets; hashes coming soon)
  • cache-like functions (like expiration)
  • (Mostly) in-memory
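
To make those bullets concrete, here is a small sketch using the redis-py client (my own illustration, not from the talk; it assumes redis-py is installed and a Redis server is running on localhost):

import redis

r = redis.Redis(host="localhost", port=6379)

r.set("greeting", "hello")          # plain key-value, like memcached
r.expire("greeting", 60)            # cache-like expiration, in seconds
r.lpush("recent_posts", "post:42")  # list datatype
r.sadd("tags", "mysql", "nosql")    # set datatype

print(r.get("greeting"))            # b'hello'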

Another interesting key-value store is Riak

  • Combination of key-value store and document database
  • heavy into HTTP REST
  • You can create links between documents, and do “link walking” that you don’t normally get out of a key-value store
  • built-in Map Reduce

Map Reduce:


  • Massively parallel way to process large datasets
  • First you scour data and “map” a new set of data
  • Then you “reduce” the data down to a salient result — for example, a map reduce function to make a tag cloud: the map function makes an array with a tag name and a count of 1 for each instance of that tag, and the reduce function goes through that array and counts them (see the sketch after this list)…
  • http://en.wikipedia.org/wiki/MapReduce
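
As a toy illustration of that tag-cloud example (my own sketch, not the speaker’s code):

from collections import defaultdict

posts = [["mysql", "nosql"], ["nosql", "redis"], ["nosql"]]

# Map phase: emit a (tag, 1) pair for each instance of a tag.
mapped = [(tag, 1) for tags in posts for tag in tags]

# Reduce phase: sum the counts per tag.
counts = defaultdict(int)
for tag, count in mapped:
    counts[tag] += count

print(dict(counts))  # {'mysql': 1, 'nosql': 3, 'redis': 1}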

Other key-value stores:

Document Databases


Some say that document databases are the “closest” thing to real SQL.
  • MongoDB – Document store that speaks BSON (Binary JSON, which is compact). This is the speaker’s favorite because it has a rich query syntax that makes it close to SQL. Can’t do joins, but can embed objects in other objects, so it’s a tradeoff

    • Also has GridFS that can store large files efficiently, can scale to petabytes of data
    • does have MapReduce, but it’s deliberate – you run it every so often.

  • CouchDB
    • Pure JSON Document Store – can query directly with nearly pure javascript (there are auth issues) but it’s an interesting paradigm to be able to run your app almost entirely through javascript.
    • HTTP REST interface
    • MapReduce is the only way to query items in CouchDB. Incremental MapReduce: every time you add or modify a document, it dynamically updates the results of the functions you’ve written. You can do really powerful queries as easily as you can do simple queries. However, some things are really complex; e.g., pagination is almost impossible to do.
    • Intelligent Replication – CouchDB is designed to work with offline integration. Could be used instead of SQLite as the HTML5 data store, but you need CouchDB running locally to be doing offline stuff w/CouchDB

Column-oriented store


Columns are stored together (e.g., names) instead of rows. This lets you be schema-less: because you don’t care about a row’s consistency, you can just add a column to a table very easily.

Graph Databases


Speaker’s opinion – there aren’t enough of these.
Neo4J – can handle modeling complex relationships – “friends of friends of cousins” – but it requires a license.

When should I use this stuff?

  • If you have complex, slow joins for an “activity stream”: denormalize, use a key-value store.
  • If you have a variable schema and vertical interaction: use a document database or column store.
  • If you are modeling multi-step relationships (LinkedIn, friends of friends, etc.): use a graph database.

Don’t look for a single tool that does every job. Use more than one if it’s appropriate, weigh the tradeoffs (ie, don’t have 7 different data stores either!)

NoSQL solves real scalability and data design issues. But financial transactions HAVE to be atomic, so don’t use NoSQL for those.

A good presentation is http://www.slideshare.net/bscofield/the-state-of-nosql.

Using SQL and NoSQL together


Why? Well, your data is already in an SQL database (most likely).

You can blend by hand, but the easy way is DataMapper:
  • Generic, relational ORM (adapters for many SQL dbs and many NoSQL stores)
  • Implements Identity Map
  • Module-based inclusion (instead of extending from a class, you just include into a class).

You can set up multiple data targets (default is MySQL, example sets up MongoDB too).
DataMapper is:

  • Ultimate Polyglot ORM
  • simple relationships between persistence engines are easy
  • jack of all, master of none
  • Sometimes perpetuates false assumptions – if you’re in Ruby, your legacy stuff is in ActiveRecord, so you’re going to have to rewrite your code anyway.

Speaker’s idea to be less generic and make better use of each data store’s features: Gloo – “Gloo glues together different ORMs by providing relationship proxies.” This software is ALPHA ALPHA ALPHA.

The goal is to be able to define relationships on the terms of any ORM from any class, ORM or not.
Right now – partially working ActiveRecord relationships.
Is he doing it wrong? Is it a crazy/stupid idea? Maybe.

Example:

  • Need: an auth system (assume you already have one). Use: it’s already in SQL, so leave it there.
  • Need: users to be able to purchase items from the storefront; can’t lose transactions, need full ACID compliance. Use: MySQL.
  • Need: a social graph, with activity streams and 1-way and 2-way relationships; need speed, but not consistency. Use: Redis.
  • Need: product listings, selling movies and books, which have different properties; products are pretty much non-relational. Use: MongoDB.

He wrote the example in about 3 hours, so integration of multiple data stores can be done quickly and work.

I am happy and proud to announce that there will be an Open Database Camp held at this year’s Northeast LinuxFest! The venue is Harvard University in Cambridge, on Saturday, March 16 and Sunday, March 17, 2013.

Northeast LinuxFest and Open Database Camp are both free, but there is no reciprocal membership. To register for Open Database Camp, just sign up at the Attendee List wiki page. We are also soliciting session ideas ahead of time, and attendees will choose sessions during the Saturday morning planning session, as usual for Open DB Camp.

If you are interested in sponsoring, do so directly to Northeast LinuxFest and let them know it’s for Open Database Camp!

Open Database Camp is for all open databases – MySQL, Postgres, NoSQL; whether it has been around for years or is something you’re still thinking about. You can see previous session ideas at the OpenSQLCamp website.

When I heard that MySQL 5.6 was implementing EXPLAIN for writes (INSERT, UPDATE, DELETE, REPLACE), I was pretty excited. Then I heard that MySQL 5.6 also was implementing a JSON format for EXPLAIN and my thought was “I do not care about that!”

Boy, was I wrong. The JSON format does not just put the output into JSON format, it also gives extra information that’s actually pretty useful! It can tell you when you are doing an implicit cast, which parts of a composite index are being used, and when index condition pushdown is being used. None of these are shown in regular EXPLAIN (which seems odd – why extend the JSON format but not put the information into the regular EXPLAIN format?), so using the JSON format is actually a good idea even if you do not care about what format your output is in.

As a note, MySQL Workbench’s Visual Explain (go to Query->Visual Explain Current Statement) also gives this information.

attached_condition and implicit casts

In a talk about EXPLAIN I do, I use the Sakila sample database. Here is an example of a “bad” query:

mysql> EXPLAIN SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index
possible_keys: NULL
key: rental_date
key_len: 10
ref: NULL
rows: 16005
Extra: Using where; Using index
1 row in set (0.00 sec)

This query is “bad” because it is doing a full index scan (type: index) instead of doing a range scan for just the range of dates we want (should be type: range). Ironically, the EXPLAIN does not actually explain why.

However, the JSON format does explain why:
mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"table": {
"table_name": "rental",
"access_type": "index",
"key": "rental_date",
"used_key_parts": [
"rental_date",
"inventory_id",
"customer_id"
],
"key_length": "10",
"rows": 16005,
"filtered": 100,
"using_index": true,
"attached_condition": "(cast(`sakila`.`rental`.`rental_date` as date) = '2006-02-14')"
}
}
}

Note that the attached_condition shows the implicit cast. This is MUCH more friendly to a developer or administrator who is trying to figure out why MySQL is not doing what they want it to do!

The visual EXPLAIN from MySQL Workbench also shows the implicit cast:

Workbench showing attached_condition and implicit CAST

You may also notice it shows the “filtered” attribute, which is not in regular EXPLAIN but is part of EXPLAIN EXTENDED – “filtered” is the percentage of “rows” that is estimated to be returned. A higher number here is better; if it is low, it means that you are examining a lot of rows that you do not return (for example, rows: 1000 with filtered: 10 means only about 100 of the examined rows are expected to be returned).

used_key_parts

You may have noticed above that there is a used_key_parts array that does not show up in the traditional EXPLAIN. In a traditional EXPLAIN (or EXPLAIN EXTENDED), you do get to see the index length with the key_len field, so you can guess that only part of a composite index is used. Both the previous query and the following query use this index:

UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`)

Here is the traditional EXPLAIN – note that it shows the rental_date index is used, and the key_len is 5, which implies that only the first field of the index, rental_date, is being used, not the other 2 id fields. But you have to deduce that for yourself:

mysql> EXPLAIN EXTENDED SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date
key: rental_date
key_len: 5
ref: NULL
rows: 181
filtered: 100.00
Extra: Using where; Using index

Here is the JSON format, which shows the used_key_parts field, which reveals very clearly that only the first field of the index is used:

mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"table": {
"table_name": "rental",
"access_type": "range",
"possible_keys": [
"rental_date"
],
"key": "rental_date",
"used_key_parts": [
"rental_date"
],
"key_length": "5",
"rows": 181,
"filtered": 100,
"using_index": true,
"attached_condition": "(`sakila`.`rental`.`rental_date` between '2006-02-14 00:00:00' and '2006-02-14 23:59:59')"
}
}
}

And here is the MySQL Workbench Visual EXPLAIN that shows the used_key_parts clearly:

Workbench showing used_key_parts and BETWEEN
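
If you want to scan plans for these hints programmatically, here is a minimal Python sketch (my own, not an official tool; it assumes a single-table query_block like the ones above):

import json

def explain_hints(explain_output, index_columns):
    # explain_output: the JSON string from EXPLAIN FORMAT=JSON;
    # index_columns: the full column list of the index being checked.
    table = json.loads(explain_output)["query_block"]["table"]
    hints = []
    condition = table.get("attached_condition", "")
    if "cast(" in condition:
        hints.append("implicit cast: " + condition)
    used = table.get("used_key_parts", [])
    if used and len(used) < len(index_columns):
        hints.append("only %d of %d key parts used (%s)"
                     % (len(used), len(index_columns), ", ".join(used)))
    return hints

# For the BETWEEN query above, with the 3-column rental_date index, this
# would report: only 1 of 3 key parts used (rental_date)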

Index condition pushdown is itself a new feature in MySQL 5.6, and I will talk about it in another blog post.

I am glad I took a second look at EXPLAIN FORMAT=JSON – the new features are awesome! My only complaint is that I think they should be added to either EXPLAIN or EXPLAIN EXTENDED. I also hope that tools like pt-query-digest will be updated to use the extra information.

For the past few days, I have been upgrading a few servers. We are going from Percona’s patched MySQL 5.1 to MariaDB 5.5 (the subquery optimization is excellent, and we have lots of subqueries). Our process involves upgrading a slave first, and letting it replicate for a while, and if everything’s good, update more slaves, then the master.

This has served us well in the past. And now that we are checking data integrity between masters and slaves with pt-table-checksum, the process involves checking before we start that there is data integrity. This is easy, as we checksum twice daily and have a Nagios alert if there are any discrepancies. After the upgrade, we checksum again, to be sure no data has been changed/corrupted in the process of doing a mysqldump export and import.*

Much to my surprise, after importing the data on one of our dev servers, I found that there were a lot of discrepancies. So I picked a chunk to do some comparisons on, and found something interesting:

On Server version: 5.1.65-rel14.0-log Percona Server (GPL), 14.0, Revision 475:
mysql> select float_field from db.tbl where id=218964;
+-------------+
| float_field |
+-------------+
| 9.58084e-05 |
+-------------+
1 row in set (0.04 sec)

On Server version: 5.5.28a-MariaDB-log MariaDB Server
MariaDB [(none)]> select float_field from db.tbl where id=218964;
+--------------+
| float_field |
+--------------+
| 0.0000958084 |
+--------------+
1 row in set (0.24 sec)

Which of course causes a different checksum. I tried SELECTing the values, casting and converting them, but I could not get them to change in the database. MySQL 5.1 insists on storing in scientific notation, and MariaDB 5.5 (and MySQL 5.5, we tested it out) insists on storing without scientific notation.

Frankly, I’m surprised this has not come up before (I did lots of querying Google for MySQL 5.5 and scientific notation), since it radically changes how numbers look when they are stored and retrieved. I guess code does the right thing…except for pt-table-checksum, and I cannot really blame it.
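
As an aside, a quick Python sketch (my own illustration) shows that the two renderings can be one and the same IEEE-754 value, which would explain why casting and converting did not change anything while string-based checksums still differed:

import struct

a = float("9.58084e-05")    # the rendering from Percona Server 5.1
b = float("0.0000958084")   # the rendering from MariaDB/MySQL 5.5

print(a == b)                                        # True
print(struct.pack("<f", a) == struct.pack("<f", b))  # True: same 4 bytes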

In the end, I used the --ignore-columns option to pt-table-checksum, passing it the result of:

SELECT GROUP_CONCAT(DISTINCT COLUMN_NAME) FROM INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE IN ('float','double') AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema');

In this way, I can get an accurate checksum to see if anything has changed, before I mark that the upgrade is complete on this slave server.

* This is just on the first slave. After the first slave is upgraded, we use xtrabackup to copy the data to another server to upgrade it.


This is the 182nd edition of Log Buffer, the weekly review of database blogs. Make sure to read the whole edition so you do not miss where to submit your SQL limerick!

This week started out with a session I attended.

This week was also the Hotsos Symposium. Doug’s Oracle Blog has a series of posts about Hotsos. If all this talk about conferences has gotten you excited, Joshua Drake notes that with 14 days to go, the hotel is almost full for PostgreSQL Conference East, which is March 25th-28th in Philadelphia. And the Oracle database insider notes that the Oracle OpenWorld call for papers is now open.

According to Susan Visser, this week (ending tomorrow) is also read an e-book week. So if you have not already done so, read an e-book! She links a coupon for an e-book in the post.

Craig Mullins notes that the mainframe is a good career choice in Mainframes: The Safe IT Career Choice. He notes that the mainframe is still not dead:

People have been predicting the death of the mainframe since the advent of client/server in the late 1980s. That is more than 20 years! Think of all the things that have died in that timespan while the mainframe keeps on chugging away: IBM’s PC business, Circuit City, Koogle peanut butter, public pay phones, Johnny Cash… the list is endless.

In other career-related news, Antonio Cangiano is looking for 2 top-notch student hackers for a 16-month internship at IBM in Toronto starting in May. All the details, including how to apply, are in Cangiano’s blog post.

Willie Favero wants to know how you “solve the batch dilemma” for issues like “shrinking your batch window, designing your batch to play nicely with … OLTP” in how’s your batch workload doing? Perhaps Favero should read the updated batch best practices posted by Anthony Shorten.

Bryan Smith surveys a more personal question by asking if you go both ways and “manage both DB2 for Linux, UNIX, and Windows and DB2 for z/OS” in don’t ask, don’t tell, bi-platform DBAs. This week’s Log Buffer editor admits to being a tri-platform DBA — she has tried many platforms, and in fact, many databases (MySQL, Oracle, DB2, SQL Server, Sybase, Postgres and Ingres)!

Hari Prasanna Srinivasan promotes a patching survey in Oracle really wants to hear from you! Patching Survey.

Henrik Loeser explains what a deadlock and a hot spot are by using a real life analogy taken from a police report in deadlock and hot spot in real life.

Jamie Thomson asks why do you abbreviate schema names? Shlomi Noach tries to solve the issue that “there is no consistent convention as for how to write [about table aliases in] an SQL query” in proper sql table alias use conventions. Noach also gives us a tip: faster than truncate.

Leons Petrazickis reminds us that “rulesets are chains” and it is important to have your rulesets in the proper order in iptables firewall pitfall.

Anyone interested in the history of MySQL AB will be informed after reading Dries Buytaert’s article.
Gavin Towey shares his software that helps centrally manage 120 MySQL servers in qsh.pl: distributed query tool. For those who want to learn more about column-oriented databases, particularly in MySQL, Robin Schumacher of the InfiniDB blog announces that a MySQL University session recording on MySQL column databases is now available. MySQL join-fu expert Jay Pipes has moved his blog to www.joinfu.com and starts with An SQL Puzzle and, of course, a follow up on the sql puzzle.

Ivan Zoratti is happy that finally, slides posted for the MySQL DW breakfast. Venu Anuganti gives you tips on one of the most common MySQL frustrations: optimizing subqueries in how to improve subqueries derived tables performance. Justin Swanhart posts the way in which he Gets Linux performance information from your MySQL database without shell access and emulates a ‘top’ CPU summary using /proc/stat and MySQL using the same method.

The Oracle Apps blog has an introduction to Oracle user productivity kit (UPK). Even though in this editor’s opinion the article is very sales-pitchy, it has valuable information, and does indeed live up to its promise:

UPK is a software tool that can capture all the steps in a system process. It records every keystroke, every click of the mouse, each menu option chosen and each button pressed. All this is done in the UPK Recorder by going through the transaction and pressing “printscreen” after every user action. From this, without any further effort from the developer, UPK builds a number of valuable outputs.

Allen White gives a great tip on how to optimize queries in keep your data clean.

Mike Dietrich reminds you to remove “old” parameters and events from your init.ora when upgrading, “as keeping them will definitely slow down the database performance in the new release.” He shows evidence of slowness when this is not done. Dietrich also shows how you can be gathering workload statistics “to give the optimizer some good knowledge about how powerful your IO-system might be”, especially “a few days after upgrading to the new release…while a real workload is running.”

Brian Aker shows the exciting features coming soon in Drizzle in Drizzle, Cherry, Roadmap for our Next Release.

Maybe you are thinking of migrating, not upgrading… The O’Reilly Radar shows how to assess an Oracle to MySQL migration in MySQL migration and risk management. Actually, that article interviews Ronald Bradford on the subject — Bradford has been prolific lately, updating his free my.cnf advice series and his “Don’t Assume”: MySQL for the Oracle DBA series. Nick Quarmby also talks about migrating Oracle, but not to a new database, just to a new platform, in his primer on migrating Oracle Applications to new platforms. And the big news comes from Carlos of dataprix: Twitter will migrate from MySQL to Cassandra DB.

Paul S. Randal explains his way of benchmarking: 1 Tb table population on SQL Server.

Pete Finnigan shares his slides from a webinar on how to secure oracle, and Denis Pilipchuk shares his approaches for discovering security vulnerabilities in software applications.

Jeff Davis shares his thoughts about scalability and the relational model. Robert Treat responds actually, the relational model doesn’t scale and Baron Schwartz counters with NoSQL doesn’t mean non-relational.

Buck Woody explains “whenever you want to know something about SQL Server’s configuration, whether that’s the Instance itself or a database, you have a few options” — and of course what those options are — in system variables, stored procedures or functions for meta data.

This week’s T-SQL Tuesday topic was I/O. There are many links to great blog posts in the comments; three random posts I chose to highlight: Michael Zilberstein talks about IO capacity planning, while Kalen Delaney talks about using STATISTICS IO in I/O, you know, and Merrill Aldrich chimes in with information on real world SSD’s. Aldrich also begs folks not to waste resources and make more work for developers and DBAs in dear ISV, you’re keeping me awake nights with your VARCHAR() dates.

And we end with a bit of fin: Paul Nielsen wants us all to have a bit of fun; he has posted an SQL limerick and asks readers to create their own in there once was in Dublin a query.

I am happy and proud to announce that there will be an Open Database Camp held at this year’s Northeast LinuxFest! The venue is Harvard University in Cambridge, MA (“our fair city”), and it will take place Saturday, March 16 and Sunday, March 17, 2013.

Northeast LinuxFest and Open Database Camp are both free, but there is no reciprocal membership. To register for Open Database Camp, just sign up with Eventbrite. We are also soliciting session ideas ahead of time, and attendees will choose sessions during the Saturday morning planning session, as usual for Open DB Camp.

If you are interested in sponsoring, do so directly to Northeast LinuxFest and let them know it’s for Open Database Camp!

Open Database Camp is for all open databases – whether it’s MySQL, Postgres, or NoSQL, whether it’s been around for years or is something you’re still thinking about. You can see previous session ideas at the OpenSQLCamp website.

It being the end of the quarter, here is a summary of what the Mozilla Database Team has been up to. One big goal was getting off MySQL 5.0, which we have successfully done: we only have one server left on MySQL 5.0, and that has a compatible MySQL 5.1 server waiting for a few developers to get back from their well-deserved vacations to migrate off. In December, we finished upgrading 2 servers to MySQL 5.1.

– Looked at the top 30 Bugzilla queries and started to give optimization tips for MySQL.
– Did our regular purge/defrag of TinderBox PushLog.
– Worked on integrating our datazilla code with chart.io features.
– Helped change the data model for datazilla.
– Moved some Bugzilla tables to a different partition when the data partition filled up. There is a plan to upgrade but we had an immediate need for the move.
– Upgraded one of the Bugzilla slaves to MariaDB 5.5.
– Refreshed the support staging database with production data.
– Added grants for metrics users to support new Bugzilla custom fields.
– Did some research on whether SSDs were good enough for the addons database or if we really needed Fusion I/O (conclusion: SSDs are good enough!). The drivers for this were the cost of larger Fusion I/O disks and growing concern about space on the database systems.
– Found a bug in new build code for the builder that builds Firefox that would effectively stop updated builds from being recorded in the builder database. The bug was found in development and the code is not in production yet, but it took several hours of database debugging to figure out the problem.
– Built a new database server for backups that does not depend on NFS.
– Implemented checksum checking on several more MySQL clusters to ensure the data on the master and slaves match.
– Created databases for Game On.
– Optimized a query for a Firefox build system component (clobberer).
– Installed larger disks on our production Postgres failover server. We will be testing failover and adding more disks to the current master server in Q1.
– Created a database cluster for the main Mozilla website for failover.
– Cleaned up replication on a cluster after a power problem caused the master to crash.
– Added a Nagios check that uses pt-config-diff to all our MySQL servers to ensure that we know whenever the running MySQL configuration does not match the /etc/my.cnf file (a sketch of such a check follows this list).
– Dealt with a set of queries breaking replication due to not being inside a transaction.
– Dealt with a schema change for Bugzilla taking a long time, and took slaves out of the load balancer one at a time to let the ALTER TABLE complete without read queries getting locked and causing slow responsiveness on the database servers.
– Created read-only database logins for the administrators of Air Mozilla so they can better debug problems.
– Imported some data for Graphs.
– Audited the Persona/BrowserID databases to get them ready for prime time (these databases are not normally managed by the DB team).
– Did a security review for our Engagement team to get reports of Mozillians emails for sending out information to registered and vouched Mozillians.
– Added documentation for 11 Nagios checks related to MySQL and Postgres.
– Researched the Zero Day Exploits for MySQL to see if Mozilla was affected.
– Puppetized the postgresql.conf files for all our postgres servers.
– Puppetized our datazilla database servers.
– Puppetized database servers for web development and for internal tools.
– We sized MySQL database machines for the Platform as a Service (PaaS) platform that the webops team will be implementing. The next step is ordering the hardware!
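
As promised above, here is a minimal sketch of the kind of Nagios wrapper the pt-config-diff check could use; the paths and messages are placeholders, and it relies on pt-config-diff exiting non-zero when the configurations differ:

#!/bin/sh
# Hypothetical Nagios plugin: compare the running MySQL configuration
# against /etc/my.cnf and map the result onto Nagios exit codes.
if pt-config-diff /etc/my.cnf h=localhost > /dev/null 2>&1; then
    echo "OK: running MySQL configuration matches /etc/my.cnf"
    exit 0   # Nagios OK
else
    echo "WARNING: running MySQL configuration differs from /etc/my.cnf"
    exit 1   # Nagios WARNING
fi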

Under planning – we have done a lot in 2012 to stabilize our MySQL environment and have a good, sane centralized puppet configuration for control of MySQL packages, configuration files, scripts and backups. 2013 will be the year we do the same with Postgres:
– Stabilizing Postgres
– Streamlining Postgres configuration and installation and upgrading with puppet
– Reconfiguring Postgres logging
– Stabilizing Postgres backups

There are plenty of great things that will happen in 2013 from the Mozilla Database Team for both MySQL and Postgres databases!

First off, everyone I know that’s a good MySQL DBA already has a job — myself included. Occasionally I know of someone looking for a job, but more often than not, they end up finding one rather quickly.

Obviously the best way to find people is word-of-mouth, and the next best way is to find an expert in the field and ask them who they recommend. I am flattered that you consider me an expert and are asking me! If I know of someone, I will definitely let you know. If not, I will probably direct you here.

So, what now? Well, the more people you contact, the better. Finding experts is the right step, and finding people that they know, who are interested in MySQL, is another right step. To that end, first consider your audience — do you want someone who also has skills as a developer? As a sysadmin? As a manager? Find “groups of experts” — or at least “groups of eager learners” near you.

Also, consider what you need. You may think you need “a fulltime DBA” — but what do you really need? Maybe what you need is “someone to make sure backups are running smoothly, help developers write new queries and optimize older ones, and be on call 24/7 for troubleshooting.”

One thing to consider is a consulting firm — particularly if you are having trouble getting headcount. Even if you’re not, though, you can ease into having a DBA, ramping up as needed. For instance, start a consultant on one project, and throw others at him/her as they come up. A full-time DBA might be bored in the first month unless you have a training program for him/her.

So consider a consultant — at the very least they can help fill in the gap while you are on your search for a great DBA. I am a big fan of giving back to the community, so consider MySQL’s own consulting at http://www.mysql.com/consulting/, or the Pythian Group which publishes the Log Buffer each week, or Paragon Consulting, which publishes the MySQL Magazine. Or, of course, any of the bloggers on http://www.planetmysql.org that have consulting firms are good choices too.

http://www.google.com/search?q=mysql+consulting

There is a job board that’s specific to MySQL and Oracle at http://www.toomanyconnections.com. The first place to look for a location-specific full-time DBA is the MySQL User Group near the hiring location:

http://forge.mysql.com/wiki/List_of_MySQL_User_Groups

Contact the leader of the group, saying you have a job opening, and ask if there’s an appropriate method to contact the group. Some leaders make the announcements themselves, others allow posting on a message board. If you are an agency, be upfront about it; if you’re not, also mention “this is for my company, I am not a headhunter” or similar language.

Most group leaders are looking to do less work, and the least work possible is to have you come to a meeting and announce your job opening, so any questions can be answered by you right away. If, of course, that’s allowed.

Here are my suggestions if you’re looking to hire in the Boston area; they translate easily to looking for folks in your own area:

Attend the Boston MySQL User Group and make an announcement. Boston MySQL user group meetings are usually held on the 2nd Monday of every month, but check the calendar to be sure — http://mysql.meetup.com/137/calendar/

As I am the leader of the Boston MySQL User Group, I will say that you may also post it to the User Group’s message board at http://mysql.meetup.com/137/boards. A note of caution: about once a week a job is posted there, so it’s really better to come to the meeting in person, where you can distinguish yourself. If you can’t attend, feel free to send me a description, although it really is better to go in person: all I know is what you give me, and if a person has a question, you’re in a better position to answer it than I am.

Another option is similar groups. I can personally recommend both of these groups for high-quality people (in general), and you can say that I recommended them:

BLU, the Boston Linux and Unix Users group: http://www.blu.org/
and
BBLISA, http://www.bblisa.org/

In all cases, going there in person gives you a lot more cachet; e-mailing the group leaders to ask if you can come and announce your job opening is not a bad thing. (But do stay for the whole presentation; bring your laptop, everyone does, and work on something else if you want. It’s polite to stay.)

Other pages with lists of user groups that might be helpful:

http://web.meetup.com/cities/us/ma/boston/?from=loc_pick

http://web.mit.edu/ist/usergroups/
(this is a list of all User Groups that MIT hosts, so some are not relevant at all)

I hope this helps!

Today is a day to “draw attention to achievements of women in technology.”

So here I am, drawing some attention 🙂 All the names contain links (mostly to Wikipedia), so if you are so inclined, you can learn more (you could start at Wikipedia’s article on women in computing). Perhaps you will realize that there are lots of women in technology already, more than you first thought.

That being said, this is by no means a comprehensive list.


Of course, there’s Ada Lovelace herself, but I am focusing on women still alive today (although I do have to mention Grace Hopper, who popularized the term “debugging”). As well, I might mention the amazing Allison Randal, well-known in the Perl community and one of the major organizers of OSCon. But I do want to focus on some of the great achievements of lesser-known women, because we are indeed hiding (in plain sight!) everywhere.

Did you like Apple’s Newton PDA? Many people believe it was (and still is) one of the best-designed PDAs. Donna Auguste helped develop it.

Ever played the video game Centipede? Thank Dona Bailey. Corrinne Yu has done a lot of work in the gaming field and is currently a Halo lead at Microsoft.

Angela Orebaugh wrote the best-known books on Wireshark and Ethereal, two of the more popular security tools.

The first commercial website is credited to Jennifer Niederst Robbins, who designed the Global Network Navigator.

Mary Ann Davidson is the Chief Security Officer at Oracle.

Lynne Jolitz helped develop 386BSD.

Wendy Hall, current president of the ACM (since 2008).

IBM Master Inventor Amanda Chessell.

Elaine Weyuker’s Wikipedia page starts out with “Elaine J. Weyuker is an ACM Fellow, an IEEE Fellow, and an AT&T Fellow at Bell Labs for research in software metrics and testing as well as elected to the National Academy of Engineering. She is the author of over 130 papers in journals and refereed conference proceedings.” From there, it gets more impressive.

Having written a book myself, I can tell you it is definitely an achievement! Ruth Aylett’s popular work Robots: Bringing Intelligent Machines to Life certainly qualifies her to make this list.

I challenge all the readers out there to take a few minutes to note the achievements of women in technology and science in their lives. A few weeks ago I posted a list of women who taught me science or technology; that may be an easier way for people to celebrate the day than researching the great women of science and technology, and it means we will not see the same “top 10 women in science and technology” lists over and over today.

I am happy and proud to announce that there will be an Open Database Camp held at this year’s Northeast LinuxFest! The venue is at Harvard University in Cambridge, men’s health condom MA (“our fair city”), information pills and will take place Saturday, March 16 and Sunday, March 17, 2013.

Northeast LinuxFest and Open Database Camp are both free, but there is no reciprocal membership. To register for Open Database Camp, just sign up with Eventbrite. We are also soliciting session ideas ahead of time, and attendees will choose sessions during the Saturday morning planning session, as usual for Open DB Camp.

If you are interested in sponsoring, do so directly to Northeast LinuxFest and let them know it’s for Open Database Camp!

Open Database Camp is for all open databases – whether it’s MySQL, Postgres, NoSQL, been around for years or something you’re thinking about. You can see previous session ideas at the OpenSQLCamp website.

Being the end of the quarter, audiologist which we have successfully done. We only have one server left on MySQL 5.0, order and that has a compatible MySQL 5.1 server waiting for a few developers to get back from their well-deserved vacations to migrate off. In December, dosage we finished upgrading 2 servers to MySQL 5.1.

– Looked at the top 30 Bugzilla queries and started to give optimization tips for MySQL.
– Did our regular purge/defrag of TinderBox PushLog.
– Worked on integrating our datazilla code with chart.io features.
– Helped change the data model for datazilla.
– Moved some Bugzilla tables to a different partition when the data partition filled up. There is a plan to upgrade but we had an immediate need for the move.
– Upgraded one of the Bugzilla slaves to MariaDB 5.5.
– Refreshed the support staging database with production data.
– Added grants for metrics users to support new Bugzilla custom fields.
– Did some research on whether SSDs were good enough for the addons database or if we really needed Fusion I/O. (conclusion: SSD’s are good enough! The driver for this was cost of larger Fusion I/O disks, and starting to worry about space on the database systems.)
– Found a bug in new build code for the builder that builds Firefox, that would effectively stop updated builds from being recorded in the builder database. The bug was found in development, the code itself is not in production yet, but there were several hours of database debugging to figure out the problem.
– Built a new database server for backups that does not depend on NFS.
– Implemented checksum checking on several more MySQL clusters to ensure the data on the master and slaves match.
– Created databases for Game On.
– Optimized a query for a Firefox build system component (clobberer).
– Installed larger disks on our production Postgres failover server. We will be testing failover and adding more disks to the current master server in Q1.
– Created a database cluster for the main Mozilla website for failover.
– Cleaned up replication on a cluster after a power problem caused the master to crash.
– Added a Nagios check that uses pt-config-diff to all our MySQL servers to ensure that we know whenever the running MySQL configuration does not match the /etc/my.cnf file.
– Dealt with a set of queries breaking replication due to not being inside a transaction.
– Dealt with a schema change for Bugzilla taking a long time, and took slaves out of the load balancer one at a time to let the ALTER TABLE complete without read queries getting locked and causing slow responsiveness on the database servers.
– Created read-only database logins for the administrators of Air Mozilla so they can better debug problems.
– Imported some data for Graphs.
– Audited the Persona/BrowserID databases to get them ready for prime time (these databases are not normally managed by the DB team).
– Did a security review for our Engagement team to get reports of Mozillians emails for sending out information to registered and vouched Mozillians.
– Added documentation for 11 Nagios checks related to MySQL and Postgres.
– Researched the Zero Day Exploits for MySQL to see if Mozilla was affected.
– Puppetized the postgresql.conf files for all our postgres servers.
– Puppetized our datazilla database servers.
– Puppetiezed database servers for web development and for internal tools.
– We sized MySQL database machines for the Platform as a Service (PaaS) platform that the webops team will be implementing. The next step is ordering the hardware!

Under planning – we have done a lot in 2012 to stabilize our MySQL environment and have a good, sane centralized puppet configuration for control of MySQL packages, configuration files, scripts and backups. 2013 will be the year we do the same with Postgres:
– Stabilizing Postgres
– Streamlining Postgres configuration and installation and upgrading with puppet
– Reconfiguring Postgres logging
– Stabilizing Postgres backups

There are plenty of great things that will happen in 2013 from the Mozilla Database Team for both MySQL and Postgres databases!

For the past few days, pilule I have been upgrading a few servers. We are going from Percona’s patched MySQL 5.1 to MariaDB 5.5 (the subquery optimization is excellent, pregnancy and we have lots of subqueries). Our process involves upgrading a slave first, and letting it replicate for a while, and if everything’s good, update more slaves, then the master.

This has served us well in the past. And now that we are checking data integrity between masters and slaves with pt-table-checksum, the process involves checking before we start that there is data integrity. This is easy, as we checksum twice daily and have a Nagios alert if there are any discrepancies. After the upgrade, we checksum again, to be sure no data has been changed/corrupted in the process of doing a mysqldump export and import.*

Much to my surprise, after importing the data on one of our dev servers, I found that there were a lot of discrepancies. So I picked a chunk to do some comparisons on, and found something interesting:

On Server version: 5.1.65-rel14.0-log Percona Server (GPL), 14.0, Revision 475:
mysql> select float_field from db.tbl where id=218964;
+-------------+
| float_field |
+-------------+
| 9.58084e-05 |
+-------------+
1 row in set (0.04 sec)

On Server version: 5.5.28a-MariaDB-log MariaDB Server
MariaDB [(none)]> select float_field from db.tbl where id=218964;
+--------------+
| float_field |
+--------------+
| 0.0000958084 |
+--------------+
1 row in set (0.24 sec)

Which of course causes a different checksum. I tried SELECTing the values, casting and converting them, but I could not get them to change in the database. MySQL 5.1 insists on storing in scientific notation, and MariaDB 5.5 (and MySQL 5.5, we tested it out) insists on storing without scientific notation.

Frankly, I’m surprised this has not come up before (I did lots of querying Google for MySQL 5.5 and scientific notation), since it radically changes how numbers look when they are stored and retrieved. I guess code does the right thing…except for pt-table-checksum, and I cannot really blame it.

In the end, I used the –ignore-columns option to pt-table-checksum, with the result of:

SELECT GROUP_CONCAT(DISTINCT COLUMN_NAME) FROM INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE IN ('float','double') AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema');

In this way, I can get an accurate checksum to see if anything has changed, before I mark that the upgrade is complete on this slave server.

* This is just on the first slave. After the first slave is upgraded, we use xtrabackup to copy the data to another server to upgrade it.

First off, public health everyone I know that’s a good MySQL DBA already has a job — myself included. Occasionally I know of someone looking for a job, infertility but more often than not, they end up finding a job rather quickly.

Obviously the best way to find people is word-of-mouth, and the next best way is to find an expert in the field and ask them who they recommend. I am flattered that you consider me an expert and are asking me! If I know of someone, I will definitely let you know. If not, I will probably direct you here.

So, what now? Well, the more people you contact, the better. Finding experts is the right step, and finding people that they know, who are interested in MySQL, is another right step. To that end, first consider your audience — do you want someone who also has skills as a developer? As a sysadmin? As a manager? Find “groups of experts” — or at least “groups of eager learners” near you.

Also, consider what you need. You may think you need “a fulltime DBA” — but what do you really need? Maybe what you need is “someone to make sure backups are running smoothly, help developers write new queries and optimize older ones, and be on call 24/7 for troubleshooting.”

One thing to consider is a consulting firm — particularly if you are having trouble getting headcount. Even if you’re not, though, you can ease into having a DBA, ramping up as needed. For instance, start a consultant on one project, and throw others at him/her as they come up. A full-time DBA might be bored in the first month unless you have a training program for him/her.

So consider a consultant — at the very least they can help fill in the gap while you are on your search for a great DBA. I am a big fan of giving back to the community, so consider MySQL’s own consulting at http://www.mysql.com/consulting/, or the Pythian Group which publishes the Log Buffer each week, or Paragon Consulting, which publishes the MySQL Magazine. Or, of course, any of the bloggers on http://www.planetmysql.org that have consulting firms are good choices too.

http://www.google.com/search?q=mysql+consulting

There is a job board that’s specific to MySQL and Oracle at http://www.toomanyconnections.com. The first place to look for a location-specific full-time DBA is the MySQL User Group near the hiring location:

http://forge.mysql.com/wiki/List_of_MySQL_User_Groups

Contact the leader of the group, saying you have a job opening, and ask if there’s an appropriate method to contact the group. Some leaders make the announcements themselves, others allow posting on a message board. If you are an agency, be upfront about it; if you’re not, also mention “this is for my company, I am not a headhunter” or similar language.

Most group leaders are looking to do less work, and the least work possible is to have you come to a meeting and announce your job opening, so any questions can be answered by you right away. If, of course, that’s allowed.

Here are my suggestions if you’re looking to hire in the Boston area; they translate easily to searching in your own area:

Attend the Boston MySQL User Group and make an announcement. Boston MySQL user group meetings are usually held on the 2nd Monday of every month, but check the calendar to be sure — http://mysql.meetup.com/137/calendar/

As I am the leader of the Boston MySQL User Group, I will say that you may also post it to the User Group’s message board at http://mysql.meetup.com/137/boards — a note of caution: about once a week a job is posted there, so it’s really better to come to the meeting in person, where you distinguish yourself. If you can’t attend, feel free to send me a description, although it really is better to go in person; all I know is what you give me, and if a person has a question you’re in a better position to answer it than I am.

Another option is similar groups. I can personally recommend both of these groups for high-quality people (in general), and you can say that I recommended the groups:

BLU, the Boston Linux and Unix Users group: http://www.blu.org/
and
BBLISA, http://www.bblisa.org/

In all cases, going there in person gives you a lot more cachet; e-mailing the group leaders to ask if you can come and announce your job opening is not a bad thing. (But do stay for the whole presentation; bring your laptop, as everyone does, and work on something else if you want; it’s polite to stay.)

Other pages with lists of user groups that might be helpful:

http://web.meetup.com/cities/us/ma/boston/?from=loc_pick

http://web.mit.edu/ist/usergroups/
(this is a list of all User Groups that MIT hosts, so some are not relevant at all)

I hope this helps!

Today is a day to “draw attention to achievements of women in technology.”

So here I am, drawing some attention 🙂 All the names contain links to learn more (mostly Wikipedia links), so if you are so inclined, you can learn more (you could start at Wikipedia’s article on women in computing). Perhaps you will realize that there are lots of women in technology already, more than you first thought.

That being said, this is by no means a comprehensive list.


Of course, there’s Ada Lovelace herself, but I am focusing on women still alive today (although I do have to mention Grace Hopper, who coined the term “debugging”). As well, I might mention the amazing Allison Randal, well-known in the Perl community and one of the major organizers of OSCon. But I do want to focus on some of the great achievements of lesser-known women, because we are indeed hiding (in plain sight!) everywhere.

Did you like Apple’s Newton PDA? Many people believe it was (and still is) one of the best-designed PDAs. Donna Auguste helped develop it.

Ever played the video game Centipede? Thank Dona Bailey. Corrinne Yu has done a lot of work in the gaming field, currently a Halo lead at Microsoft.

Angela Orebaugh wrote the definitive books on Wireshark and Ethereal, two of the more popular security tools.

The first commercial website is credited to Jennifer Niederst Robbins, who designed the Global Network Navigator.

Mary Ann Davidson is the Chief Security Officer at Oracle.

Lynne Jolitz helped develop 386BSD.

Wendy Hall, current president of the ACM (since 2008).

IBM Master Inventor Amanda Chessell.

Elaine Weyuker’s Wikipedia page starts out with “Elaine J. Weyuker is an ACM Fellow, an IEEE Fellow, and an AT&T Fellow at Bell Labs for research in software metrics and testing as well as elected to the National Academy of Engineering. She is the author of over 130 papers in journals and refereed conference proceedings.” From there, it gets more impressive.

Having written a book myself, I can tell you it is definitely an achievement! Ruth Aylett’s popular work Robots: Bringing Intelligent Machines to Life certainly qualifies her to make this list.

I challenge all the readers out there to take a few minutes to note the achievements of women in technology and science in their own lives. A few weeks ago I posted a list of women who taught me science or technology; that may be an easier way for people to celebrate the day than researching the great women of science and technology… and then we will not see the same “top 10 women in science and technology” lists over and over today.

Notes from the keynote by Oracle’s Chief Corporate Architect.

Where MySQL fits within Oracle’s structure.

Oracle’s Strategy: Complete. Open. Integrated. (compare with MySQL’s strategy: Fast, Reliable, Easy to Use).

Most of the $$ spent by companies is not on software, but on integration. So Oracle makes software based on open standards that integrates well.

Most of the components talk to each other through open standards, so that customers can use other products, and standardize on the technology, which makes it much more likely that customers will continue to use Oracle.

Oracle invested heavily in open source even before the acquisition. Linux (Oracle Unbreakable Linux = Oracle Enterprise Linux = OEL). Clustering, data integrity, storage validation, asynchronous I/O, virtualization technology that has been accepted back into the Linux kernel. They have enhanced Xen in order to make a good Oracle VM server for x86. With Sun, they now have VirtualBox. In the 3 years of OEL, they have signed up over 4,500 companies.

Oracle never settles for being second best at any level of the stack.
“Complete” means we meet most customer requirements at every level; that’s why completeness matters to Oracle and Oracle customers.

MySQL is small, lightweight, easy to install and easy to manage. These qualities are different from Oracle’s, so MySQL is the RIGHT choice for many applications; adding MySQL to Oracle’s database offerings makes the Oracle solution more complete.

Investing in MySQL means:
making MySQL a better MySQL, keeping it the #1 database for web apps
improving engineering, consulting and support
24×7, world-class Oracle support

MySQL community edition: “If we stop investing in the community edition, MySQL will stop being ubiquitous”.

They want to focus even more effort on:
web
embedded
telecom
integration with other products in the LAMP stack
Windows — #1 download platform is Windows, but it’s not the #1 *deployment* platform.

They want to invest more money in allowing Oracle tools to work with MySQL too. For example, Oracle Enterprise Manager for monitoring, Oracle Secure Backup for backups, and Oracle Audit Vault for auditing. (Pythian already has a free Oracle Grid Control plugin to monitor MySQL).

Oracle will keep the pluggable storage engine API, and they are starting a Storage Engine Advisory Board to talk about requirements, experiences, future plans and product direction.

MySQL 5.5 is in beta; that’s the big news. InnoDB is the default storage engine there.

5.5 is much faster…including more than a 10x improvement in recovery times. There’s a 200% read-only performance gain, and read/write performance is 364% faster than MySQL 5.1.40. These numbers are for a large number of concurrent connections, like 1024 connections.

Better object/connection management, database administration, and data modelling in MySQL Workbench.

MySQL Cluster 7.1: improved administration, higher performance, Java connectors, carrier-grade availability and performance. “Extreme availability”.

They’re also making support — MySQL Enterprise — better.

MySQL Enterprise Backup (formerly InnoDB Hot Backup) is now included in MySQL Enterprise, not a separately paid-for feature.

(Demo of MySQL Enterprise Monitor)

In conclusion:
MySQL is important to Oracle and our customers — it’s part of Oracle’s complete, open, integrated strategy. Oracle is making MySQL better TODAY. Then a “come to Oracle OpenWorld” pitch (I’ve been; it certainly is a great conference).

I am happy and proud to announce that there will be an Open Database Camp held at this year’s Northeast LinuxFest! The venue is Harvard University in Cambridge, MA (“our fair city”), and it will take place Saturday, March 16 and Sunday, March 17, 2013.

Northeast LinuxFest and Open Database Camp are both free, but there is no reciprocal membership (registering for one does not register you for the other). To register for Open Database Camp, just sign up with Eventbrite. We are also soliciting session ideas ahead of time, and attendees will choose sessions during the Saturday morning planning session, as usual for Open DB Camp.

If you are interested in sponsoring, do so directly with Northeast LinuxFest, and let them know it’s for Open Database Camp!

Open Database Camp is for all open databases – whether it’s MySQL, Postgres, or NoSQL; whether it has been around for years or is something you’re just thinking about. You can see previous session ideas at the OpenSQLCamp website.

It being the end of the quarter, here is an update on our upgrade goals, which we have successfully met. We only have one server left on MySQL 5.0, and that has a compatible MySQL 5.1 server waiting for a few developers to get back from their well-deserved vacations to migrate off. In December, we finished upgrading 2 servers to MySQL 5.1.

– Looked at the top 30 Bugzilla queries and started to give optimization tips for MySQL.
– Did our regular purge/defrag of TinderBox PushLog.
– Worked on integrating our datazilla code with chart.io features.
– Helped change the data model for datazilla.
– Moved some Bugzilla tables to a different partition when the data partition filled up. There is a plan to upgrade but we had an immediate need for the move.
– Upgraded one of the Bugzilla slaves to MariaDB 5.5.
– Refreshed the support staging database with production data.
– Added grants for metrics users to support new Bugzilla custom fields.
– Did some research on whether SSDs were good enough for the addons database or if we really needed Fusion I/O. (Conclusion: SSDs are good enough! The drivers for this were the cost of larger Fusion I/O disks and growing concern about space on the database systems.)
– Found a bug in new build code for the builder that builds Firefox that would effectively stop updated builds from being recorded in the builder database. The bug was found in development and the code is not in production yet, but it took several hours of database debugging to figure out the problem.
– Built a new database server for backups that does not depend on NFS.
– Implemented checksum checking on several more MySQL clusters to ensure the data on the master and slaves match.
– Created databases for Game On.
– Optimized a query for a Firefox build system component (clobberer).
– Installed larger disks on our production Postgres failover server. We will be testing failover and adding more disks to the current master server in Q1.
– Created a database cluster for the main Mozilla website for failover.
– Cleaned up replication on a cluster after a power problem caused the master to crash.
– Added a Nagios check that uses pt-config-diff to all our MySQL servers to ensure that we know whenever the running MySQL configuration does not match the /etc/my.cnf file (see the sketch after this list).
– Dealt with a set of queries breaking replication due to not being inside a transaction.
– Dealt with a schema change for Bugzilla taking a long time, and took slaves out of the load balancer one at a time to let the ALTER TABLE complete without read queries getting locked and causing slow responsiveness on the database servers.
– Created read-only database logins for the administrators of Air Mozilla so they can better debug problems.
– Imported some data for Graphs.
– Audited the Persona/BrowserID databases to get them ready for prime time (these databases are not normally managed by the DB team).
– Did a security review for our Engagement team to get reports of Mozillians emails for sending out information to registered and vouched Mozillians.
– Added documentation for 11 Nagios checks related to MySQL and Postgres.
– Researched the Zero Day Exploits for MySQL to see if Mozilla was affected.
– Puppetized the postgresql.conf files for all our postgres servers.
– Puppetized our datazilla database servers.
– Puppetized database servers for web development and for internal tools.
– We sized MySQL database machines for the Platform as a Service (PaaS) platform that the webops team will be implementing. The next step is ordering the hardware!
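
As promised in the pt-config-diff item above, here is a minimal sketch of that Nagios check. The wrapper script is my own illustration; pt-config-diff, as I understand it, exits non-zero when the configs differ, which maps neatly onto a Nagios alert:

#!/bin/sh
# Hypothetical Nagios plugin: compare the running MySQL configuration
# (via SHOW VARIABLES on localhost) to the on-disk /etc/my.cnf.
if OUTPUT=$(pt-config-diff /etc/my.cnf h=localhost 2>&1); then
    echo "OK: running MySQL config matches /etc/my.cnf"
    exit 0
else
    echo "CRITICAL: MySQL config drift detected: $OUTPUT"
    exit 2
fi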

Under planning – we have done a lot in 2012 to stabilize our MySQL environment and have a good, sane centralized puppet configuration for control of MySQL packages, configuration files, scripts and backups. 2013 will be the year we do the same with Postgres:
– Stabilizing Postgres
– Streamlining Postgres configuration and installation and upgrading with puppet
– Reconfiguring Postgres logging
– Stabilizing Postgres backups

There are plenty of great things that will happen in 2013 from the Mozilla Database Team for both MySQL and Postgres databases!

These are not my notes about the MySQL conference that just occurred; these are my thoughts about MySQL conferences in general. Baron wrote in The History of OpenSQL Camp:

After O’Reilly/MySQL co-hosted the MySQL Conference and Expo (a large commercial event) that year, there was a bit of dissatisfaction amongst a few people about the increasingly commercial and marketing-oriented nature of that conference. Some people refused to call the conference by its new name (Conference and Expo) and wanted to put pressure on MySQL to keep it a MySQL User’s Conference.

During this year’s conference, there was much discussion about the future of the conference and whether or not Oracle would decide to sponsor it. I heard all of the following (in no particular order):

* If O’Reilly does not have a conference, what will we do?
* Maybe OpenSQLCamp (http://www.opensqlcamp.org) can be bigger instead of having an O’Reilly conference, because the O’Reilly conference is more commercial.
* If Oracle does not sponsor the O’Reilly conference, it means they don’t care about MySQL/the MySQL community.
* If Oracle sponsors the O’Reilly conference, they’ll ruin it by making it even more commercial.
* Oracle shouldn’t sponsor the O’Reilly conference, they should make a different technical conference, in a different hotel/location and bigger (6,000 people instead of 2,000).
* Oracle shouldn’t make their own technical conference for MySQL, they should let user groups get together and then sponsor it, like they do with Collaborate.

Obviously there are mixed messages here — I don’t see any clear directive from the community. Plenty of people have a strong opinion. What I do see happening is that there will probably be plenty of options:

I know that OpenSQLCamp is not dead — there will be 2 this year, check the website for details.

I also know that there will be a *real* MySQL track at Oracle OpenWorld — there was a rumor that the number of sessions would be fewer than 5, but sources on the inside have said that will not be the case.

I also know that we will hear from O’Reilly in the next few months about next year’s MySQL conference.

So, regardless of what happens, the nay-sayers will say how awful it is, and the pollyannas will say how great it is. There are plenty of reasons that each scenario is good and bad; so keep that in mind.

Here’s a sneak peek at a video matrix — this is all the videos that include Pythian Group employees at the MySQL conference. I hope to have all the rest of the videos processed and uploaded within 24 hours, with a matrix similar to the one below (but of course with many more sessions).

Title | Presenter | Slides | Video (hr:min:sec) | Details (Conf. site link)

Main Stage
Keynote: Under New Management: Next Steps for the Community | Sheeri K. Cabral (Pythian) | N/A | 18:16 | session 14808
Ignite talk: MySQLtuner 2.0 | Sheeri K. Cabral (Pythian) | PDF | 5:31 | N/A

Interview
Thoughts on Drizzle and MySQL | Sheeri K. Cabral (Pythian) | N/A | 9:22 | N/A

Tutorials
MySQL Configuration Options and Files: Basic MySQL Variables (Part 1) | Sheeri K. Cabral (Pythian) | PDF | 1:25:04 (pre-break) | session 12435
MySQL Configuration Options and Files: Intermediate MySQL Variables (Part 2) | Sheeri K. Cabral (Pythian) | PDF | 1:24:28 (post-break) | session 12435

Sessions
Better Database Debugging for Shorter Downtimes | Rob Hamel (Pythian) | PDF | 33:13 | session 13021
Find Query Problems Proactively With Query Reviews | Sheeri K. Cabral (Pythian) | PDF | 45:59 | session 13267
Time Zones and MySQL | Sheeri K. Cabral (Pythian) | PDF | 45:54 | session 12412
Security Around MySQL | Danil Zburivsky (The Pythian Group) | ODP | 37:27 | session 13458
Continual Replication Sync | Danil Zburivsky (The Pythian Group) | ODP | 45:57 | session 13428

I am happy and proud to announce that there will be an Open Database Camp held at this year’s Northeast LinuxFest! The venue is at Harvard University in Cambridge, men’s health condom MA (“our fair city”), information pills and will take place Saturday, March 16 and Sunday, March 17, 2013.

Northeast LinuxFest and Open Database Camp are both free, but there is no reciprocal membership. To register for Open Database Camp, just sign up with Eventbrite. We are also soliciting session ideas ahead of time, and attendees will choose sessions during the Saturday morning planning session, as usual for Open DB Camp.

If you are interested in sponsoring, do so directly to Northeast LinuxFest and let them know it’s for Open Database Camp!

Open Database Camp is for all open databases – whether it’s MySQL, Postgres, NoSQL, been around for years or something you’re thinking about. You can see previous session ideas at the OpenSQLCamp website.

Being the end of the quarter, audiologist which we have successfully done. We only have one server left on MySQL 5.0, order and that has a compatible MySQL 5.1 server waiting for a few developers to get back from their well-deserved vacations to migrate off. In December, dosage we finished upgrading 2 servers to MySQL 5.1.

– Looked at the top 30 Bugzilla queries and started to give optimization tips for MySQL.
– Did our regular purge/defrag of TinderBox PushLog.
– Worked on integrating our datazilla code with chart.io features.
– Helped change the data model for datazilla.
– Moved some Bugzilla tables to a different partition when the data partition filled up. There is a plan to upgrade but we had an immediate need for the move.
– Upgraded one of the Bugzilla slaves to MariaDB 5.5.
– Refreshed the support staging database with production data.
– Added grants for metrics users to support new Bugzilla custom fields.
– Did some research on whether SSDs were good enough for the addons database or if we really needed Fusion I/O. (conclusion: SSD’s are good enough! The driver for this was cost of larger Fusion I/O disks, and starting to worry about space on the database systems.)
– Found a bug in new build code for the builder that builds Firefox, that would effectively stop updated builds from being recorded in the builder database. The bug was found in development, the code itself is not in production yet, but there were several hours of database debugging to figure out the problem.
– Built a new database server for backups that does not depend on NFS.
– Implemented checksum checking on several more MySQL clusters to ensure the data on the master and slaves match.
– Created databases for Game On.
– Optimized a query for a Firefox build system component (clobberer).
– Installed larger disks on our production Postgres failover server. We will be testing failover and adding more disks to the current master server in Q1.
– Created a database cluster for the main Mozilla website for failover.
– Cleaned up replication on a cluster after a power problem caused the master to crash.
– Added a Nagios check that uses pt-config-diff to all our MySQL servers to ensure that we know whenever the running MySQL configuration does not match the /etc/my.cnf file.
– Dealt with a set of queries breaking replication due to not being inside a transaction.
– Dealt with a schema change for Bugzilla taking a long time, and took slaves out of the load balancer one at a time to let the ALTER TABLE complete without read queries getting locked and causing slow responsiveness on the database servers.
– Created read-only database logins for the administrators of Air Mozilla so they can better debug problems.
– Imported some data for Graphs.
– Audited the Persona/BrowserID databases to get them ready for prime time (these databases are not normally managed by the DB team).
– Did a security review for our Engagement team to get reports of Mozillians emails for sending out information to registered and vouched Mozillians.
– Added documentation for 11 Nagios checks related to MySQL and Postgres.
– Researched the Zero Day Exploits for MySQL to see if Mozilla was affected.
– Puppetized the postgresql.conf files for all our postgres servers.
– Puppetized our datazilla database servers.
– Puppetiezed database servers for web development and for internal tools.
– We sized MySQL database machines for the Platform as a Service (PaaS) platform that the webops team will be implementing. The next step is ordering the hardware!

Under planning – we have done a lot in 2012 to stabilize our MySQL environment and have a good, sane centralized puppet configuration for control of MySQL packages, configuration files, scripts and backups. 2013 will be the year we do the same with Postgres:
– Stabilizing Postgres
– Streamlining Postgres configuration and installation and upgrading with puppet
– Reconfiguring Postgres logging
– Stabilizing Postgres backups

There are plenty of great things that will happen in 2013 from the Mozilla Database Team for both MySQL and Postgres databases!

For the past few days, pilule I have been upgrading a few servers. We are going from Percona’s patched MySQL 5.1 to MariaDB 5.5 (the subquery optimization is excellent, pregnancy and we have lots of subqueries). Our process involves upgrading a slave first, and letting it replicate for a while, and if everything’s good, update more slaves, then the master.

This has served us well in the past. And now that we are checking data integrity between masters and slaves with pt-table-checksum, the process involves checking before we start that there is data integrity. This is easy, as we checksum twice daily and have a Nagios alert if there are any discrepancies. After the upgrade, we checksum again, to be sure no data has been changed/corrupted in the process of doing a mysqldump export and import.*

Much to my surprise, after importing the data on one of our dev servers, I found that there were a lot of discrepancies. So I picked a chunk to do some comparisons on, and found something interesting:

On Server version: 5.1.65-rel14.0-log Percona Server (GPL), 14.0, Revision 475:
mysql> select float_field from db.tbl where id=218964;
+-------------+
| float_field |
+-------------+
| 9.58084e-05 |
+-------------+
1 row in set (0.04 sec)

On Server version: 5.5.28a-MariaDB-log MariaDB Server
MariaDB [(none)]> select float_field from db.tbl where id=218964;
+--------------+
| float_field  |
+--------------+
| 0.0000958084 |
+--------------+
1 row in set (0.24 sec)

Which of course causes a different checksum. I tried SELECTing the values, casting and converting them, but I could not get the two servers to agree. MySQL 5.1 insists on returning this value in scientific notation, and MariaDB 5.5 (and MySQL 5.5; we tested it out) insists on returning it without scientific notation.

Frankly, I’m surprised this has not come up before (I did lots of querying Google for MySQL 5.5 and scientific notation), since it radically changes how numbers look when they are retrieved. I guess most code does the right thing with either format…except for pt-table-checksum, and I cannot really blame it.

In the end, I used the --ignore-columns option to pt-table-checksum to skip all float and double columns, generating the list of columns to ignore with:

SELECT GROUP_CONCAT(DISTINCT COLUMN_NAME) FROM INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE IN ('float','double') AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema');

In this way, I can get an accurate checksum to see if anything has changed, before I mark that the upgrade is complete on this slave server.
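Concretely, a sketch of how the list can be fed to the tool (the wrapper is mine and illustrative; the host name is an assumption):

# build a comma-separated list of every float/double column...
IGNORE=$(mysql -BN -e "SELECT GROUP_CONCAT(DISTINCT COLUMN_NAME)
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE DATA_TYPE IN ('float','double')
    AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema');")

# ...then checksum everything except those columns
pt-table-checksum --replicate=percona.checksums --ignore-columns="$IGNORE" h=master.db.example.com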

* This is just on the first slave. After the first slave is upgraded, we use xtrabackup to copy the data to another server to upgrade it.
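A minimal sketch of that copy step, assuming innobackupex from Percona XtraBackup and ssh between the hosts (host name and paths are made up):

# stream a backup from the upgraded slave straight to the next server
innobackupex --stream=tar ./ | ssh next-slave "tar -xif - -C /var/lib/mysql"
# remember to run innobackupex --apply-log on the destination datadir before starting mysqld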

First off, everyone I know that’s a good MySQL DBA already has a job — myself included. Occasionally I know of someone looking for a job, but more often than not, they end up finding a job rather quickly.

Obviously the best way to find people is word-of-mouth, and the next best way is to find an expert in the field and ask them who they recommend. I am flattered that you consider me an expert and are asking me! If I know of someone, I will definitely let you know. If not, I will probably direct you here.

So, what now? Well, the more people you contact, the better. Finding experts is the right step, and finding people that they know, who are interested in MySQL, is another right step. To that end, first consider your audience — do you want someone who also has skills as a developer? As a sysadmin? As a manager? Find “groups of experts” — or at least “groups of eager learners” near you.

Also, consider what you need. You may think you need “a fulltime DBA” — but what do you really need? Maybe what you need is “someone to make sure backups are running smoothly, help developers write new queries and optimize older ones, and be on call 24/7 for troubleshooting.”

One thing to consider is a consulting firm — particularly if you are having trouble getting headcount. Even if you’re not, though, you can ease into having a DBA, ramping up as needed. For instance, start a consultant on one project, and throw others at him/her as they come up. A full-time DBA might be bored in the first month unless you have a training program for him/her.

So consider a consultant — at the very least they can help fill in the gap while you are on your search for a great DBA. I am a big fan of giving back to the community, so consider MySQL’s own consulting at http://www.mysql.com/consulting/, or the Pythian Group which publishes the Log Buffer each week, or Paragon Consulting, which publishes the MySQL Magazine. Or, of course, any of the bloggers on http://www.planetmysql.org that have consulting firms are good choices too.

http://www.google.com/search?q=mysql+consulting

There is a job board that’s specific to MySQL and Oracle at http://www.toomanyconnections.com. The first place to look for a location-specific full-time DBA is the MySQL User Group near the hiring location:

http://forge.mysql.com/wiki/List_of_MySQL_User_Groups

Contact the leader of the group, saying you have a job opening, and ask if there’s an appropriate method to contact the group. Some leaders make the announcements themselves, others allow posting on a message board. If you are an agency, be upfront about it; if you’re not, also mention “this is for my company, I am not a headhunter” or similar language.

Most group leaders are looking to do less work, and the least work possible is to have you come to a meeting and announce your job opening, so any questions can be answered by you right away. If, of course, that’s allowed.

Here are my suggestions if you’re looking to hire in the Boston area; they are easily translated to looking for folks in your own area:

Attend the Boston MySQL User Group and make an announcement. Boston MySQL user group meetings are usually held on the 2nd Monday of every month, but check the calendar to be sure — http://mysql.meetup.com/137/calendar/

As I am the leader of the Boston MySQL User Group, I will say that you may also post it to the User Group’s message board at http://mysql.meetup.com/137/boards. A note of caution: about once a week a job is posted there, so it’s really better to come to the meeting in person, where you distinguish yourself. If you can’t attend, feel free to send me a description, although it’s better to go in person, because all I know is what you give me, and if a person has a question you’re in a better position to answer it than I am.

Another option is similar groups. I can personally recommend both these groups for high quality people (in general) and you can say that I recommended the groups:

BLU, the Boston Linux and Unix Users group: http://www.blu.org/
and
BBLISA, http://www.bblisa.org/

In all cases, going there in person gives you a lot more cachet; e-mailing the group leaders to ask if you can come and announce your job opening is not a bad thing. (But do stay for the whole presentation; bring your laptop, everyone does, and work on something else if you want, because it’s polite to stay.)

Other pages with lists of user groups that might be helpful:

http://web.meetup.com/cities/us/ma/boston/?from=loc_pick

http://web.mit.edu/ist/usergroups/
(this is a list of all User Groups that MIT hosts, so some are not relevant at all)

I hope this helps!

Today is a day to “draw attention to achievements of women in technology.”

So here I am, drawing some attention 🙂 All the names contain links to learn more (mostly Wikipedia links), so if you are so inclined, you can learn more (you could start at Wikipedia’s article on women in computing). Perhaps you will realize that there are lots of women in technology already, more than you first thought.

That being said, this is by no means a comprehensive list.


Of course, there’s Ada Lovelace herself, but I am focusing on women still alive today (although I do have to mention Grace Hopper, who coined the term “debugging”). As well, I might mention the amazing Allison Randal, well-known in the Perl community and one of the major organizers of OSCon. But I do want to focus on some of the great achievements of lesser-known women, because we are indeed hiding (in plain sight!) everywhere.

Did you like Apple’s Newton PDA? Many people believe it was (and still is) one of the best-designed PDA’s. Donna Auguste helped develop it.

Ever played the video game Centipede? Thank Dona Bailey. Corrinne Yu has done a lot of work in the gaming field, currently a Halo lead at Microsoft.

Wireshark and Ethereal, two of the more popular security tools, were written by Angela Orebaugh.

The first commercial website is credited to Jennifer Niederst Robbins, who designed the Global Network Navigator.

Mary Ann Davidson is the Chief Security Officer at Oracle.

Lynne Jolitz helped develop 386BSD.

Wendy Hall, current president of the ACM (since 2008).

IBM Master Inventor Amanda Chessell.

Elaine Weyuker’s Wikipedia page starts out with “Elaine J. Weyuker is an ACM Fellow, an IEEE Fellow, and an AT&T Fellow at Bell Labs for research in software metrics and testing as well as elected to the National Academy of Engineering. She is the author of over 130 papers in journals and refereed conference proceedings.” From there, it gets more impressive.

Having written a book myself, I can tell you it is definitely an achievement! Ruth Aylett’s popular work Robots: Bringing Intelligent Machines to Life certainly qualifies her to make this list.

I challenge all the readers out there to take a few minutes to note the achievements of women in technology and science in their lives. A few weeks ago I posted a list of women who taught me science or technology; that may be an easier way for people to celebrate the day than researching the great women of science and technology, and it means we will not see the same “top 10 women in science and technology” lists over and over today.

Notes from the keynote by Edward Screven, Chief Corporate Architect at Oracle.

Where MySQL fits within Oracle’s structure.

Oracle’s Strategy: Complete. Open. Integrated. (compare with MySQL’s strategy: Fast, Reliable, Easy to Use).

Most of the $$ spent by companies is not on software, but on integration. So Oracle makes software based on open standards that integrates well.

Most of the components talk to each other through open standards, so that customers can use other products, and standardize on the technology, which makes it much more likely that customers will continue to use Oracle.

Oracle invested heavily in open source even before the acquisition. Linux (Oracle Unbreakable Linux = Oracle Enterprise Linux = OEL): clustering, data integrity, storage validation, asynchronous I/O, virtualization technology that has been accepted back into the Linux kernel. They have enhanced Xen in order to make a good Oracle VM server for x86. With Sun, they now have VirtualBox. In the 3 years of OEL, they have signed up over 4,500 companies.

Oracle never settles for being second best at any level of the stack.
“Complete” means we meet most customer requirements at every level.
That’s why MySQL matters to Oracle and Oracle customers.

MySQL is small, lightweight, easy to install and easy to manage. These are different from Oracle, so MySQL is the RIGHT choice for many applications, so by adding MySQL to Oracle’s database offerings, it makes the Oracle solution more complete.

Investing in MySQL means:
– making MySQL a better MySQL, keeping MySQL the #1 database for web apps
– improving engineering, consulting and support
– 24×7, world-class Oracle support

MySQL community edition: “If we stop investing in the community edition, MySQL will stop being ubiquitous”.

They want to focus even more effort on:
web
embedded
telecom
integration with other products in the LAMP stack
Windows — #1 download platform is Windows, but it’s not the #1 *deployment* platform.

They want to invest more money in allowing Oracle tools to work with MySQL too. For example, Oracle Enterprise Manager for monitoring, Oracle Secure Backup for backups, and Oracle Audit Vault for auditing. (Pythian already has a free Oracle Grid Control plugin to monitor MySQL).

Oracle will keep the pluggable storage engine API; they are starting a Storage Engine Advisory Board to talk about requirements, experiences, future plans and product direction.

MySQL 5.5 is beta, that’s the big news. InnoDB is the default storage engine there.

5.5 is much faster…including more than a 10x improvement in recovery times. There’s a 200% read-only performance gain, and read/write performance is 364% faster than MySQL 5.1.40. These numbers are for large numbers of concurrent connections, like 1024.

Better object/connection management, database administration, and data modelling in MySQL Workbench.

MySQL Cluster 7.1, improved administration, higher performance, java connectors, carrier grade availability and performance. “Extreme availability”.

They’re also making support — MySQL Enterprise — better.

MySQL Enterprise Backup – formerly InnoDB Hot Backup. This is now included in MySQL Enterprise, not a separately paid-for feature.

(Demo of MySQL enterprise manager)

In conclusion:
MySQL is important to Oracle and our customers — it’s part of Oracle’s complete, open, integrated strategy. Oracle is making MySQL better TODAY. And a “come to Oracle OpenWorld” pitch (I’ve been; it certainly is a great conference).

This is not my notes about the MySQL conference that just occurred. These are my thoughts about MySQL conferences in general. Baron wrote in The History of OpenSQL Camp:

After O’Reilly/MySQL co-hosted the MySQL Conference and Expo (a large commercial event) that year, there was a bit of dissatisfaction amongst a few people about the increasingly commercial and marketing-oriented nature of that conference. Some people refused to call the conference by its new name (Conference and Expo) and wanted to put pressure on MySQL to keep it a MySQL User’s Conference.

During this year’s conference, there was a lot of talk about the future of the conference and whether or not Oracle would decide to sponsor it. I heard all of the following (in no particular order):

* If O’Reilly does not have a conference, what will we do?
* Maybe OpenSQLCamp (http://www.opensqlcamp.org) can be bigger instead of having an O’Reilly conference, because the O’Reilly conference is more commercial.
* If Oracle does not sponsor the O’Reilly conference, it means they don’t care about MySQL/the MySQL community.
* If Oracle sponsors the O’Reilly conference, they’ll ruin it by making it even more commercial.
* Oracle shouldn’t sponsor the O’Reilly conference, they should make a different technical conference, in a different hotel/location and bigger (6,000 people instead of 2,000).
* Oracle shouldn’t make their own technical conference for MySQL, they should let user groups get together and then sponsor it, like they do with Collaborate.

Obviously there are mixed messages here — I don’t see any clear directive from the community. Plenty of people have a strong opinion. What I do see happening is that there will probably be plenty of options:

I know that OpenSQLCamp is not dead — there will be 2 this year, check the website for details.

I also know that there will be a *real* MySQL track at Oracle OpenWorld — there was a rumor that the number of sessions would be fewer than 5, but sources on the inside have said that will not be the case.

I also know that we will hear from O’Reilly in the next few months about next year’s MySQL conference.

So, regardless of what happens, the nay-sayers will say how awful it is, and the pollyannas will say how great it is. There are plenty of reasons that each scenario is good and bad; so keep that in mind.

Here’s a sneak peek at a video matrix — this is all the videos that include Pythian Group employees at the MySQL conference. I hope to have all the rest of the videos processed and uploaded within 24 hours, with a matrix similar to the one below (but of course with many more sessions).

Title | Presenter | Slides | Video (hr:min:sec) | Details (conf. site link)

Main Stage
Keynote: Under New Management: Next Steps for the Community | Sheeri K. Cabral (Pythian) | N/A | 18:16 | session 14808
Ignite talk: MySQLtuner 2.0 | Sheeri K. Cabral (Pythian) | PDF | 5:31 | N/A

Interview
Thoughts on Drizzle and MySQL | Sheeri K. Cabral (Pythian) | N/A | 9:22 | N/A

Tutorials
MySQL Configuration Options and Files: Basic MySQL Variables (Part 1) | Sheeri K. Cabral (Pythian) | PDF | 1:25:04 (pre-break), 1:35:47 (post-break) | session 12408
MySQL Configuration Options and Files: Intermediate MySQL Variables (Part 2) | Sheeri K. Cabral (Pythian) | PDF | 1:25:04 (pre-break), 1:24:28 (post-break) | session 12435

Sessions
Better Database Debugging for Shorter Downtimes | Rob Hamel (Pythian) | PDF | 33:13 | session 13021
Find Query Problems Proactively With Query Reviews | Sheeri K. Cabral (Pythian) | PDF | 45:59 | session 13267
Time Zones and MySQL | Sheeri K. Cabral (Pythian) | PDF | 45:54 | session 12412
Security Around MySQL | Danil Zburivsky (The Pythian Group) | ODP | 37:27 | session 13458
Continual Replication Sync | Danil Zburivsky (The Pythian Group) | ODP | 45:57 | session 13428

Here’s a matrix of all the videos up on YouTube, with the video link and a link to the official conference detail page, where you can rate the session and provide feedback that the presenter will see. They are grouped mostly by topic, except for the main stage events (keynote, ignite) and interviews.

If there’s a detail missing (ie, slides, or there are other videos you know about), please add a comment so I can make this a complete matrix.






Keynotes

Title | Presenter | Slides | Video (hr:min:sec) | Details (conf. site link)
State of the Dolphin | Edward Screven (Oracle) | N/A | 29:10 | session 12440
O’Reilly Radar | Tim O’Reilly | N/A | 36:38 | session 12441
MySQL at Facebook | Mark Callaghan (Facebook) | N/A | 21:05 | session 14841
State of MariaDB | Monty Widenius (Monty Program Ab) | N/A | 41:54 | session 12443
State of Drizzle | Brian Aker (Data Differential) | N/A | 44:58 | session 12442
Keynote: Under New Management: Next Steps for the Community | Sheeri K. Cabral (Pythian) | N/A | 18:16 | session 14808
State of the MySQL Community | Kaj Arnö (Sun Microsystems GmbH) | N/A | 38:06 | session 12498
The Engines of Community | Jono Bacon (Canonical, Ltd) | N/A | 47:51 | session 14796
The Best of Ignite MySQL | Sarah Novotny, Gerry Narvaja, Gillian Gunson, Mark Atwood | N/A | 23:25 | N/A
RethinkDB: Why Start a New Database Company in 2010 | Slava Akhmechet (RethinkDB), Michael Glukhovsky (RethinkDB) | N/A | 44:49 | session 14891

Ignite Talks

Title | Presenter | Slides | Video (hr:min:sec) | Details
Backups Don’t Make Me Money | Sarah Novotny (Blue Gecko) | N/A | 6:52 | Also in Best of Ignite
Calpont’s InfiniDB | Robin Schumacher (Calpont) | N/A | 6:40 | N/A
A Future [for MySQL] | Mark Callaghan (Facebook) | N/A | 6:37 | N/A
A Guide to NoSQL | Brian Aker (Data Differential) | N/A | 6:27 | N/A
Guide to NoSQL, redux | Mark Atwood (Gear6) | N/A | 4:22 | Also in Best of Ignite
The Gunson Rules of Database Administration | Gillian Gunson | N/A | 6:08 | Also in Best of Ignite
MariaDB: 20 slides, 5 minutes, the full Monty | Monty Widenius (Monty Program Ab) | N/A | 6:18 | N/A
MySQLtuner 2.0 | Sheeri K. Cabral (Pythian) | PDF | 5:31 | N/A
“Painting” Data with Entrance (free) and MySQL | Tod Landis (dbEntrance Software) | N/A | 5:11 | N/A
Three Basic Laws of DB Diagnosis | Gerry Narvaja (OpenMarket, Inc) | N/A | 2:33 | Also in Best of Ignite
What is the difference between XtraDB and others? | Baron Schwartz (Percona) | N/A | 6:59 | N/A
What is a Performance Model for SSD’s? | Bradley C. Kuszmaul (Tokutek) | N/A | 7:00 | N/A

Interviews

Topic | Interviewee | Video (hr:min:sec)
Why having InnoDB and MySQL in the same company will improve performance, the way Drizzle leaves the past behind, and other issues in MySQL development | Kaj Arnö (MySQL) | 10:40
What’s hard to optimize in MySQL, how they’ve improved performance, and what’s in the performance schema | Peter Gulutzan | 3:01
Write-scaling, MySQL performance in an EC2 cloud, why they wrote the book MySQL High Availability | Charles Bell, Mats Kindahl, and Lars Thalmann | 7:04
How third-party ads make web sites slow, why mobile devices are the next frontier in Web performance | Steve Souders, Web performance expert | 7:12
Attractions of Gearman, the adaptation of database technology to large multi-core and multi-node environments, and what relational databases are and are not great for | Brian Aker | 9:01
Thoughts on Drizzle and MySQL | Sheeri K. Cabral (Pythian) | 9:22
Democratic culture of Monty Program AB | Henrik Ingo (Monty Program AB) | 2:20
Thoughts on democratic companies and his role in coding and management | Monty Widenius (Monty Program Ab) | 8:52
How MariaDB emerged as a superset of MySQL, and development issues | Kurt von Finck (MontyProgram Ab) | 7:23

Tutorials

Title | Presenter | Slides | Video (hr:min:sec) | Details
MySQL Configuration Options and Files: Basic MySQL Variables (Part 1) | Sheeri K. Cabral (Pythian) | PDF | 1:25:04 (pre-break), 1:35:47 (post-break) | session 12408
MySQL Configuration Options and Files: Intermediate MySQL Variables (Part 2) | Sheeri K. Cabral (Pythian) | PDF | 1:25:04 (pre-break), 1:24:28 (post-break) | session 12435

Sessions

Performance

Title | Presenter | Slides | Video (hr:min:sec) | Details
Advanced Sharding Techniques with Spider | Kentoku Shiba and Daniel Saito (MySQL) | N/A | 39:55 | session 12619
Boosting Database Performance with Gearman | Eric Day (Rackspace Cloud), Giuseppe Maxia (MySQL) | N/A | 46:18 | session 13310
High Concurrency MySQL | Domas Mituzas (Facebook) | PDF | 49:53 | session 13285
High-throughput MySQL | Mark Callaghan (Facebook), Ryan Mack (Facebook), Ryan McElroy (Facebook) | N/A | 57:31 | session 13223
Introduction to InnoDB Monitoring System and Resource & Performance Tuning | Jimmy Yang (Oracle Corporation) | ZIP | 40:49 | session 13508
Linux Performance Tuning and Stabilization Tips | Yoshinori Matsunobu (Sun Microsystems) | slideshare.net | 48:45 | session 13252

Debugging and Reactive/Proactive Monitoring

Title | Presenter | Slides | Video (hr:min:sec) | Details
Better Database Debugging for Shorter Downtimes | Rob Hamel (Pythian) | PDF | 33:13 | session 13021
Continual Replication Sync | Danil Zburivsky (Pythian) | ODP | 45:57 | session 13428
Find Query Problems Proactively With Query Reviews | Sheeri K. Cabral (Pythian) | PDF | 45:59 | session 13267
Monitoring Drizzle or MySQL With DTrace and SystemTap | Padraig O’Sullivan (Akiba Technologies Inc.) | PDF | 42:33 | session 12472

Security / Risk Management

Title | Presenter | Slides | Video (hr:min:sec) | Details
Achieving PCI Compliance with MySQL | Ryan Lowe (Percona) | PPTX | 58:24 | session 12484
Security Around MySQL | Danil Zburivsky (Pythian) | ODP | 37:27 | session 13458
Securich – A Security and User Administration plugin for MySQL | Darren Cassar (Trading Screen Inc) | PDF | 54:05 | session 13351

Other DBA-related

Title | Presenter | Slides | Video (hr:min:sec) | Details
Galera – Synchronous Multi-master Replication For InnoDB | Seppo Jaakola (Codership), Alexey Yurchenko (Codership) | PDF | 47:39 | session 13286
Large Deployment Best Practices | Nicklas Westerlund (Electronic Arts) | N/A | 40:37 | session 12567
MySQL Cluster: An Introduction | Geert Vanderkelen (Sun Microsystems) | PDF | 47:30 | session 12469
Migration From Oracle to MySQL: An NPR Case Study | Joanne Garlow (National Public Radio) | PPT | 34:35 | session 13404
New Replication Features | Mats Kindahl (Sun Microsystems), Lars Thalmann (MySQL) | PDF | 53:32 | session 12451
Successful and Cost Effective Data Warehouse… The MySQL Way | Ivan Zoratti (MySQL) | PDF | 1:00:25 | session 13343
The Thinking Person’s Guide to Data Warehouse Design | Robin Schumacher (Calpont) | slideshare.net | 59:50 | session 13366
Using Drizzle | Eric Day (Rackspace Cloud), Monty Taylor (Rackspace Cloud) | N/A | 58:21 | session 13308

Other Developer-related

Title | Presenter | Slides | Video (hr:min:sec) | Details
Connecting MySQL and Python | Geert Vanderkelen (Sun Microsystems) | PDF | 54:56 | session 13251
MySQL Plugin API: New Features | Sergei Golubchik (MariaDB) | ZIP | 40:14 | session 13143
PHP Object-Relational Mapping Libraries In Action | Fernando Ipar (Percona) | PDF | 49:45 | session 12489
Scalability and Reliability Features of MySQL Connector/J | Mark Matthews (Oracle), Todd Farmer (Oracle Corporation) | PDF | 39:07 | session 12448
Time Zones and MySQL | Sheeri K. Cabral (Pythian) | PDF | 45:54 | session 12412
Using Visual Studio 2010 with MySQL | Reggie Burnett (Oracle), Mike Frank (Oracle) | ZIP | 37:53 | session 13365

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, and it completely relates to DBA work:
works and plays well with others
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespectful
insensitive
incompetent
[my own addition – no follow-through, lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of the sysadmin skill set
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to print output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps start by managing your time better
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Many don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write Python code, going back to it in 6 months, you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, you use the “Shift” key a lot in Perl because you use $ % ( ) { } etc, so you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to indent the second line, because whitespace indentation is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation (a docstring). It is stored in the __doc__ attribute; a double underscore is also called a “dunder”:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no index 9; slices are clamped to the end of the list. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Negative indexes count backwards from the end of the list, so v[1:-1] leaves off the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full slice syntax is [start:end:step]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and appends its elements to the end of the list.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list comprehension is made of a transformation, an iteration and an optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A one-item tuple still needs a trailing comma, so it is defined as
x=(1,)
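A quick check in the interpreter (my own example, not from the talk):

>>> one = (1,)
>>> type(one)
<type 'tuple'>
>>> not_a_tuple = (1)
>>> type(not_a_tuple)
<type 'int'>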

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.
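If you need a predictable order, sort the keys yourself (my own sketch, using the same d as above):

>>> for key in sorted(d.keys()):
...  print key, d[key]
...
room 1178
user jonesy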

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not interpolated in strings; Python uses printf-style string formatting:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
    print "x == y"
for k,v in mydict.iteritems():
    if v is None:
        continue
    print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign. This is on purpose, it avoids bugs resulting from typing if x=y instead of if x==y.

The construct has no place in production code anyway, since you give up catching any exceptions.

Python modules for sysadmins:
– sys
– os
– urllib/urllib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd
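As a tiny illustration of a few of these together (my own sketch; the paths are made up):

>>> import glob, shutil, os
>>> for cnf in glob.glob('/etc/mysql/*.cnf'):   # wildcard match with glob
...  shutil.copy(cnf, '/tmp/' + os.path.basename(cnf))   # copy each file
...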

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the commandline. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, please let me know any issues, as they may be typos….

[author’s note: personally, I use awk a bunch in MySQL DBA work, for tasks like scrubbing data from a production export for use in qa/dev, but usually have to resort to Perl for really complex stuff; now I know how to do more of it in awk.]

Basics:
By default, fields are separated by any number of spaces. The -F option to awk changes the separator on the commandline.
Print the first field, fields are separated by a colon.
awk -F: '{print $1}' /etc/passwd

Print the first and fifth field:
awk -F: '{print $1,$5}' /etc/passwd

Can pattern match and use files, so you can replace:
grep foo /etc/passwd | awk -F: '{print $1,$5}'
with:
awk -F: '/foo/ {print $1,$5}' /etc/passwd

NF is a built-in variable (no $ when you reference it).
This will print the first and last fields of lines where the first field matches “foo”:
awk -F: '$1 ~ /foo/ {print $1,$NF}' /etc/passwd

NF = number of fields, ie, “7”
$NF = value of last field, ie “/bin/bash”
(similarly, NR is the record number)


Awk makes assumptions about input, variables, and processing that you’d otherwise have to code yourself.

– “main loop” of input processing is done for you
– awk initializes variables for you, to 0
– input is viewed by awk as ‘records’ which are splittable into ‘fields’

This all makes a lot of operations very concise in awk, many things can be done w/ a one-liner that would otherwise require several lines of code.

awk key points:
– splits text into fields
– default delimiter is “any number of spaces”
– reference fields
– $0 is entire line
– create filters using ‘addresses’ which can be regexps (similar to sed)
– Turing-complete language
– has if, while, for, do-while, etc
– built-in math like exp, log, rand, sin, cos
– built-in string sub, split, index, toupper/lower
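For instance, a one-liner of my own combining a couple of those built-ins (the file and the goal are just an example):

# print each login uppercased, plus the last component of its shell
awk -F: '{ n = split($NF, p, "/"); print toupper($1), p[n] }' /etc/passwd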

Patterns and actions
Pattern is first, then action(s)
Actions are enclosed in {}

only a pattern, no action:
'length>42'
but the default action is to print the whole line, so this will actually do something: print lines where the length of the line is > 42.

only action, no pattern:
{print $2,$1}
do this to all lines of input

NR % 3 == 0
print every third line (the pattern is NR mod 3 == 0)

{print $1, $NF, $(NF-1)}
print the first field, last field, and 2nd to last field

built-in variables
NF, NR we’ve done
FS = field separator (can be regexp)
OFMT = output format for numbers (default %.6g)

Patterns
– used to filter lines processed by awk
– can be regexp
/^root/ is the pattern in the following:
awk -F: '/^root/ {print $1,$NF}' /etc/passwd

– Patterns can use fields and relational operators
To print 1st, 4th and last field if value of 4th field >10:
awk -F: '$4 > 10 {print $1, $4, $NF}' /etc/passwd

awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}' /etc/passwd

Range patterns
sed-like addressing : you can have start and end addresses
awk 'NR==1,NR==3'
prints only first three lines of the file
You can use regular expressions in range patterns:
awk -F: '/^root/,/^daemon/ {print $1,$NF}' /etc/passwd
start printing at the line that starts with “root”, the last line that is processed is the line starting with “daemon”

Range pattern “gotcha” – can’t mix a range with other patterns:
To do “start at a non-commented line where the value of $4 is less than or equal to 10, end at the first line where the value of $4 is greater than 10”:

This does not work!
awk -F: '$0 !~ /^#/ $4 <= 10, $4 > 10' /etc/passwd

This is how to do it; {next} is an action that skips the line:
awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4}' /etc/passwd

Basic Aggregation
awk -F: '$3 > 100 {x+=1; print x}' /etc/passwd
This prints a line of output as each matching line is processed, giving a running total of x.

awk -F: '$3 > 100 {x+=1} END {print x}' /etc/passwd
This processes the “{print x}” action only after the entire file has been processed. This gives only the final value of x.

Arrays:
Support for regular arrays
Technically multi-dimensional arrays are not supported, but array indexes are arbitrary strings, so you can make your own associative arrays.
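
A sketch of the multi-dimensional trick (my own example, using awk’s built-in SUBSEP, which glues the index parts together):

awk 'BEGIN {
  price["ABC", "open"] = 12.14    # the index is really "ABC" SUBSEP "open"
  price["ABC", "close"] = 19.12
  for (k in price) {
    split(k, part, SUBSEP)        # recover the two "dimensions"
    print part[1], part[2], price[k]
  }
}'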

Example:
awk -F, '{x[$1] = $2*($4 - $3)} END {for(key in x) {print key, x[key]}}' stocks.txt

The part before the END creates the associative array; the part after the END prints the array.

Extreme data munging:
awk -F, '{x[$1] = $2*($4 - $3)} END {for(z in x) {print z, x[z]}}' stocks.txt

input
ABC,100,12.14,19.12
FOO,100,24.01,17.45

output
BAR 271.5
ABC 698

For the line "ABC,100,12.14,19.12"
the function becomes

x[ABC] = 100 * (19.12 - 12.14) = 698

Aggregate across multiple variables:
awk -F, '{x[$1] = $2*($4 - $3); y+=x[$1]} END {for(z in x) {print z, x[z]} print "Net:"y}' stocks.txt

Note that y is a running *sum* (not a running count like before).

Now, the above is hard to read; this is much easier:

#!/usr/bin/awk -f

BEGIN { FS="," }
{ x[$1] = $2*($4 - $3)
  y += x[$1]
}
END {
  for(z in x) {
    print z, x[z]
  }  # end for loop
  print "Net:" y
}  # end END block
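
To run it (assuming the script is saved as stocks.awk; the filename is mine, not from the talk):

awk -f stocks.awk stocks.txt

or make it executable and let the #! line do the work:

chmod +x stocks.awk
./stocks.awk stocks.txt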

This was liveblogged, so please point out any issues, as they may be typos on my part….

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, and it completely relates to DBA work:
works and plays well with others
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespectful
insensitive
incompetent
[my own addition – no follow-through, lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of the sysadmin skill set
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object just to print output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps start with managing your time
– time management means you’ll have more time to do things; it doesn’t mean all work, work, work.
– Read Time Management for System Administrators – there is a Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Many don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write Python code, going back to it in 6 months, you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, you use the “Shift” key a lot in Perl because you use $ % ( ) { } etc, so you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to indent the second line, because whitespace indentation is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation (a docstring). It is stored in the __doc__ attribute (double underscore, also called a “dunder”), and printing it shows the stuff in the triple quotes:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no index 9; slice bounds are clamped. If you index v[9] directly, you’d get an error:
>>> v[9]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range

Negative indexes count from the end of the list, and a slice excludes its end position, so v[1:-1] does not include the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full slice syntax is [start:end:step]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made of a transformation, an iteration and optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A one-item tuple still needs a trailing comma (the comma, not the parentheses, makes the tuple), so it is defined as
x=(1,)
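
You can check this in the interpreter (my own example):

>>> type((1))
<type 'int'>
>>> type((1,))
<type 'tuple'>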

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not interpolated in strings; use printf-style string formatting instead:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
    print "x == y"

for k,v in mydict.iteritems():
    if v is None:
        continue
    print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign there. This is on purpose; it avoids bugs resulting from typing if x=y instead of if x==y.

The construct has no place in production code anyway, since you give up catching any exceptions.
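
Here’s what the if x = y mistake looks like in the interpreter (my own example):

>>> x, y = 1, 2
>>> if x = y:
  File "<stdin>", line 1
    if x = y:
         ^
SyntaxError: invalid syntax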

Python modules for sysadmins:
– sys
– os
– urllib/urllib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the commandline. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, so please let me know about any issues, as they may be typos….

The Beacon Pattern:
– This is a “Get out of the business” pattern
– Identify an oft-occurring and annoying task
– Automate and document it to the point of being able to hand it off to someone far less technical

Example:
– System admins were being put in charge of scheduling rooms in the building
– They wrote a PHP web application to help them automate the task
– They refined the app, and handed it off to a secretary
– They still have to maintain the app, but it’s far less work.

The Community Pattern:

– Prior to launch of a new service, create user documentation for it.
– Point a few early adopters at the documentation and see if they can use the service with minimal support
– Use feedback to improve documentation, and the service
– Upon launch, create a mailing list, forum, IRC channel, or Jabber chat room and ask early adopters to help test it out.
– Upon launch, your early adopters are the community, and they’ll tell new users to use the tools you’ve provided instead of calling you.

Example:
– A beowulf cluster for an academic department
– Documented like crazy, early adopters were given early access to the cluster (demand was high)
– Created a mailing list; early adopters were added to it with their consent, and functionality was tested with them.
– The email announcing the launch mentioned the early adopters in a ‘thank you’ section, and linked to the mailing list.

The DRY pattern
DRY = Don’t repeat yourself
Identify duplicate code in your automation scripts
Put common subroutines in an include file, and include that file in your scripts.

Example:
– “sysadmin library”
– /var/lib/adm/.*pl
– The elapsed time and number of lines needed to script a task for which the library was useful plunged dramatically
– new tasks were thought up that were not considered before but were obvious now (ie, users that want to change their username)
– migrating to new services became much easier

The Chameleon Pattern
– Identify commonalities among your services
– Leverage those to create “Chameleon” servers that can be re-purposed on the fly
– Abstract as much of this away from the physical hardware
– Doesn’t need to involve virtualization, though it’s awfully handy if you can do it that way.
[this one is a bit harder to do with MySQL config files]

Example:
[puppet/cfengine were mentioned…]
ldapconfig.py – more than a script: a methodology

– But isn’t installing packages you don’t need bad? Depends on the package….ie, gcc is bad for enterprise

“Junior annoyances”

Terminal issues

Junior:
open terminal, log in to machine1
realize the issue is actually with machine2
log out of machine1
log in to machine2

Senior:
opens 2 terminals, one for each of machine1 and machine2, to start

Junior:
networking issue ticket arrives
logs into server
runs tcpdump

Senior:
networking issue ticket arrives
logs into server
looks at logs

“Fix” vs. “Solution”, ie “taking orders”
A junior will try to fix a problem; a senior will try to figure out what the problem actually is. ie, given “I need a samba directory mounted under an NFS mount”, a junior admin will try to do exactly that, while a senior admin will ask “what are you trying to do with that?” because maybe all they need is a symlink.

Fanboyism
Signs you might be a fanboy:
– Disparaging users of latest stable release of $THING for not using the nightly (unstable) build which fixes more issues
– Creating false/invalid comparisons based on popular opinion instead of experience/facts
– Going against internal standards, breaking environmental consistency, to use $THING instead of $STANDARD (but this is also how disruptive technology works)
– Being in complete denial that most technology at some point or another stinks.
– Evaluating solutions based on “I like” instead of “we need” and “this does”.

Liveblog of the Professional IT Community Conference session Mentoring: It’s for everyone

Ways to learn:
Audio
Visual
Kinetic (doing it)

Everyone learns differently, but most people learn with some combination of all three.

However, you can also learn by training [that’s the truth: I learned a LOT by writing the book; even things I knew, I ended up needing to research more].

Ways to train:
Explanation
Observation
Demonstration
Questioning (Socratic Method)

What is a mentor?
noun: an experienced and trusted adviser. It’s not just someone who teaches, it’s someone who advises.
Also: an experienced person in a company, college, or school who trains and counsels new employees or students.

verb: to advise or train (someone, esp. a younger colleague).

A mentorship is a safe place to ask questions.

A mentor is a trainer, but a trainer who also is a professional advisor.

Finding a mentor
Someone…..
– you respect/admire
– works with similar technology
– has a compatible personality
– you have a good rapport with

Being a mentor
– Teach technical skills
– Provide advanced technical/design guidance
– Model and teach professional skills
– Be interested and invested in the [student’s] career

On Monday, this web doctor his thoughts on Tuesday.

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, dosage we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, pharmacy and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

I am attending the tomorrow.)

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, nurse and it completely relates to DBA work:
works and plays well with other
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespective
insensitive
incompetent
[my own addition – no follow through, dosage lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of a sysadmin skill
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to send an output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps managing time
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Maany don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write python code, here going back in 6 months, prostate you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, you use the “Shift” key a lot in Perl because you use $ % ( ) { } etc, so you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to put a space on the second line because whitespace ‘tabbing’ is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation. A double underscore, also called a “dunder”, is to print the stuff in the triple quotes:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no field 9. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
File ““, line 1, in
IndexError: list index out of range

Python uses pointers (or pointer-like things) so v[1:-1] does not print the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full array syntax is [start:end:index increment]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made of a transformation, an iteration and optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A tuple always has 2 elements, so a one-item tuple is defined as
x=(1,)

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not recognized in strings, uses printf-style string formatting:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
print "x == y"
for k,v in mydict.iteritems():
if v is None:
continue
print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign. This is on purpose, it avoids bugs resulting from typing if x=y instead of if x==y.

The construct has no place in production code anyway, since you give up catching any exceptions.

Python modules for sysadmins:
– sys
– os
– urlib/urlib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the commandline. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, please let me know any issues, as they may be typos….

[author’s note: personally, store I use awk a bunch in MySQL DBA work, viagra for tasks like scrubbing data from a production export for use in qa/dev, but usually have to resort to Perl for really complex stuff, but now I know how to do .]

Basics:
By default, fields are separated by any number of spaces. The -F option to awk changes the separator on commandline.
Print the first field, fields are separated by a colon.
awk -F: '{print $1}' /etc/passwd

Print the first and fifth field:
awk -F: '{$print $1,$5}' /etc/passwd

Can pattern match and use files, so you can replace:
grep foo /etc/passwd | awk -F: '{print $1,$5}'
with:
awk -F: '/foo/ {print $1,$5}' /etc/passwd

NF = built in variable (no $) used to mean “field number”
This will print the first and last fields of lines where the first field matches “foo”
awk -F: '$1 ~/foo/ {print $1,$NF}' /etc/passwd

NF = number of fields, ie, “7″
$NF = value of last field, ie “/bin/bash”
(similarly, NR is record number)


Awk makes assumptions about input, variables, and processing that you’d otherwise have to code yourself.

– “main loop” of input processing is done for you
– awk initializes variables for you, to 0
– input is viewed by awk as ‘records’ which are splittable into ‘fields’

This all makes a lot of operations very concise in awk, many things can be done w/ a one-liner that would otherwise require several lines of code.

awk key points:
– splits text into fields
– default delimiter is “any number of spaces”
– reference fields
– $0 is entire line
– create filters using ‘addresses’ which can be regexps (similar to sed)
– Turing-complete language
– has if, while, for, do-while, etc
– built-in math like exp, log, rand, sin, cos
– built-in string sub, split, index, toupper/lower

Patterns and actions
Pattern is first, then action(s)
Actions are enclosed in {}

only a pattern, no action:
'length>42'
but, the default action is to print the whole line, so this will actually do something — print lines where the length of the line is > 42. (Strings are just arrays in awk.)

only action, no pattern:
{print $2,$1}
do this to all lines of input

NR % 3 == 0
print every third line (the pattern is NR mod 3 == 0)

{print $1, $NF, $(NF-1)}
print the first field, last field, and 2nd to last field

built-in variables
NF, NR we’ve done
FS = field separator (can be regexp)
OFMT = output format for numbers (default %.6g)

Patterns
– used to filter lines processed by awk
– can be regexp
/^root/ is the pattern in the following
awk -F: '/^root/ {print $1,$NF}' /etc/passwd

– Patterns can use fields and relational operators
To print 1st, 4th and last field if value of 4th field >10:
awk -F: '$4 > 10 {print $1, $4, $NF}' /etc/passwd

The same, but also skipping comment lines:
awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}' /etc/passwd

Range patterns
sed-like addressing : you can have start and end addresses
awk 'NR==1,NR==3'
prints only first three lines of the file
You can use regular expressions in range patterns:
awk -F: '/^root/,/^daemon/ {print $1,$NF}' /etc/passwd
start printing at the line that starts with “root”, the last line that is processed is the line starting with “daemon”

Range pattern “gotcha” – can’t mix a range with other patterns:
To do “start at non-commented line where value of $4 is less than 10, end at the first line where value of $4 is greater than 10″

This does not work!
awk -F: '$0 !~ /^#/ $4 <= 10, $4 > 10' /etc/passwd

This is how to do it, {next} is an action that skips:
awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4}' /etc/passwd

Basic Aggregation
awk -F: '$3 > 100 {x+=1; print x}' /etc/passwd
This gives a line of output as each matching line is processed, showing a running count of x.

awk -F: '$3 > 100 {x+=1} END {print x}' /etc/passwd
This processes the “{print x}” action only after the entire file has been processed. This gives only the final value of x.

Arrays:
Support for regular arrays
Technically multi-dimensional arrays are not supported, but array indexes can be arbitrary strings, so you can make your own associative arrays.

Example:
awk -F, '{x[$1] = $2*($4 - $3)} END {for(key in x) {print key, x[key]}}' stocks.txt

The part before the END creates the associative array, the part after the END prints the array.

Extreme data munging:
awk -F, '{x[$1] = $2*($4 - $3)} END {for(z in x) {print z, x[z]}}' stocks.txt

ABC,100,12.14,19.12
FOO,100,24.01,17.45

output:
ABC 698
FOO -656

For the line “ABC,100,12.14,19.12″
the function becomes

x[ABC] = 100 * (19.12 - 12.14) = 698

Aggregate across multiple variables:
awk -F, '{x[$1] = $2*($4 - $3); y+=x[$1]} END {for(z in x) {print z, x[z]} print "Net:"y}' stocks.txt

Note that y is a running *sum* (not a running count like before).

Now, the above is hard to read; written as a standalone script, it is much easier:

#!/usr/bin/awk -f

BEGIN { FS="," }

{
  x[$1] = $2*($4 - $3)
  y += x[$1]
}

END {
  for(z in x) {
    print z, x[z]
  }  # end for loop
  print "Net:"y
}  # end END block

This was liveblogged, so please point out any issues, as they may be typos on my part….

The Beacon Pattern:
– This is a “Get out of the business” pattern
– Identify an oft-occurring and annoying task
– Automate and document it to the point of being able to hand it off to someone far less technical

Example:
– System admins were being put in charge of scheduling rooms in the building
– They wrote a PHP web application to help them automate the task
– They refined the app, and handed it off to a secretary
– They have to maintain the app, but it’s far less work.

The Community Pattern:

– Prior to launch of a new service, create user documentation for it.
– Point a few early adopters at the documentation and see if they can use the service with minimal support
– Use feedback to improve documentation, and the service
– Upon launch, create a mailing list, forum, IRC channel, or Jabber chat room and ask early adopters to help test it out.
– Upon launch, your early adopters are the community, and they’ll tell new users to use the tools you’ve provided instead of calling you.

Example:
– A beowulf cluster for an academic department
– Documented like crazy, early adopters were given early access to the cluster (demand was high)
– Created a mailing list, early adopters were added to it with their consent, functionality was tested with them.
– Email announcing launch mentioned the early adopters in a ‘thank you’ section, and linked them to their mailing list.

The DRY pattern
DRY = Don’t repeat yourself
Identify duplicate code in your automation scripts
Put the subroutines in an include file, and include them in your scripts (a small Python sketch of the idea follows the example below).

Example:
– “sysadmin library”
– /var/lib/adm/.*pl
– The elapsed time and number of lines needed to script a task the library helped with plunged dramatically
– new tasks were thought up that were not considered before but are obvious now (e.g., users who want to change their username)
– migrating to new services became much easier
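
The talk’s library was Perl, but the same idea sketched in Python might look like this (adminlib.py and the helper are hypothetical):

# adminlib.py: the shared "sysadmin library"
import pwd

def user_exists(name):
    """Return True if a local Unix account exists (hypothetical helper)."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False

# each task script then imports the library instead of re-implementing it:
#   from adminlib import user_exists
#   if not user_exists('jonesy'):
#       print "no such user"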

The Chameleon Pattern
– Identify commonalities among your services
– Leverage those to create “Chameleon” servers that can be re-purposed on the fly
– Abstract as much of this away from the physical hardware
– Doesn’t need to involve virtualization, though it’s awfully handy if you can do it that way.
[this one is a bit harder to do with MySQL config files]

Example:
[puppet/cfengine were mentioned…]
ldapconfig.py – more than a script: a methodology

– But isn’t installing packages you don’t need bad? Depends on the package…. e.g., gcc is bad to have on an enterprise server

“Junior annoyances”

Terminal issues

Junior:
opens terminal, logs into machine1
thinks the issue is with machine2, while still talking to machine1
logs out of machine1
logs into machine2

Senior:
opens 2 terminals, one for each of machine1 and machine2, to start

Junior:
networking issue ticket arrives
logs into server
runs tcpdump

Senior:
networking issue ticket arrives
logs into server
looks at logs

“Fix” vs. “Solution”, i.e. “taking orders”
A junior will try to fix the problem as stated; a senior will try to figure out what the underlying problem is. For example, given “I need a samba directory mounted under an NFS mount,” a junior admin will try to do exactly that, while a senior admin will ask “what are you trying to do with that?” because maybe all they need is a symlink.

Fanboyism
Signs you might be a fanboy:
– Disparaging users of latest stable release of $THING for not using the nightly (unstable) build which fixes more issues
– Creating false/invalid comparisons based on popular opinion instead of experience/facts
– Going against internal standards, breaking environmental consistency, to use $THING instead of $STANDARD (but this is also how disruptive technology works)
– Being in complete denial that most technology at some point or another stinks.
– Evaluating solutions based on “I like” instead of “we need” and “this does”.

Liveblog of the Professional IT Community Conference session Mentoring: It’s for everyone

Ways to learn:
Audio
Visual
Kinetic (doing it)

Everyone learns differently, but most people learn with some combination of all three.

However, you can also learn by training [that’s the truth: I learned a LOT by writing the book; even things I knew, I ended up needing to research more].

Ways to train:
Explanation
Observation
Demonstration
Questioning (Socratic Method)

What is a mentor?
noun: an experienced and trusted adviser. It’s not just someone who teaches, it’s someone who advises.
Also: an experienced person in a company, college, or school who trains and counsels new employees or students.

verb: to advise or train (someone, esp. a younger colleague).

A mentorship is a safe place to ask questions.

A mentor is a trainer, but a trainer who also is a professional advisor.

Finding a mentor
Someone…..
– you respect/admire
– works with similar technology
– has a compatible personality
– you have a good rapport with

Being a mentor
– Teach technical skills
– Provide advanced technical/design guidance
– Model and teach professional skills
– Be interested and invested in the [student’s] career

I am moderating and liveblogging the Professional IT Community Conference panel called Tech Women Rule! Creative Solutions for being a (or working with a) female technologist.

One point to keep in mind: The goal is not equality for equality’s sake. The goal is to have a diverse range of experience to make your company/project/whatever the best it could be.

That being said, these issues are not just around women; they are about anyone who is “different”, whether it’s race, ethnicity, gender, sexual orientation, or culture.

So what are some of the solutions?

0) Better align expectations with reality. Are you expecting more from someone who is one gender than another? If a woman makes a mistake is it worse because she has to prove herself? Is it worse because she is representative of her gender? If she does something good is the achievement elevated more because of her gender? Either is bad.

1) Respect people for who they are. Everyone deserves respect; if someone is not at your technical level, they still deserve respect.

If someone says something that is completely wrong from a technical perspective, do not assume that they have no idea what they are talking about. It could be that they are the exact case in which that technical scenario is appropriate for them. If they are correct, your attitude will be refreshing and you might learn something. If they are indeed wrong, ask them about a scenario in which their thinking falls apart, or otherwise guide them through learning why what they are saying is wrong.

2) Be nice. Don’t condescend.

3) Be helpful. “RTFM, n00b!” is not helpful, and certainly does not follow rule #2.

4) Don’t do #1-3 for women only. Don’t treat women nicely because they’re women, and be a jerk to men because they’re men. Being helpful is good for anyone, not just women.

5) Cooperate, do not compete. Whether you are co-workers, working together on a software project, or just in a conversation, the game of “one-upping” another is a lot less useful than working together.

6) When hiring or when in an interview, concentrate on skills, not knowledge. “Skills” refers to attributes such as their ability to listen, how judgmental they are about a legacy system, whether they are open to new ideas, whether they disdain anything that is not cutting edge, and even technical skills such as thinking patterns, research patterns, algorithms, etc.

If someone says “I don’t know” in an interview, ask them “how would you go about figuring it out?” If someone says “I think it’s x and y” ask “how would you confirm/test that?” If a backup failed, do they start the backup over or do they try to figure out why it failed?

Are they thorough? Do they follow through? It is a lot easier to teach knowledge than it is to teach something like “debugging skills”.

7) Specifically encourage people to speak up. Train yourself to NOTICE when folks are not speaking up, and ask them if they have any suggestions or ideas.

8) If you are running an IT conference, specifically ask qualified women you know to speak, not about “women in IT”. If you hear of an IT conference, tell specific women you know that you think they would be a great speaker. Get women to speak at local user groups to get practice in a less intimidating space.

Resources:
Read HOWTO Encourage Women in Linux

You Just Don’t Understand
The Male Mind at Work
How to succeed in business without a penis: a guide for working women (this is a humor book)

Join and/or send messages to Systers, the world’s largest e-mail list of women in computing and technology fields.


We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.


So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, and it completely relates to DBA work:
works and plays well with others
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personable
forthright
respectful
thorough

Bad Qualities:
disrespectful
insensitive
incompetent
[my own addition – no follow through, lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as….

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of the sysadmin skill set
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to print output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps start by managing your time
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Many don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….


In a recent post, Baron’s advice is “do not change sort_buffer_size from the default.”

Baron did not explain the logic behind his reasoning; he handwaves that “people utterly ruin their server performance and stability with it,” but does not explain how changing the sort_buffer_size kills performance and stability. Regardless of how respected and knowledgeable the source, NEVER take any advice that tells you what to do or how to do it without understanding WHY.

This article will explain the “why” of Baron’s point, and it will also talk more about understanding why, an integral part of the “Battle Against Any Guess.” Baron’s recommendation to leave sort_buffer_size at the default is just as bad as all the advice given to change the sort_buffer_size, because all that advice (including Baron’s) comes without an explanation of the underlying causes.

First, I explain the sort_buffer_size issue. The sort buffer size, as the name implies, is a memory buffer used when ordering is needed (usually for GROUP BY and ORDER BY clauses, when the index used for the filter/join does not follow the GROUP/ORDER BY order). Increasing the sort_buffer_size means allowing more memory to be used for the sorting process.

Increasing the sort_buffer_size usually improves performance because more memory is used in sorting. It can be detrimental to performance because the full size of the sort buffer is allocated for each thread that needs to do a sort, even if that sort does not need a very large sort buffer. For example, at a 16 MB sort_buffer_size, 100 concurrently sorting threads claim 1.6 GB of memory, whether or not their sorts need anywhere near that much.

A better optimization would be to change the schema and/or queries so that all that sorting is not necessary. Increasing the sort_buffer_size gives you a false sense of security that your server is performing better. Your server is performing the same tasks, only faster — the best optimization is to make the tasks smaller or eliminate some tasks. If you can have queries without so much sorting, that’s a much better optimization than changing sort_buffer_size.

That being said, increasing the sort_buffer_size is a perfectly acceptable stop-gap solution that can be implemented RIGHT NOW (it’s a dynamic variable), while you examine your queries by doing a query review with a tool such as mk-query-digest. This is indeed what Pythian does — and, by the way, not only do we recommend that course of action, but we explain it to you and help you find and optimize the queries in question.
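
As a sketch of that stop-gap (the connection details, table, and the 4 MB value are all made up for illustration), sort_buffer_size can be raised for a single session, so only the sort-heavy job pays the memory cost:

import MySQLdb  # the MySQL-python driver; every name below is hypothetical

conn = MySQLdb.connect(host='localhost', user='app', passwd='secret', db='test')
cur = conn.cursor()

# sort_buffer_size is dynamic: no restart needed, and SESSION scope limits
# the change to this one connection instead of every thread on the server
cur.execute("SET SESSION sort_buffer_size = 4 * 1024 * 1024")

# a sort-heavy query (ORDER BY not satisfied by an index) benefits right now,
# while the real fix (better indexes, schema, or queries) is worked on
cur.execute("SELECT customer_id, total FROM orders ORDER BY total DESC")
for row in cur.fetchmany(10):
    print row

cur.close()
conn.close()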

That all assumes that having lots of sorts that require lots of memory is a bad thing. It may be that you have tuned your queries and schema such that you have eliminated as many sorts as you can, but some may remain. An intensive data mining server is a good example of a situation in which permanently increasing the sort_buffer_size may be the right solution.

Now that we have the specifics of this situation out of the way, let’s look at the Battle Against Any Guess. This is a movement against guessing games. Understanding what you are doing is essential; in the case of sort_buffer_size, you can believe that you know what you are doing by increasing sort_buffer_size. However, the real solution to the problem lies in changing the queries, not changing the memory patterns.

There is a 6-page description of the “Battle against any guess” in the Northern California Oracle User Group’s May Journal, starting on page 13. The examples are specific to Oracle, but the points made are sound even if you do not know Oracle well. For example:

Blindly implementing best practices is nothing different from guesswork; we are applying some past-proven solutions without measuring how they stand against our requirements, and without testing whether they bring us any closer to the targets we have. Industry has become so obsessed with best practices that we commonly see projects in which reviewing an environment for compliance with best practices is the ultimate goal.

One good reason you need to know *why* is also mentioned in the article: The second danger of best practices is that they easily become myths. The technology keeps improving and issues addressed by certain best practices might not be relevant anymore in the next software version.

So, even from respected folks like Baron or myself, do not take advice on face value. Ask why, understand why, and then think if there is another level. It is not always easy; often you think you understand but really you miss that other level – such as with sort_buffer_size.

On Monday, this web doctor his thoughts on Tuesday.

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, dosage we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, pharmacy and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

I am attending the tomorrow.)

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, nurse and it completely relates to DBA work:
works and plays well with other
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespective
insensitive
incompetent
[my own addition – no follow through, dosage lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of a sysadmin skill
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to send an output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps managing time
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Maany don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write python code, here going back in 6 months, prostate you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, you use the “Shift” key a lot in Perl because you use $ % ( ) { } etc, so you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to put a space on the second line because whitespace ‘tabbing’ is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation. A double underscore, also called a “dunder”, is to print the stuff in the triple quotes:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no field 9. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
File ““, line 1, in
IndexError: list index out of range

Python uses pointers (or pointer-like things) so v[1:-1] does not print the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full array syntax is [start:end:index increment]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made of a transformation, an iteration and optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A tuple always has 2 elements, so a one-item tuple is defined as
x=(1,)

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not recognized in strings, uses printf-style string formatting:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
print "x == y"
for k,v in mydict.iteritems():
if v is None:
continue
print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign. This is on purpose, it avoids bugs resulting from typing if x=y instead of if x==y.

The construct has no place in production code anyway, since you give up catching any exceptions.

Python modules for sysadmins:
– sys
– os
– urlib/urlib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the commandline. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, please let me know any issues, as they may be typos….

[author’s note: personally, store I use awk a bunch in MySQL DBA work, viagra for tasks like scrubbing data from a production export for use in qa/dev, but usually have to resort to Perl for really complex stuff, but now I know how to do .]

Basics:
By default, fields are separated by any number of spaces. The -F option to awk changes the separator on commandline.
Print the first field, fields are separated by a colon.
awk -F: '{print $1}' /etc/passwd

Print the first and fifth field:
awk -F: '{$print $1,$5}' /etc/passwd

Can pattern match and use files, so you can replace:
grep foo /etc/passwd | awk -F: '{print $1,$5}'
with:
awk -F: '/foo/ {print $1,$5}' /etc/passwd

NF = built in variable (no $) used to mean “field number”
This will print the first and last fields of lines where the first field matches “foo”
awk -F: '$1 ~/foo/ {print $1,$NF}' /etc/passwd

NF = number of fields, ie, “7″
$NF = value of last field, ie “/bin/bash”
(similarly, NR is record number)


Awk makes assumptions about input, variables, and processing that you’d otherwise have to code yourself.

– “main loop” of input processing is done for you
– awk initializes variables for you, to 0
– input is viewed by awk as ‘records’ which are splittable into ‘fields’

This all makes a lot of operations very concise in awk, many things can be done w/ a one-liner that would otherwise require several lines of code.

awk key points:
– splits text into fields
– default delimiter is “any number of spaces”
– reference fields
– $0 is entire line
– create filters using ‘addresses’ which can be regexps (similar to sed)
– Turing-complete language
– has if, while, for, do-while, etc
– built-in math like exp, log, rand, sin, cos
– built-in string sub, split, index, toupper/lower

Patterns and actions
Pattern is first, then action(s)
Actions are enclosed in {}

only a pattern, no action:
'length>42'
but, the default action is to print the whole line, so this will actually do something — print lines where the length of the line is > 42. strings are just arrays in awk

only action, no pattern:
{print $2,$1}
do this to all lines of input

NR % 3 == 0
print every third line (pattern is %NR mod 3)

{print $1, $NF, $(NF-1)}
print the first field, last field, and 2nd to last field

built-in variables
NF, NR we’ve done
FS = field separator (can be regexp)
OFMT = output format for numbers (default %.6g)

Patterns
– used to filter lines processed by awk
– can be regexp
/^root/ is the pattern in the following
awk -F:'/^root/ {print $1,$NF}' /etc/passwd

– Patterns can use fields and relational operators
To print 1st, 4th and last field if value of 4th field >10:
awk -F: '$4 > 10 {print $1, $4, $NF}' /etc/passwd

awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}' /etc/passwd

Range patterns
sed-like addressing : you can have start and end addresses
awk ‘NR==1,NR==3′
prints only first three lines of the file
You can use regular expressions in range patterns:
awk -F:’/^root/,/^daemon/ {print $1,$NF}’ /etc/passwd
start printing at the line that starts with “root”, the last line that is processed is the line starting with “daemon”

Range pattern “gotcha” – can’t mix a range with other patterns:
To do “start at non-commented line where value of $4 is less than 10, end at the first line where value of $4 is greater than 10″

This does not work!
awk -F: '$0 !~ /^#/ $4 <= 10, $4 > 10' /etc/passwd

This is how to do it, {next} is an action that skips:
awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4}' /etc/passwd

Basic Aggregation
awk -F: '$3 > 100 {x+=1; print x}' /etc/passwd
This gives a line of output as each matching line is processed. This gives a running total of x.

awk -F: '$3 > 100 {x+=1} END {print x}' /etc/passwd
This processes the “{print x}” action only after the entire file has been processed. This gives only the final value of x.

Arrays:
Support for regular arrays
Technically multi-dimensional arrays are not supported, but array indexes are arbitrary strings, so you can make your own associative arrays.

Example:
awk -F, '{x[$1] = $2*($4 - $3)} END {for(key in x) {print key, x[key]}}' stocks.txt

The part before the END creates the associative array, the part after the END prints the array.

Extreme data munging:
awk -F, '{x[$1]=($2*($4 - $3))} END {for(z in x) {print z, x[z]}}' stocks.txt

ABC,100,12.14,19.12
FOO,100,24.01,17.45

output
BAR 271.5
ABC 698

For the line “ABC,100,12.14,19.12”
the function becomes

x[ABC] = 100 * (19.12 - 12.14) = 698

Aggregate across multiple variables:
awk -F, '{x[$1] = $2*($4 - $3); y+=x[$1]} END {for(z in x) {print z, x[z]} print "Net:" y}' stocks.txt

Note that y is a running *sum* (not a running count like before).

Now, the above is hard to read, this is much easier.

#!/usr/bin/awk -f

BEGIN { FS="," }
{
  x[$1] = $2*($4 - $3)
  y += x[$1]
}
END {
  for (z in x) {
    print z, x[z]
  }  # end for loop
  print "Net:" y
}  # end END block
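If the script is saved under a hypothetical name like net.awk and made executable, it runs directly against the data file:

chmod +x net.awk
./net.awk stocks.txt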

This was liveblogged, so please point out any issues, as they may be typos on my part….

The Beacon Pattern:
– This is a “Get out of the business” pattern
– Identify an oft-occurring and annoying task
– Automate and document it to the point of being able to hand it off to someone far less technical

Example:
– System admins were being put in charge of scheduling rooms in the building
– They wrote a PHP web application to help them automate the task
– They refined the app, and handed it off to a secretary
– They have to maintain the app, but it’s far less work.

The Community Pattern:

– Prior to launch of a new service, create user documentation for it.
– Point a few early adopters at the documentation and see if they can use the service with minimal support
– Use feedback to improve documentation, and the service
– Upon launch, create a mailing list, forum, IRC channel, or Jabber chat room and ask early adopters to help test it out.
– Upon launch, your early adopters are the community, and they’ll tell new users to use the tools you’ve provided instead of calling you.

Example:
– A beowulf cluster for an academic department
– Documented like crazy, early adopters were given early access to the cluster (demand was high)
– Created a mailing list, early adopters were added to it with their consent, functionality was tested with them.
– Email announcing launch mentioned the early adopters in a ‘thank you’ section, and linked them to their mailing list.

The DRY pattern
DRY = Don’t repeat yourself
Identify duplicate code in your automation scripts
Put common subroutines in an include file, and include it in your scripts (see the sketch after the example below).

Example:
– “sysadmin library”
– /var/lib/adm/.*pl
– Elapsed time and # of lines to script a task for which the library was useful plunged dramatically
– new tasks were thought up that were not considered before but were obvious now (ie, users that want to change their username)
– migrating to new services became much easier
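As a minimal sketch of the same idea in Python (adminlib and its contents are hypothetical names, not the actual library from the talk):

# adminlib.py -- the shared "sysadmin library" module
import pwd

def user_exists(username):
    """Return True if username is in the local password database."""
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False

# rename_user.py -- one of many scripts reusing the library:
#   from adminlib import user_exists
#   if user_exists('jonesy'):
#       print 'user exists, safe to proceed'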

The Chameleon Pattern
– Identify commonalities among your services
– Leverage those to create “Chameleon” servers that can be re-purposed on the fly
– Abstract as much of this away from the physical hardware
– Doesn’t need to involve virtualization, though it’s awfully handy if you can do it that way.
[this one is a bit harder to do with MySQL config files]

Example:
[puppet/cfengine were mentioned…]
ldapconfig.py – more than a script: a methodology

– But isn’t installing packages you don’t need bad? Depends on the package….ie, gcc is bad for enterprise

“Junior annoyances”

Terminal issues

Junior:
opens terminal, logs into machine1
realizes the issue is with machine2, not machine1
logs out of machine1
logs into machine2

Senior:
opens 2 terminals, one for each of machine1 and machine2, to start

Junior:
networking issue ticket arrives
logs into server
runs tcpdump

Senior:
networking issue ticket arrives
logs into server
looks at logs

“Fix” vs. “Solution” ie “taking orders”
Junior will try to fix a problem; senior will try to figure out what the problem is. ie, “I need a samba directory mounted under an NFS mount”: a junior admin will try to do exactly that, while a senior admin will ask “what are you trying to do with that?” because maybe all they need is a symlink.

Fanboyism
Signs you might be a fanboy:
– Disparaging users of latest stable release of $THING for not using the nightly (unstable) build which fixes more issues
– Creating false/invalid comparisons based on popular opinion instead of experience/facts
– Going against internal standards, breaking environmental consistency, to use $THING instead of $STANDARD (but this is also how disruptive technology works)
– Being in complete denial that most technology at some point or another stinks.
– Evaluating solutions based on “I like” instead of “we need” and “this does”.

Liveblog of the Professional IT Community Conference session Mentoring: It’s for everyone

Ways to learn:
Audio
Visual
Kinetic (doing it)

Everyone learns differently, but most people learn with some combination of all three.

However, you can also learn by training others [that’s the truth: I learned a LOT by writing the book; even things I knew, I ended up needing to research more].

Ways to train:
Explanation
Observation
Demonstration
Questioning (Socratic Method)

What is a mentor?
noun: an experienced and trusted adviser. It’s not just someone who teaches, it’s someone who advises.
An experienced person in a company, college, or school who trains and counsels new employees or students.

verb: to advise or train (someone, esp. a younger colleague).

A mentorship is a safe place to ask questions.

A mentor is a trainer, but a trainer who also is a professional advisor.

Finding a mentor
Someone…..
– you respect/admire
– works with similar technology
– has a compatible personality
– you have a good rapport with

Being a mentor
– Teach technical skills
– Provide advanced technical/design guidance
– Model and teach professional skills
– Be interested and invested in the [student’s] career

I am moderating and liveblogging the Professional IT Community Conference panel called Tech Women Rule! Creative Solutions for being a (or working with a) female technologist.

One point to keep in mind: The goal is not equality for equality’s sake. The goal is to have a diverse range of experience to make your company/project/whatever the best it could be.

That being said, these issues are not just around women; they are about anyone who is “different”, whether it’s race, ethnicity, gender, sexual orientation, or culture.

So what are some of the solutions?

0) Better align expectations with reality. Are you expecting more from someone of one gender than another? If a woman makes a mistake is it worse because she has to prove herself? Is it worse because she is representative of her gender? If she does something good is the achievement elevated more because of her gender? Either is bad.

1) Respect people for who they are. Everyone deserves respect; if someone is not at your technical level, they still deserve respect.

If someone says something that is completely wrong from a technical perspective, do not assume that they have no idea what they are talking about. It could be that they are the exact case in which that technical scenario is appropriate for them. If they are correct, your attitude will be refreshing and you might learn something. If they are indeed wrong, ask them about a scenario in which their thinking falls apart, or otherwise guide them through learning why what they are saying is wrong.

2) Be nice. Don’t condescend.

3) Be helpful. “RTFM, n00b!” is not helpful, and certainly does not follow rule #2.

4) Don’t do #1-3 for women only. Don’t treat women nicely because they’re women, and be a jerk to men because they’re men. Being helpful is good for anyone, not just women.

5) Cooperate, do not compete. Whether you are co-workers, working together on a software project, or just in a conversation, the game of “one-upping” another is a lot less useful than working together.

6) When hiring or when in an interview, concentrate on skills, not knowledge. “Skills” refers to attributes such as their ability to listen, how judgmental they are about a legacy system, whether they are open to new ideas, whether they disdain anything that is not cutting edge, and even technical skills such as thinking patterns, research patterns, algorithms, etc.

If someone says “I don’t know” in an interview, ask them “how would you go about figuring it out?” If someone says “I think it’s x and y” ask “how would you confirm/test that?” If a backup failed, do they start the backup over or do they try to figure out why it failed?

Are they thorough? Do they follow through? It is a lot easier to teach knowledge than it is to teach something like “debugging skills”.

7) Specifically encourage people to speak up. Train yourself to NOTICE when folks are not speaking up, and ask them if they have any suggestions or ideas.

8) If you are running an IT conference, specifically ask qualified women you know to speak, not about “women in IT”. If you hear of an IT conference, tell specific women you know that you think they would be a great speaker. Get women to speak at local user groups to get practice in a less intimidating space.

Resources:
Read HOWTO Encourage Women in Linux

You Just Don’t Understand
The Male Mind at Work
How to succeed in business without a penis: a guide for working women (this is a humor book)

Join and/or send messages to Systers, the world’s largest e-mail list of women in computing and technology fields.

In a recent post, Baron’s advice is “do not change sort_buffer_size from the default.”

Baron did not explain the logic behind his reasoning; he handwaves that “people utterly ruin their server performance and stability with it,” but does not explain how changing the sort_buffer_size kills performance and stability. Regardless of how respected and knowledgeable the source, NEVER take any advice that tells you what to do or how to do it without understanding WHY.

This article will explain the “why” of Baron’s point, and it will also talk more about understanding why, an integral part of the “Battle Against Any Guess.” Baron’s recommendation to leave sort_buffer_size at the default is just as bad as all the advice given to change the sort_buffer_size, because all that advice (including Baron’s) does not explain the underlying causes.

First, I explain the sort_buffer_size issue. The sort buffer size, as the name implies, is a memory buffer used when ordering is needed (usually for GROUP BY and ORDER BY clauses, when the index used for the filter/join does not follow the GROUP/ORDER BY order). Increasing the sort_buffer_size means allowing more memory to be used for the sorting process.

Increasing the sort_buffer_size usually improves performance because more memory is used in sorting. It can be detrimental to performance because the full size of the sort buffer is allocated for each thread that needs to do a sort, even if that sort does not need a very large sort buffer.

A better optimization would be to change the schema and/or queries so that all that sorting is not necessary. Increasing the sort_buffer_size gives you a false sense of security that your server is performing better. Your server is performing the same tasks, only faster — the best optimization is to make the tasks smaller or eliminate some tasks. If you can have queries without so much sorting, that’s a much better optimization than changing sort_buffer_size.
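As a hypothetical illustration of “making the task smaller”: if the rows can be read from an index already in the requested order, the sort disappears entirely (table, column, and index names here are made up):

-- filesort needed: the index on (a) does not cover the ORDER BY
SELECT b FROM t WHERE a=1 ORDER BY b;
-- with a composite index on (a, b), rows come back already ordered and no sort happens
ALTER TABLE t ADD INDEX a_b (a, b);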

That being said, increasing the sort_buffer_size is a perfectly acceptable stop-gap solution that can be implemented RIGHT NOW (it’s a dynamic variable), while you examine your queries by doing a query review with a tool such as mk-query-digest. This is indeed what Pythian does — and, by the way, not only do we recommend that course of action, but we explain it to you and help you find and optimize the queries in question.
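For example (the 4M figure is only an illustration, not a recommendation), the change can be applied at runtime and its effect watched in the sort-related status counters:

SET SESSION sort_buffer_size = 4*1024*1024;  -- test on one connection first
SET GLOBAL sort_buffer_size = 4*1024*1024;   -- then apply for new connections
SHOW GLOBAL STATUS LIKE 'Sort%';             -- e.g. Sort_merge_passes, before and after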

That all assumes that having lots of sorts that require lots of memory is a bad thing. It may be that you have tuned your queries and schema such that you have eliminated as many sorts as you can, but some may remain. An intensive data mining server is a good example of a situation in which permanently increasing the sort_buffer_size may be the right solution.

Now that we have the specifics of this situation out of the way, let’s look at the Battle Against Any Guess. This is a movement against guessing games. Understanding what you are doing is essential; in the case of sort_buffer_size, you can believe that you know what you are doing by increasing sort_buffer_size. However, the real solution to the problem lies in changing the queries, not changing the memory patterns.

There is a 6-page description of the “Battle against any guess” in the Northern California Oracle User Group’s May Journal, starting on page 13. The examples are specific to Oracle, but the points made are sound even if you do not know Oracle well. For example:

Blindly implementing best practices is nothing different from guesswork; we are applying some past-proven solutions without measuring how they stand against our requirements, and without testing whether they bring us any closer to the targets we have. Industry has become so obsessed with best practices that we commonly see projects in which reviewing an environment for compliance with best practices is the ultimate goal.

One good reason you need to know *why* is also mentioned in the article: The second danger of best practices is that they easily become myths. The technology keeps improving and issues addressed by certain best practices might not be relevant anymore in the next software version.

So, even from respected folks like Baron or myself, do not take advice at face value. Ask why, understand why, and then think about whether there is another level. It is not always easy; often you think you understand but really you miss that other level – such as with sort_buffer_size.

In a recent post, Peter argues that there are many negatives to MySQL’s multiple storage engines.

I have a hard time trying to figure out the deeper meaning behind Peter’s post, given that Percona writes a storage engine for MySQL, XtraDB. Does this mean that Percona will stop developing XtraDB? Does this mean that the Percona Server will diverge farther and farther away from MySQL so that they’re not compatible any more and migrating from MySQL to Percona Server is very difficult?

Or maybe it’s just that Peter is saying one thing and doing the opposite, which just seems wrong because that would be blatant hypocrisy on Percona’s part.

(This idea was a comment on the blog post but seems to be trapped in the spam filter, so I’m posting it; apologies if the comment comes through eventually….)

My own opinion of the issue: Peter is factually correct with what he says. However, it’s nice to have the framework and be allowed to use more than one storage engine, or use exclusively one storage engine that’s not MyISAM.

On Monday, … his thoughts on Tuesday.

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

I am attending the … tomorrow.

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, and it completely relates to DBA work:
works and plays well with others
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespectful
insensitive
incompetent
[my own addition – no follow through, lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as…

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of a sysadmin’s skill set
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to print output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps start by managing your time better
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Many don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write python code, going back in 6 months, you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, in Perl you use the “Shift” key a lot for $ % ( ) { } etc; in Python you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to indent the second line, because whitespace indentation is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation (the docstring), stored in the __doc__ attribute; a double underscore is also called a “dunder”. To print the stuff in the triple quotes:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no index 9. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range

Negative indexes count from the end, and a slice excludes its end index, so v[1:-1] does not include the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full slice syntax is [start:end:step]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made of a transformation, an iteration and optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A one-item tuple still needs a trailing comma, so it is defined as
x=(1,)
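A quick interpreter check of the comma rule:

>>> type((1))
<type 'int'>
>>> type((1,))
<type 'tuple'>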

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not interpolated in strings; use printf-style string formatting instead:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
    print "x == y"
for k,v in mydict.iteritems():
    if v is None:
        continue
    print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign. This is on purpose, it avoids bugs resulting from typing if x=y instead of if x==y.

The assignment-in-a-condition construct has no place in production code anyway, since you give up catching any exceptions raised by the expression.

Python modules for sysadmins (a short sketch using a few of these follows the list):
– sys
– os
– urllib/urllib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd
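As a minimal sketch combining a few of these (the log directory path is just an illustration):

# compress_logs.py -- gzip rotated logs older than 7 days
import glob, gzip, logging, os, shutil, time

logging.basicConfig(level=logging.INFO)
cutoff = time.time() - 7*24*3600                   # 7 days ago

for path in glob.glob('/var/log/myapp/*.log.1'):   # glob: wildcard matching
    if os.path.getmtime(path) < cutoff:            # stat-based age check
        src = open(path, 'rb')
        dst = gzip.open(path + '.gz', 'wb')        # gzip output stream
        shutil.copyfileobj(src, dst)               # shutil: stream copy
        src.close()
        dst.close()
        os.remove(path)
        logging.info('compressed %s', path)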

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the commandline. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, please let me know any issues, as they may be typos….

I have been talking with a group of folks who have been making a product that has lots of free functionality: seeing and managing the MySQL config file (/etc/my.cnf), seeing and managing accounts, a small dashboard of overall health graphs, and more.

With this free tool you can look at and manage local and remote databases. It supports ssh tunneling, including ssh using password-protected ssh keys. It’s pretty neat, and I have been working with the product manager to add features. I think this tool will become the de facto standard for centralized GUI administration of MySQL.

The tool is
MySQL Workbench….Surprise! One of the best new features for the administrator is that you can now create an administration connection for an existing workbench connection with a click of a button, instead of having to enter all that information again.

I use the “developer” version, 5.2.21. Note that the 5.1 version does not have administration capabilities.

On Monday, this web doctor his thoughts on Tuesday.

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, dosage we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, pharmacy and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

I am attending the tomorrow.)

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, nurse and it completely relates to DBA work:
works and plays well with other
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespective
insensitive
incompetent
[my own addition – no follow through, dosage lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of a sysadmin skill
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to send an output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps managing time
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Maany don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write python code, here going back in 6 months, prostate you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, you use the “Shift” key a lot in Perl because you use $ % ( ) { } etc, so you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to put a space on the second line because whitespace ‘tabbing’ is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation. A double underscore, also called a “dunder”, is to print the stuff in the triple quotes:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no field 9. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
File ““, line 1, in
IndexError: list index out of range

Python uses pointers (or pointer-like things) so v[1:-1] does not print the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full array syntax is [start:end:index increment]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made of a transformation, an iteration and optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A tuple always has 2 elements, so a one-item tuple is defined as
x=(1,)

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not recognized in strings, uses printf-style string formatting:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
    print "x == y"

for k,v in mydict.iteritems():
    if v is None:
        continue
    print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression, because you can’t use an = sign there. This is on purpose: it avoids bugs that result from typing if x=y instead of if x==y.
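
[My own example of what the interpreter says if you try:]

>>> if x = y:
  File "<stdin>", line 1
    if x = y:
         ^
SyntaxError: invalid syntax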

The construct has no place in production code anyway, since you give up catching any exceptions.

Python modules for sysadmins:
– sys
– os
– urllib/urllib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd
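
[My own sketch of the kind of glue a few of these enable, not from the talk; the /var/log/myapp path is made up. glob finds the rotated logs, gzip and shutil compress them, os cleans up:]

#!/usr/bin/env python
import glob
import gzip
import os
import shutil

# Compress each rotated log, then remove the uncompressed original.
for logfile in glob.glob('/var/log/myapp/*.log.1'):
    gzfile = logfile + '.gz'
    src = open(logfile, 'rb')
    dst = gzip.open(gzfile, 'wb')
    shutil.copyfileobj(src, dst)  # stream the file into the gzip writer
    src.close()
    dst.close()
    os.remove(logfile)
    print "archived %s -> %s" % (logfile, gzfile)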

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the commandline. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, please let me know any issues, as they may be typos….

[author’s note: personally, I use awk a bunch in MySQL DBA work, for tasks like scrubbing data from a production export for use in qa/dev, but usually have to resort to Perl for really complex stuff; now I know how to do more of it in awk.]

Basics:
By default, fields are separated by any number of spaces. The -F option to awk changes the separator on the command line.
Print the first field, fields are separated by a colon.
awk -F: '{print $1}' /etc/passwd

Print the first and fifth field:
awk -F: '{print $1,$5}' /etc/passwd

awk can pattern match and read files itself, so you can replace:
grep foo /etc/passwd | awk -F: '{print $1,$5}'
with:
awk -F: '/foo/ {print $1,$5}' /etc/passwd

NF = built-in variable (no $) meaning “number of fields”
This will print the first and last fields of lines where the first field matches “foo”
awk -F: '$1 ~/foo/ {print $1,$NF}' /etc/passwd

NF = number of fields, ie, “7″
$NF = value of last field, ie “/bin/bash”
(similarly, NR is record number)


Awk makes assumptions about input, variables, and processing that you’d otherwise have to code yourself.

– “main loop” of input processing is done for you
– awk initializes variables for you, to 0
– input is viewed by awk as ‘records’ which are splittable into ‘fields’

This all makes a lot of operations very concise in awk, many things can be done w/ a one-liner that would otherwise require several lines of code.
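
[My own comparison, not from the talk: roughly what awk -F: '{print $1,$5}' /etc/passwd does for you, spelled out in Python:]

# awk's implicit main loop, field splitting, and print, done by hand
for line in open('/etc/passwd'):
    fields = line.rstrip('\n').split(':')  # -F: sets the field separator
    print fields[0], fields[4]             # awk's $1 and $5; Python counts from 0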

awk key points:
– splits text into fields
– default delimiter is “any number of spaces”
– reference fields
– $0 is entire line
– create filters using ‘addresses’ which can be regexps (similar to sed)
– Turing-complete language
– has if, while, for, do-while, etc
– built-in math like exp, log, rand, sin, cos
– built-in string sub, split, index, toupper/lower

Patterns and actions
Pattern is first, then action(s)
Actions are enclosed in {}

only a pattern, no action:
'length>42'
but, the default action is to print the whole line, so this will actually do something: print lines where the length of the line is > 42 (with no argument, length operates on the whole line, $0)

only action, no pattern:
{print $2,$1}
do this to all lines of input

NR % 3 == 0
print every third line (the pattern is NR mod 3 == 0)

{print $1, $NF, $(NF-1)}
print the first field, last field, and 2nd to last field

built-in variables
NF, NR we’ve done
FS = field separator (can be regexp)
OFMT = output format for numbers (default %.6g)

Patterns
– used to filter lines processed by awk
– can be regexp
/^root/ is the pattern in the following
awk -F: '/^root/ {print $1,$NF}' /etc/passwd

– Patterns can use fields and relational operators
To print 1st, 4th and last field if value of 4th field >10:
awk -F: '$4 > 10 {print $1, $4, $NF}' /etc/passwd

awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}' /etc/passwd

Range patterns
sed-like addressing : you can have start and end addresses
awk 'NR==1,NR==3'
prints only first three lines of the file
You can use regular expressions in range patterns:
awk -F: '/^root/,/^daemon/ {print $1,$NF}' /etc/passwd
start printing at the line that starts with “root”, the last line that is processed is the line starting with “daemon”

Range pattern “gotcha” – can’t mix a range with other patterns:
To do “start at non-commented line where value of $4 is less than 10, end at the first line where value of $4 is greater than 10″

This does not work!
awk -F: '$0 !~ /^#/ $4 <= 10, $4 > 10' /etc/passwd

This is how to do it, {next} is an action that skips:
awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4}' /etc/passwd

Basic Aggregation
awk -F: '$3 > 100 {x+=1; print x}' /etc/passwd
This gives a line of output as each matching line is processed, showing a running count in x.

awk -F: '$3 > 100 {x+=1} END {print x}' /etc/passwd
This processes the “{print x}” action only after the entire file has been processed. This gives only the final value of x.

Arrays:
Support for regular arrays
Technically multi-dimensional arrays are not supported, but array indexes can be arbitrary strings, so awk arrays are associative and you can build your own multi-dimensional lookups.

Example:
awk -F, '{x[$1] = $2*($4 - $3)} END {for(key in x) {print key, x[key]}}' stocks.txt

The part before the END creates the associative array, the part after the END prints the array.

Extreme data munging:
awk -F, '{x[$1]=($2*($4 - $3))} END {for(z in x) {print z, x[z]}}' stocks.txt

ABC,100,12.14,19.12
FOO,100,24.01,17.45

output
ABC 698
FOO -656

For the line “ABC,100,12.14,19.12″
the function becomes

x[ABC] = 100 * (19.12 - 12.14) = 698

Aggregate across multiple variables:
awk -F, '{x[$1] = $2*($4 - $3); y+=x[$1]} END {for(z in x) {print z, x[z]} print "Net:"y}' stocks.txt

Note that y is a running *sum* (not a running count like before).

Now, the above is hard to read; this is much easier:

#!/usr/bin/awk -f

BEGIN { FS="," }
{
    x[$1] = $2*($4 - $3)
    y += x[$1]
}
END {
    for (z in x) {
        print z, x[z]
    }  # end for loop
    print "Net:" y
}  # end END block

This was liveblogged, so please point out any issues, as they may be typos on my part….

The Beacon Pattern:
– This is a “Get out of the business” pattern
– Identify an oft-occurring and annoying task
– Automate and document it to the point of being able to hand it off to someone far less technical

Example:
– System admins were being put in charge of scheduling rooms in the building
– They wrote a PHP web application to help them automate the task
– They refined the app, and handed it off to a secretary
– They have to maintain the app, but it’s far less work.

The Community Pattern:

– Prior to launch of a new service, create user documentation for it.
– Point a few early adopters at the documentation and see if they can use the service with minimal support
– Use feedback to improve documentation, and the service
– Upon launch, create a mailing list, forum, IRC channel, or Jabber chat room and ask early adopters to help test it out.
– Upon launch, your early adopters are the community, and they’ll tell new users to use the tools you’ve provided instead of calling you.

Example:
– A beowulf cluster for an academic department
– Documented like crazy, early adopters were given early access to the cluster (demand was high)
– Created a mailing list; early adopters were added to it with their consent, and functionality was tested with them.
– Email announcing launch mentioned the early adopters in a ‘thank you’ section, and linked them to their mailing list.

The DRY pattern
DRY = Don’t repeat yourself
Identify duplicate code in your automation scripts
Put the duplicated subroutines in an include file, and include that in your scripts (see the sketch after the example below).

Example:
– “sysadmin library”
– /var/lib/adm/.*pl
– Elapsed time and # of lines to script a task for which the library was useful plunged dramatically
– new tasks were thought up that were not considered before but were obvious now (ie, users that want to change their username)
– migrating to new services became much easier
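
[My own sketch of the same idea in Python; the talk’s library was Perl, and the module and function names here are hypothetical:]

# adminlib.py -- the shared "sysadmin library"
import smtplib
from email.mime.text import MIMEText

def notify(to_addr, subject, body, from_addr='root@localhost'):
    """One tested mail routine instead of a copy pasted into every script."""
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = from_addr
    msg['To'] = to_addr
    server = smtplib.SMTP('localhost')
    server.sendmail(from_addr, [to_addr], msg.as_string())
    server.quit()

# in any script:
#   from adminlib import notify
#   notify('oncall@example.com', 'disk full', '/var filled up on db1')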

The Chameleon Pattern
– Identify commonalities among your services
– Leverage those to create “Chameleon” servers that can be re-purposed on the fly
– Abstract as much of this away from the physical hardware
– Doesn’t need to involve virtualization, though it’s awfully handy if you can do it that way.
[this one is a bit harder to do with MySQL config files]

Example:
[puppet/cfengine were mentioned…]
ldapconfig.py – more than a script: a methodology

– But isn’t installing packages you don’t need bad? Depends on the package….ie, gcc is bad for enterprise

“Junior annoyances”

Terminal issues

Junior:
open terminal, log in to machine1
realize the issue is with machine2
log out of machine1
log in to machine2

Senior:
opens terminals to both machine1 and machine2 to start

Junior:
networking issue ticket arrives
logs into server
runs tcpdump

Senior:
networking issue ticket arrives
logs into server
looks at logs

“Fix” vs. “Solution” ie “taking orders”
A junior will try to fix a problem; a senior will try to figure out what the problem is. For example, given “I need a samba directory mounted under an NFS mount”, a junior admin will try to do exactly that; a senior admin will ask “what are you trying to do with that?” because maybe all they need is a symlink.

Fanboyism
Signs you might be a fanboy:
– Disparaging users of latest stable release of $THING for not using the nightly (unstable) build which fixes more issues
– Creating false/invalid comparisons based on popular opinion instead of experience/facts
– Going against internal standards, breaking environmental consistency, to use $THING instead of $STANDARD (but this is also how disruptive technology works)
– Being in complete denial that most technology at some point or another stinks.
– Evaluating solutions based on “I like” instead of “we need” and “this does”.

Liveblog of the Professional IT Community Conference session Mentoring: It’s for everyone

Ways to learn:
Audio
Visual
Kinetic (doing it)

Everyone learns differently, but most people learn with some combination of all three.

However, you can also learn by training [that’s the truth; I learned a LOT by writing the book, and even for things I knew, I ended up needing to research more].

Ways to train:
Explanation
Observation
Demonstration
Questioning (Socratic Method)

What is a mentor?
noun: an experienced and trusted adviser. It’s not just someone who teaches; it’s someone who advises.
Also: an experienced person in a company, college, or school who trains and counsels new employees or students.

verb: to advise or train (someone, esp. a younger colleague).

A mentorship is a safe place to ask questions.

A mentor is a trainer, but a trainer who also is a professional advisor.

Finding a mentor
Someone…..
– you respect/admire
– works with similar technology
– has a compatible personality
– you have a good rapport with

Being a mentor
– Teach technical skills
– Provide advanced technical/design guidance
– Model and teach professional skills
– Be interested and invested in the [student’s] career

I am moderating and liveblogging the Professional IT Community Conference panel called Tech Women Rule! Creative Solutions for being a (or working with a) female technologist.

One point to keep in mind: The goal is not equality for equality’s sake. The goal is to have a diverse range of experience to make your company/project/whatever the best it could be.

That being said, these issues are not just around women; they are about anyone who is “different”, whether it’s race, ethnicity, gender, sexual orientation, or culture.

So what are some of the solutions?

0) Better align expectations with reality. Are you expecting more from someone of one gender than another? If a woman makes a mistake, is it worse because she has to prove herself? Is it worse because she is seen as representative of her gender? If she does something good, is the achievement elevated more because of her gender? Either way is bad.

1) Respect people for who they are. Everyone deserves respect; if someone is not at your technical level, they still deserve respect.

If someone says something that is completely wrong from a technical perspective, do not assume that they have no idea what they are talking about. It could be that they are the exact case in which that technical scenario is appropriate for them. If they are correct, your attitude will be refreshing and you might learn something. If they are indeed wrong, ask them about a scenario in which their thinking falls apart, or otherwise guide them through learning why what they are saying is wrong.

2) Be nice. Don’t condescend.

3) Be helpful. “RTFM, n00b!” is not helpful, and certainly does not follow rule #2.

4) Don’t do #1-3 for women only. Don’t treat women nicely because they’re women, and be a jerk to men because they’re men. Being helpful is good for anyone, not just women.

5) Cooperate, do not compete. Whether you are co-workers, working together on a software project, or just in a conversation, the game of “one-upping” another is a lot less useful than working together.

6) When hiring or when in an interview, concentrate on skills, not knowledge. “Skills” refers to attributes such as their ability to listen, how judgmental they are about a legacy system, whether they are open to new ideas, whether they disdain anything that is not cutting edge, and even technical skills such as thinking patterns, research patterns, algorithms, etc.

If someone says “I don’t know” in an interview, ask them “how would you go about figuring it out?” If someone says “I think it’s x and y” ask “how would you confirm/test that?” If a backup failed, do they start the backup over or do they try to figure out why it failed?

Are they thorough? Do they follow through? It is a lot easier to teach knowledge than it is to teach something like “debugging skills”.

7) Specifically encourage people to speak up. Train yourself to NOTICE when folks are not speaking up, and ask them if they have any suggestions or ideas.

8) If you are running an IT conference, specifically ask qualified women you know to speak, not about “women in IT”. If you hear of an IT conference, tell specific women you know that you think they would be a great speaker. Get women to speak at local user groups to get practice in a less intimidating space.

Resources:
Read HOWTO Encourage Women in Linux

You Just Don’t Understand
The Male Mind at Work
How to succeed in business without a penis: a guide for working women (this is a humor book)

Join and/or send messages to Systers, the world’s largest e-mail list of women in computing and technology fields.

In his post, Baron’s advice is “do not change sort_buffer_size from the default.”

Baron did not explain the logic behind his reasoning; he handwaves that “people utterly ruin their server performance and stability with it,” but does not explain how changing the sort_buffer_size kills performance and stability. Regardless of how respected and knowledgeable the source, NEVER take any advice that tells you what to do or how to do it without understanding WHY.

This article will explain the “why” of Baron’s point, and it will also talk more about understanding why, an integral part of the “Battle Against Any Guess.” Baron’s recommendation to leave sort_buffer_size at the default is just as bad as all the advice given to change the sort_buffer_size, because all that advice (including Baron’s) does not explain the underlying causes.

First, I explain the sort_buffer_size issue. The sort buffer size, as the name implies, is a memory buffer used when ordering is needed (usually for GROUP BY and ORDER BY clauses, when the index used for the filter/join does not follow the GROUP/ORDER BY order). Increasing the sort_buffer_size means allowing more memory to be used for the sorting process.

Increasing the sort_buffer_size usually improves performance because more memory is used in sorting. It can be detrimental to performance because the full size of the sort buffer is allocated for each thread that needs to do a sort, even if that sort does not need a very large sort buffer.

A better optimization would be to change the schema and/or queries so that all that sorting is not necessary. Increasing the sort_buffer_size gives you a false sense of security that your server is performing better. Your server is performing the same tasks, only faster — the best optimization is to make the tasks smaller or eliminate some tasks. If you can have queries without so much sorting, that’s a much better optimization than changing sort_buffer_size.

That being said, increasing the sort_buffer_size is a perfectly acceptable stop-gap solution that can be implemented RIGHT NOW (it’s a dynamic variable), while you examine your queries by doing a query review with a tool such as mk-query-digest. This is indeed what Pythian does — and, by the way, not only do we recommend that course of action, but we explain it to you and help you find and optimize the queries in question.
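
[My own sketch of making the change, since it is a dynamic variable; the connection details are placeholders, and 4M is just an example value, not a recommendation:]

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='admin', passwd='secret')
cursor = conn.cursor()
# Session scope: affects only sorts done on this connection.
cursor.execute("SET SESSION sort_buffer_size = %d" % (4 * 1024 * 1024))
# Global scope (requires SUPER): affects connections opened from now on.
cursor.execute("SET GLOBAL sort_buffer_size = %d" % (4 * 1024 * 1024))
conn.close()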

That all assumes that having lots of sorts that require lots of memory is a bad thing. It may be that you have tuned your queries and schema such that you have eliminated as many sorts as you can, but some may remain. An intensive data mining server is a good example of a situation in which permanently increasing the sort_buffer_size may be the right solution.

Now that we have the specifics of this situation out of the way, let’s look at the Battle Against Any Guess. This is a movement against guessing games. Understanding what you are doing is essential; in the case of sort_buffer_size, you can believe that you know what you are doing by increasing sort_buffer_size. However, the real solution to the problem lies in changing the queries, not changing the memory patterns.

There is a 6-page description of the “Battle against any guess” in the Northern California Oracle User Group’s May Journal, starting on page 13. The examples are specific to Oracle, but the points made are sound even if you do not know Oracle well. For example:

Blindly implementing best practices is nothing different from guesswork; we are applying some past-proven solutions without measuring how they stand against our requirements, and without testing whether they bring us any closer to the targets we have. Industry has become so obsessed with best practices that we commonly see projects in which reviewing an environment for compliance with best practices is the ultimate goal.

One good reason you need to know *why* is also mentioned in the article: The second danger of best practices is that they easily become myths. The technology keeps improving and issues addressed by certain best practices might not be relevant anymore in the next software version.

So, even from respected folks like Baron or myself, do not take advice on face value. Ask why, understand why, and then think if there is another level. It is not always easy; often you think you understand but really you miss that other level – such as with sort_buffer_size.

In his post … but there are many negatives.

I have a hard time trying to figure out the deeper meaning behind Peter’s post, given that Percona writes a storage engine for MySQL, XtraDB. Does this mean that Percona will stop developing XtraDB? Does this mean that the Percona Server will diverge farther and farther away from MySQL, so that they’re not compatible any more and migrating from MySQL to Percona Server is very difficult?

Or maybe it’s just that Peter is saying one thing and doing the opposite; which just seems wrong because that would be blatant hypocrisy on Percona’s part.

(This idea was a comment on the blog post but seems to be trapped in the spam filter, so I’m posting it; apologies if the comment comes through eventually….)

My own opinion of the issue: Peter is factually correct with what he says. However, it’s nice to have the framework and be allowed to use more than one storage engine, or use exclusively one storage engine that’s not MyISAM.

I have been talking with a group of folks who have been making a product that has lots of free functionality: seeing and managing the MySQL config file (/etc/my.cnf), seeing and managing accounts, a small dashboard of overall health graphs, and more.

With this free tool you can look at and manage local and remote databases. It supports ssh tunneling, including ssh using password-protected ssh keys. It’s pretty neat, and I have been working with the product manager to add features. I think this tool will become the de facto standard for centralized GUI administration of MySQL.

The tool is
MySQL Workbench….Surprise! One of the best new features for the administrator is that you can now create an administration connection for an existing Workbench connection with a click of a button, instead of having to enter all that information again.

I use the “developer” version, 5.2.21. Note that the 5.1 version does not have administration capabilities.

A MySQL user group member saw that I use Poderosa, and I can even connect directly to Cygwin with an icon.

But Poderosa is not the tool I wanted to mention…. Another user group member mentioned a tool that makes a nicely tabbed browsing window, where you can open sessions by double-clicking the connections, which are now listed on the right-hand side.

See screenshot below:

I have not played with other features such as sending a command to multiple windows, but even just having this is a HUGE win.

On Monday, … his thoughts on Tuesday.

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

I am attending the … tomorrow.

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, and it completely relates to DBA work:
works and plays well with others
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespectful
insensitive
incompetent
[my own addition – no follow-through, lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of a sysadmin skill
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to send an output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps managing time
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Many don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

So, I’ve started a new job as a Senior Database Engineer at Salesforce, and one of the services I help provide is adding users to MySQL. We have some nice chef recipes, so all I have to do is update a few files, including adding in the MySQL password hash.

Now, when I added myself, I just logged into MySQL and generated a password hash. But when my SRE (systems reliability engineer) colleague needed to generate a password, he did not have a MySQL system he could login to.

The good news is that it’s easy to generate a MySQL password hash. The MySQL password hash is simply a SHA1 hash of the binary SHA1 hash of the password, hex-encoded and uppercased, with * put at the beginning. This means you do not need a MySQL database to create a MySQL password hash; all you need is a programming language that has a SHA1 function (well, and a concatenate function).

And I found it, of course, on this post at StackExchange. So you don’t have to click through, here is what it says – and I have tested all these methods and I get the same password hash. I have changed their example of “right” to “PASSWORD HERE” so it’s more readable and obvious where the password goes, in case you copy and paste from here.

Some one-liners:

MySQL (may require you add -u(user) -p):

mysql -NBe "select password('PASSWORD HERE')"

Python:

python -c 'from hashlib import sha1; print "*" + sha1(sha1("PASSWORD HERE").digest()).hexdigest().upper()'

Perl:

perl -MDigest::SHA1=sha1_hex -MDigest::SHA1=sha1 -le 'print "*". uc sha1_hex(sha1("PASSWORD HERE"))'

PHP:

php -r 'echo "*" . strtoupper(sha1(sha1("PASSWORD HERE", TRUE))) . "\n";'

Hopefully these help you – they have enabled my colleagues to easily generate what’s needed without having to find (or create) a MySQL instance that they can already login to.
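
[My own addition: the same algorithm as a small Python function, if you want it in a script instead of a one-liner:]

from hashlib import sha1

def mysql_password_hash(password):
    """Return '*' plus the uppercase hex SHA1 of the binary SHA1 of the password."""
    return '*' + sha1(sha1(password).digest()).hexdigest().upper()

print mysql_password_hash('PASSWORD HERE')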

On Monday, this web doctor his thoughts on Tuesday.

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, dosage we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, pharmacy and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

(I am attending the … tomorrow.)

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, and it completely relates to DBA work:
works and plays well with others
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespectful
insensitive
incompetent
[my own addition – no follow-through, lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as…

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of the sysadmin skill set
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to send an output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps time management is the skill to learn first
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Many don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write python code, going back to it in 6 months, you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, in Perl you use the “Shift” key a lot for $ % ( ) { } etc; in Python you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to indent the second line, because whitespace indentation is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation. It is stored in the __doc__ attribute (a double-underscore name, also called a “dunder”), which you can print:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no element at index 9; slices are clamped to the list’s length. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Negative indices count from the end (pointer-like), so v[1:-1] does not include the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full slice syntax is [start:end:step]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made of a transformation, an iteration and optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A tuple must contain a comma, so a one-item tuple is defined as
x=(1,)
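
To see the immutability (my own interpreter session, not from the talk):

>>> t=(1, 2, 3)
>>> t[0]
1
>>> t[0]=99
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment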

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.
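
If you need a predictable order, sort the keys yourself (my note, not from the talk):

>>> for k in sorted(d.keys()):
...  print k, d[k]
...
room 1178
user jonesy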

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not interpolated in strings; Python uses printf-style string formatting:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
    print "x == y"

for k, v in mydict.iteritems():
    if v is None:
        continue
    print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign. This is on purpose, it avoids bugs resulting from typing if x=y instead of if x==y.
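
For example (my own session, not from the talk), the classic C typo is a hard error in Python:

>>> if x = y:
  File "<stdin>", line 1
    if x = y:
         ^
SyntaxError: invalid syntax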

That assign-inside-a-condition construct has no place in production code anyway, since you give up catching any exceptions.

Python modules for sysadmins (a small combined example follows the list):
– sys
– os
– urllib/urllib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd
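
A tiny sketch of the flavor of these modules (my own example; the path is hypothetical): gzip-compress every log file in a directory.

import glob, gzip, os, shutil

# Compress each .log file under a (hypothetical) log directory,
# then remove the uncompressed original.
for path in glob.glob('/var/log/myapp/*.log'):
    src = open(path, 'rb')
    dst = gzip.open(path + '.gz', 'wb')
    shutil.copyfileobj(src, dst)  # chunked copy, no slurping into memory
    dst.close()
    src.close()
    os.remove(path)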

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the interpreter prompt. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, please let me know any issues, as they may be typos….

[author’s note: personally, I use awk a bunch in MySQL DBA work, for tasks like scrubbing data from a production export for use in qa/dev, but usually have to resort to Perl for really complex stuff; now I know how to do more in awk.]

Basics:
By default, fields are separated by any number of spaces. The -F option to awk changes the separator on the commandline.
Print the first field, where fields are separated by a colon:
awk -F: '{print $1}' /etc/passwd

Print the first and fifth field:
awk -F: '{print $1,$5}' /etc/passwd

awk can pattern match and read files directly, so you can replace:
grep foo /etc/passwd | awk -F: '{print $1,$5}'
with:
awk -F: '/foo/ {print $1,$5}' /etc/passwd

NF = built-in variable (no $) meaning “number of fields”
This will print the first and last fields of lines where the first field matches “foo”
awk -F: '$1 ~/foo/ {print $1,$NF}' /etc/passwd

NF = number of fields, ie, “7”
$NF = value of last field, ie “/bin/bash”
(similarly, NR is record number)


Awk makes assumptions about input, variables, and processing that you’d otherwise have to code yourself.

– “main loop” of input processing is done for you
– awk initializes variables for you, to 0
– input is viewed by awk as ‘records’ which are splittable into ‘fields’

This all makes a lot of operations very concise in awk, many things can be done w/ a one-liner that would otherwise require several lines of code.
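
To make that concrete (my comparison, not from the talk), here is the one-liner awk -F: '{print $1,$5}' /etc/passwd written out by hand in Python:

# awk gives you the input loop and the field splitting for free;
# in Python you write them yourself.
for line in open('/etc/passwd'):
    fields = line.rstrip('\n').split(':')
    print fields[0], fields[4]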

awk key points:
– splits text into fields
– default delimiter is “any number of spaces”
– reference fields with $1, $2, … $NF
– $0 is the entire line
– create filters using ‘addresses’ which can be regexps (similar to sed)
– Turing-complete language
– has if, while, for, do-while, etc
– built-in math like exp, log, rand, sin, cos
– built-in string sub, split, index, toupper/lower

Patterns and actions
Pattern is first, then action(s)
Actions are enclosed in {}

only a pattern, no action:
'length>42'
but the default action is to print the whole line, so this will actually do something: print lines where the length of the line is > 42. (Strings are just arrays in awk.)

only action, no pattern:
{print $2,$1}
do this to all lines of input

NR % 3 == 0
print every third line (the pattern is NR mod 3)

{print $1, $NF, $(NF-1)}
print the first field, last field, and 2nd to last field

built-in variables
NF, NR we’ve done
FS = field separator (can be regexp)
OFMT = output format for numbers (default %.6g)

Patterns
– used to filter lines processed by awk
– can be regexp
/^root/ is the pattern in the following
awk -F: '/^root/ {print $1,$NF}' /etc/passwd

– Patterns can use fields and relational operators
To print 1st, 4th and last field if value of 4th field >10:
awk -F: '$4 > 10 {print $1, $4, $NF}' /etc/passwd

To also skip commented lines:
awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}' /etc/passwd

Range patterns
sed-like addressing : you can have start and end addresses
awk 'NR==1,NR==3'
prints only first three lines of the file
You can use regular expressions in range patterns:
awk -F: '/^root/,/^daemon/ {print $1,$NF}' /etc/passwd
start printing at the line that starts with “root”, the last line that is processed is the line starting with “daemon”

Range pattern “gotcha” – can’t mix a range with other patterns:
To do “start at non-commented line where value of $4 is less than 10, end at the first line where value of $4 is greater than 10”:

This does not work!
awk -F: '$0 !~ /^#/ $4 <= 10, $4 > 10' /etc/passwd

This is how to do it, {next} is an action that skips:
awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4}' /etc/passwd

Basic Aggregation
awk -F: '$3 > 100 {x+=1; print x}' /etc/passwd
This gives a line of output as each matching line is processed; it shows a running count in x.

awk -F: '$3 > 100 {x+=1} END {print x}' /etc/passwd
This processes the “{print x}” action only after the entire file has been processed. This gives only the final value of x.

Arrays:
Support for regular arrays
Technically multi-dimensional arrays are not supported, but array indexes can be arbitrary strings, so you can make your own associative arrays.

Example:
awk -F, '{x[$1] = $2*($4 - $3)} END {for(key in x) {print key, x[key]}}' stocks.txt

The part before the END creates the associative array, the part after the END prints the array.

Extreme data munging:
awk -F, '{x[$1]=$2*($4 - $3)} END {for(z in x) {print z, x[z]}}' stocks.txt

For input lines:
ABC,100,12.14,19.12
FOO,100,24.01,17.45

the output is:
ABC 698
FOO -656

For the line “ABC,100,12.14,19.12”
the function becomes

x[ABC] = 100 * (19.12 - 12.14) = 698

Aggregate across multiple variables:
awk -F, '{x[$1] = $2*($4 - $3); y+=x[$1]} END {for(z in x) {print z, x[z]} print "Net:" y}' stocks.txt

Note that y is a running *sum* (not a running count like before).

Now, the above is hard to read; this is much easier:

#!/usr/bin/awk -f

BEGIN { FS = "," }

{
    x[$1] = $2 * ($4 - $3)
    y += x[$1]
}

END {
    for (z in x) {
        print z, x[z]
    }              # end for loop
    print "Net:" y
}                  # end END block

This was liveblogged, so please point out any issues, as they may be typos on my part….

The Beacon Pattern:
– This is a “Get out of the business” pattern
– Identify an oft-occurring and annoying task
– Automate and document it to the point of being able to hand it off to someone far less technical

Example:
– System admins were being put in charge of scheduling rooms in the building
– They wrote a PHP web application to help them automate the task
– They refined the app, and handed it off to a secretary
– They have to maintain the app, but it’s far less work.

The Community Pattern:

– Prior to launch of a new service, create user documentation for it.
– Point a few early adopters at the documentation and see if they can use the service with minimal support
– Use feedback to improve documentation, and the service
– Upon launch, create a mailing list, forum, IRC channel, or Jabber chat room and ask early adopters to help test it out.
– Upon launch, your early adopters are the community, and they’ll tell new users to use the tools you’ve provided instead of calling you.

Example:
– A beowulf cluster for an academic department
– Documented like crazy, early adopters were given early access to the cluster (demand was high)
– Created a mailing list; early adopters were added to it with their consent, and functionality was tested with them.
– Email announcing launch mentioned the early adopters in a ‘thank you’ section, and linked them to their mailing list.

The DRY pattern
DRY = Don’t repeat yourself
Identify duplicate code in your automation scripts
Put common subroutines in an include file, and include it in your scripts (a small sketch follows the example below).

Example:
– “sysadmin library”
– /var/lib/adm/.*pl
– Elapsed time and # of lines to script a task for which the library was useful plunged dramatically
– new tasks were thought up that were not considered before but were obvious now (ie, users that want to change their username)
– migrating to new services became much easier
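
A minimal Python sketch of the idea (mine; the talk's library was Perl scripts under /var/lib/adm):

# adminlib.py - the shared "sysadmin library" (hypothetical)
import smtplib
from email.mime.text import MIMEText

def notify(to_addr, subject, body):
    # One copy of the mail-sending logic, imported by every admin script.
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = 'root@localhost'
    msg['To'] = to_addr
    s = smtplib.SMTP('localhost')
    s.sendmail(msg['From'], [to_addr], msg.as_string())
    s.quit()

# changeuser.py - one of many scripts that include the library
from adminlib import notify
notify('admin@example.com', 'username change', 'jonesy is now jonesy2')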

The Chameleon Pattern
– Identify commonalities among your services
– Leverage those to create “Chameleon” servers that can be re-purposed on the fly
– Abstract as much of this away from the physical hardware
– Doesn’t need to involve virtualization, though it’s awfully handy if you can do it that way.
[this one is a bit harder to do with MySQL config files]

Example:
[puppet/cfengine were mentioned…]
ldapconfig.py – more than a script: a methodology

– But isn’t installing packages you don’t need bad? Depends on the package….ie, gcc is bad for enterprise

“Junior annoyances”

Terminal issues

Junior:
open terminal, login to machine1
thinks the issue is with machine2, but is still talking to machine1
log out of machine1
log into machine2

Senior:
opens 2 terminals, one each for machine1 and machine2, to start

Junior:
networking issue ticket arrives
logs into server
runs tcpdump

Senior:
networking issue ticket arrives
logs into server
looks at logs

“Fix” vs. “Solution”, ie “taking orders”
Junior will try to fix a problem; senior will try to figure out what the problem is. For example, with “I need a samba directory mounted under an NFS mount”, a junior admin will try to do exactly that; a senior admin will ask “what are you trying to do with that?” because maybe all they need is a symlink.

Fanboyism
Signs you might be a fanboy:
– Disparaging users of latest stable release of $THING for not using the nightly (unstable) build which fixes more issues
– Creating false/invalid comparisons based on popular opinion instead of experience/facts
– Going against internal standards, breaking environmental consistency, to use $THING instead of $STANDARD (but this is also how disruptive technology works)
– Being in complete denial that most technology at some point or another stinks.
– Evaluating solutions based on “I like” instead of “we need” and “this does”.

Liveblog of the Professional IT Community Conference session Mentoring: It’s for everyone

Ways to learn:
Audio
Visual
Kinetic (doing it)

Everyone learns differently, but most people learn with some combination of all three.

However, you can also learn by training [that’s the truth: I learned a LOT by writing the book; even things I knew, I ended up needing to research more].

Ways to train:
Explanation
Observation
Demonstration
Questioning (Socratic Method)

What is a mentor?
noun: an experienced and trusted adviser. It’s not just someone who teaches, it’s someone who advises.
Also: an experienced person in a company, college, or school who trains and counsels new employees or students.

verb: to advise or train (someone, esp. a younger colleague).

A mentorship is a safe place to ask questions.

A mentor is a trainer, but a trainer who also is a professional advisor.

Finding a mentor
Someone…..
– you respect/admire
– works with similar technology
– has a compatible personality
– you have a good rapport with

Being a mentor
– Teach technical skills
– Provide advanced technical/design guidance
– Model and teach professional skills
– Be interested and invested in the [student’s] career

I am moderating and liveblogging the Professional IT Community Conference panel called Tech Women Rule! Creative Solutions for being a (or working with a) female technologist.

One point to keep in mind: The goal is not equality for equality’s sake. The goal is to have a diverse range of experience to make your company/project/whatever the best it could be.

That being said, these issues are not just around women; they are about anyone who is “different”, whether it’s race, ethnicity, gender, sexual orientation, or culture.

So what are some of the solutions?

0) Better align expectations with reality. Are you expecting more from someone who is one gender than another? If a woman makes a mistake is it worse because she has to prove herself? Is it worse because she is representative of her gender? If she does something good is the achievement elevated more because of her gender? Either is bad.

1) Respect people for who they are. Everyone deserves respect; if someone is not at your technical level, they still deserve respect.

If someone says something that is completely wrong from a technical perspective, do not assume that they have no idea what they are talking about. It could be that they are the exact case in which that technical scenario is appropriate for them. If they are correct, your attitude will be refreshing and you might learn something. If they are indeed wrong, ask them about a scenario in which their thinking falls apart, or otherwise guide them through learning why what they are saying is wrong.

2) Be nice. Don’t condescend.

3) Be helpful. “RTFM, n00b!” is not helpful, and certainly does not follow rule #2.

4) Don’t do #1-3 for women only. Don’t treat women nicely because they’re women, and be a jerk to men because they’re men. Being helpful is good for anyone, not just women.

5) Cooperate, do not compete. Whether you are co-workers, working together on a software project, or just in a conversation, the game of “one-upping” another is a lot less useful than working together.

6) When hiring or when in an interview, concentrate on skills, not knowledge. “Skills” refers to attributes such as their ability to listen, how judgmental they are about a legacy system, whether they are open to new ideas, whether they disdain anything that is not cutting edge, and even technical skills such as thinking patterns, research patterns, algorithms, etc.

If someone says “I don’t know” in an interview, ask them “how would you go about figuring it out?” If someone says “I think it’s x and y” ask “how would you confirm/test that?” If a backup failed, do they start the backup over or do they try to figure out why it failed?

Are they thorough? Do they follow through? It is a lot easier to teach knowledge than it is to teach something like “debugging skills”.

7) Specifically encourage people to speak up. Train yourself to NOTICE when folks are not speaking up, and ask them if they have any suggestions or ideas.

8) If you are running an IT conference, specifically ask qualified women you know to speak, not about “women in IT”. If you hear of an IT conference, tell specific women you know that you think they would be a great speaker. Get women to speak at local user groups to get practice in a less intimidating space.

Resources:
Read HOWTO Encourage Women in Linux

You Just Don’t Understand
The Male Mind at Work
How to succeed in business without a penis: a guide for working women (this is a humor book)

Join and/or send messages to Systers, the world’s largest e-mail list of women in computing and technology fields.

In his post, Baron’s advice is “do not change sort_buffer_size from the default.”

Baron did not explain the logic behind his reasoning; he handwaves that “people utterly ruin their server performance and stability with it,” but does not explain how changing the sort_buffer_size kills performance and stability. Regardless of how respected and knowledgeable the source, NEVER take any advice that tells you what to do or how to do it without understanding WHY.

This article will explain the “why” of Baron’s point, and it will also talk more about understanding why, an integral part of the “Battle Against Any Guess.” Baron’s recommendation to leave sort_buffer_size at the default is just as bad as all the advice given to change the sort_buffer_size, because all that advice (including Baron’s) does not explain the underlying causes.

First, I explain the sort_buffer_size issue. The sort buffer size, as the name implies, is a memory buffer used when ordering is needed (usually for GROUP BY and ORDER BY clauses, when the index used for the filter/join does not follow the GROUP/ORDER BY order). Increasing the sort_buffer_size means allowing more memory to be used for the sorting process.
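
You can see when a query needs the sort buffer: EXPLAIN shows “Using filesort” in the Extra column. For example (table and column names made up for illustration):

mysql> EXPLAIN SELECT * FROM orders ORDER BY created_at\G
...
        Extra: Using filesort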

Increasing the sort_buffer_size usually improves performance because more memory is used in sorting. It can be detrimental to performance because the full size of the sort buffer is allocated for each thread that needs to do a sort, even if that sort does not need a very large sort buffer.

A better optimization would be to change the schema and/or queries so that all that sorting is not necessary. Increasing the sort_buffer_size gives you a false sense of security that your server is performing better. Your server is performing the same tasks, only faster — the best optimization is to make the tasks smaller or eliminate some tasks. If you can have queries without so much sorting, that’s a much better optimization than changing sort_buffer_size.

That being said, increasing the sort_buffer_size is a perfectly acceptable stop-gap solution that can be implemented RIGHT NOW (it’s a dynamic variable), while you examine your queries by doing a query review with a tool such as mk-query-digest. This is indeed what Pythian does — and, by the way, not only do we recommend that course of action, but we explain it to you and help you find and optimize the queries in question.
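
As a sketch of that stop-gap (my example, not from either post): raise the value at runtime, and watch the Sort_merge_passes status counter, which climbs when sorts spill to disk because the buffer is too small.

mysql -NBe "SHOW GLOBAL STATUS LIKE 'Sort_merge_passes'"
mysql -e "SET GLOBAL sort_buffer_size = 4*1024*1024"
# SET GLOBAL affects new connections; existing sessions keep their old value.
mysql -NBe "SHOW GLOBAL STATUS LIKE 'Sort_merge_passes'"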

That all assumes that having lots of sorts that require lots of memory is a bad thing. It may be that you have tuned your queries and schema such that you have eliminated as many sorts as you can, but some may remain. An intensive data mining server is a good example of a situation in which permanently increasing the sort_buffer_size may be the right solution.

Now that we have the specifics of this situation out of the way, let’s look at the Battle Against Any Guess. This is a movement against guessing games. Understanding what you are doing is essential; in the case of sort_buffer_size, you can believe that you know what you are doing by increasing sort_buffer_size. However, the real solution to the problem lies in changing the queries, not changing the memory patterns.

There is a 6-page description of the “Battle against any guess” in the Northern California Oracle User Group’s May Journal, starting on page 13. The examples are specific to Oracle, but the points made are sound even if you do not know Oracle well. For example:

Blindly implementing best practices is nothing different from guesswork; we are applying some past-proven solutions without measuring how they stand against our requirements, and without testing whether they bring us any closer to the targets we have. Industry has become so obsessed with best practices that we commonly see projects in which reviewing an environment for compliance with best practices is the ultimate goal.

One good reason you need to know *why* is also mentioned in the article: The second danger of best practices is that they easily become myths. The technology keeps improving and issues addressed by certain best practices might not be relevant anymore in the next software version.

So, even from respected folks like Baron or myself, do not take advice at face value. Ask why, understand why, and then consider whether there is another level. It is not always easy; often you think you understand, but really you miss that other level, as with sort_buffer_size.

In …, but there are many negatives.

I have a hard time trying to figure out the deeper meaning behind Peter’s post, given that Percona writes a storage engine for MySQL, XtraDB. Does this mean that Percona will stop developing XtraDB? Does this mean that the Percona Server will diverge farther and farther away from MySQL so that they’re not compatible any more, and migrating from MySQL to Percona Server is very difficult?

Or maybe it’s just that Peter is saying one thing and doing the opposite; which just seems wrong because that would be blatant hypocrisy on Percona’s part.

(This idea was a comment on the blog post but seems to be trapped in the spam filter, so I’m posting it; apologies if the comment comes through eventually….)

My own opinion of the issue: Peter is factually correct with what he says. However, it’s nice to have the framework and be allowed to use more than one storage engine, or use exclusively one storage engine that’s not MyISAM.

The MySQL track at Kaleidoscope is mostly Ronald’s work, but I have lent some time to the coordination as well. It is a credit mostly to Ronald that we have been able to plan an entire 19-session conference track, complete with confirming speakers, in less than a month. (You may notice the schedule does not have all 19 sessions full; we are just waiting for some more speakers to confirm details.)

Whether or not you made it to last month’s O’Reilly MySQL User Conference & Expo, and whether you are an expert or casual user, the sessions at Kaleidoscope will teach you new and exciting things. This is a credit to the planning we did (and again, Ronald spent the lion’s share of time on this) — we did not just want to re-do the same content from the April conference, and we wanted something that would be accessible to developers and DBAs who know what they’re doing when it comes to writing SQL queries, but may or may not know how MySQL itself works.

I will definitely be giving away copies of my book, The MySQL Administrator’s Bible, and Ronald will be giving away copies of his new book, Expert PHP and MySQL, so the conference is definitely not-to-be-missed!

It’s not too late to register for Kaleidoscope – be sure to use the discount code MYSQL to save $300 off your registration (assuming you are not a member of ODTUG).

On Monday, this web doctor his thoughts on Tuesday.

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, dosage we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, pharmacy and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

I am attending the tomorrow.)

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, nurse and it completely relates to DBA work:
works and plays well with other
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespective
insensitive
incompetent
[my own addition – no follow through, dosage lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of a sysadmin skill
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to send an output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps managing time
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Maany don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write python code, here going back in 6 months, prostate you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, you use the “Shift” key a lot in Perl because you use $ % ( ) { } etc, so you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to put a space on the second line because whitespace ‘tabbing’ is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation. A double underscore, also called a “dunder”, is to print the stuff in the triple quotes:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no field 9. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
File ““, line 1, in
IndexError: list index out of range

Python uses pointers (or pointer-like things) so v[1:-1] does not print the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full array syntax is [start:end:index increment]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made of a transformation, an iteration and optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A tuple always has 2 elements, so a one-item tuple is defined as
x=(1,)

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not recognized in strings, uses printf-style string formatting:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
print "x == y"
for k,v in mydict.iteritems():
if v is None:
continue
print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign. This is on purpose, it avoids bugs resulting from typing if x=y instead of if x==y.

The construct has no place in production code anyway, since you give up catching any exceptions.

Python modules for sysadmins:
– sys
– os
– urlib/urlib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the commandline. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, please let me know any issues, as they may be typos….

[author’s note: personally, store I use awk a bunch in MySQL DBA work, viagra for tasks like scrubbing data from a production export for use in qa/dev, but usually have to resort to Perl for really complex stuff, but now I know how to do .]

Basics:
By default, fields are separated by any number of spaces. The -F option to awk changes the separator on commandline.
Print the first field, fields are separated by a colon.
awk -F: '{print $1}' /etc/passwd

Print the first and fifth field:
awk -F: '{$print $1,$5}' /etc/passwd

Can pattern match and use files, so you can replace:
grep foo /etc/passwd | awk -F: '{print $1,$5}'
with:
awk -F: '/foo/ {print $1,$5}' /etc/passwd

NF = built in variable (no $) used to mean “field number”
This will print the first and last fields of lines where the first field matches “foo”
awk -F: '$1 ~/foo/ {print $1,$NF}' /etc/passwd

NF = number of fields, ie, “7″
$NF = value of last field, ie “/bin/bash”
(similarly, NR is record number)


Awk makes assumptions about input, variables, and processing that you’d otherwise have to code yourself.

– “main loop” of input processing is done for you
– awk initializes variables for you, to 0
– input is viewed by awk as ‘records’ which are splittable into ‘fields’

This all makes a lot of operations very concise in awk, many things can be done w/ a one-liner that would otherwise require several lines of code.

awk key points:
– splits text into fields
– default delimiter is “any number of spaces”
– reference fields
– $0 is entire line
– create filters using ‘addresses’ which can be regexps (similar to sed)
– Turing-complete language
– has if, while, for, do-while, etc
– built-in math like exp, log, rand, sin, cos
– built-in string sub, split, index, toupper/lower

Patterns and actions
Pattern is first, then action(s)
Actions are enclosed in {}

only a pattern, no action:
'length>42'
but, the default action is to print the whole line, so this will actually do something — print lines where the length of the line is > 42. strings are just arrays in awk

only action, no pattern:
{print $2,$1}
do this to all lines of input

NR % 3 == 0
print every third line (pattern is %NR mod 3)

{print $1, $NF, $(NF-1)}
print the first field, last field, and 2nd to last field

built-in variables
NF, NR we’ve done
FS = field separator (can be regexp)
OFMT = output format for numbers (default %.6g)

Patterns
– used to filter lines processed by awk
– can be regexp
/^root/ is the pattern in the following
awk -F:'/^root/ {print $1,$NF}' /etc/passwd

– Patterns can use fields and relational operators
To print 1st, 4th and last field if value of 4th field >10:
awk -F: '$4 > 10 {print $1, $4, $NF}' /etc/passwd

awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}' /etc/passwd

Range patterns
sed-like addressing : you can have start and end addresses
awk ‘NR==1,NR==3′
prints only first three lines of the file
You can use regular expressions in range patterns:
awk -F:’/^root/,/^daemon/ {print $1,$NF}’ /etc/passwd
start printing at the line that starts with “root”, the last line that is processed is the line starting with “daemon”

Range pattern “gotcha” – can’t mix a range with other patterns:
To do “start at non-commented line where value of $4 is less than 10, end at the first line where value of $4 is greater than 10″

This does not work!
awk -F: '$0 !~ /^#/ $4 <= 10, $4 > 10' /etc/passwd

This is how to do it, {next} is an action that skips:
awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4' /etc/passwd

Basic Aggregation
awk -F: ‘$3 > 100 {x+=1; print x}’ /etc/passwd
This gives a line of output as each matching line is processed. This gives a running total of x.

awk -F: ‘$3 > 100 {x+=1} END {print x}’ /etc/passwd
This processes the “{print x}” action only after the entire file has been processed. This gives only the final value of x.

Arrays:
Support for regular arrays
Technically multi-dimensional arrays are not supported, but array indexes are not supported, so you can make your own associative arrays.

Example:
awk -F: ‘{x[$1] = $2*($4 – $3)} END {for(key in x) {print key, x[key]}}’ stocks.txt

The part before the END creates the associative array, the part after the END prints the array.

Extreme data munging:
awk -f: '{x[$1]=($2'($4 - $3))} END {for(z in x) {print z, x[z]}}' stocks.txt

ABC,100,12.14,19.12
FOO,100,24.01,17.45

output
BAR 271.5
ABC 698

For the line “ABC,100,12.14,19.12″
the function becomes

x[ABC] = 100 * (19.12 - 12.14) = 698

Aggregate across multiple variables:
awk -F, '{x[$1] = $2*($4 - $3); y+=x[$1]} END {for(z in x) {print z, x[z]}} {print "Net:"y}}' stocks.txt

Note that y is a running *sum* (not a running count like before).

Now, the above is hard to read, this is much easier.

#!/usr/bin/awk -f

BEGIN { FS="," }
{ x[$1] = $2*($4 - $3)
y+=x[$1]
}
END {
for(z in x) {
print z, x[z]
}
}  # end for loop
{
print "Net:"y
} # end END block

This was liveblogged, so please point out any issues, as they may be typos on my part….

The Beacon Pattern:
– This is a “Get out of the business” pattern
– Identify an oft-occurring and annoying task
– Automate and document it to the point of being able to hand it off to someone far less technical

Example:
– System admins were being put in charge of scheduling rooms in the building
– They wrote a PHP web application to help them automate the task
– They refined the app, cardiology and handed it off to a secretary
– They have to maintain the app, this web but it’s far less work.

The Community Pattern:

– Prior to launch of a new service, create user documentation for it.
– Point a few early adopters at the documentation and see if they can use the service with minimal support
– Use feedback to improve documentation, and the service
– Upon launch, create a mailing list, forum, IRC channel, or Jabber chat room and ask early adopters to help test it out.
– Upon launch, your early adopters are the community, and they’ll tell new users to use the tools you’ve provided instead of calling you.

Example:
– A beowulf cluster for an academic department
– Documented like crazy, early adopters were given early access to the cluster (demand was high)
– Crated a mailing list, early adopters were added to it with their consent, functionality was tested with them.
– Email announcing launch mentioned the early adopters in a ‘thank you’ secion, and linked them to their mailing list.

The DRY pattern
DRY = Don’t repeat yourself
Identify duplicate code in your automation scripts
Put subroutines that exist in an include file, and include them in your scripts.

Example:
– “sysadmin library”
– /var/lib/adm/.*pl
– Elapsed time and # of lines to script a task for which the library was useful plunged dramatically
– new tasks were thought up that were not considered before but were obvious now (ie, users that want to change their username)
– migrating to new services became much easier

The Chameleon Pattern
– Identify commonalities among your services
– Leverage those to create “Chameleon” servers that can be re-purposed on the fly
– Abstract as much of this away from the physical hardware
– Doesn’t need to involve virtualization, though it’s awfully handy if you can do it that way.
[this one is a bit harder to do with MySQL config files]

Example:
[puppet/cfengine were mentioned…]
ldapconfig.py – more than a script: a methodology

– But isn’t installing packages you don’t need bad? Depends on the package….ie, gcc is bad for enterprise

“Junior annoynances”

Terminal issues

Junior:
open terminal, login to machine1
think issue is with machine2, talks to machine1.
log out of machine1
log into machine2

Senior:
opens 2 terminals each of machine1 and machine2 to start

Junior:
networking issue ticket arrives
logs into server
runs tcpdump

Senior:
networking issue ticket arrives
logs into server
looks at logs

“Fix” vs. “Solution” ie “taking orders”
Junior will try fix a problem, senior will try to figure out what the problem is. ie, “I need a samba directory mounted under an NFS mount” a junior admin will try to do exactly that, a senior admin will ask “what are you trying to do with that?” because maybe all they need is a symlink.

Fanboyism
Signs you might be a fanboy:
– Disparaging users of latest stable release of $THING for not using the nightly (unstable) build which fixes more issues
– Creating false/invalid comparisons based on popular opinion instead of experience/facts
– Going against internal standards, breaking environmental consistency, to use $THING instead of $STANDARD (but this is also how disruptive technology works)
– Being in complete denial that most technology at some point or another stinks.
– Evaluating solutions based on “I like” instead of “we need” and “this does”.

Liveblog of the Professional IT Community Conference session Mentoring: It’s for everyone

Ways to learn:
Audio
Visual
Kinetic (doing it)

Everyone learns differently, cialis sale but most people learn with some combination of all these three.

However, help you can also learn by training [that’s the truth, otolaryngologist I learned a LOT by writing the book, even things I knew, I ended up needing to research more].

Ways to train:
Explanation
Observation
Demonstration
Questioning (Socratic Method)

What is a mentor?
noun: experienced and trusted adviser. It’s not just someone who teaches, it’s someone who advises.
experienced person in a company, college or schools who trains and counsels new employees or students.

verb: to advise or train (someone, esp. a younger colleague).

A mentorship is a safe place to ask questions.

A mentor is a trainer, but a trainer who also is a professional advisor.

Finding a mentor
Someone…..
– you respect/admire
– works with similar technology
– has a compatible personality
– you have a good rapport with

Being a mentor
– Teach technical skills
– Provide advanced technical/design guidance
– Model and teach professional skills
– Be interested and invested in the [student’s] career

I am moderating and liveblogging the Professional IT Community Conference panel called Tech Women Rule! Creative Solutions for being a (or working with a) female technologist.

One point to keep in mind: The goal is not equality for equality’s sake. The goal is to have a diverse range of experience to make your company/project/whatever the best it could be.

That being said, tadalafil these issues are not just around women; they are about anyone who is “different”, whether it’s race, ethnicity, gender, sexual orientation, cultural.

So what are some of the solutions?

0) Better align expectations with reality. Are you expecting more from someone who is one gender than another? If a woman makes a mistake is it worse because she has to prove herself? Is it worse because she is representative of her gender? If she does something good is the achievement elevated more because of her gender? Either is bad.

1) Respect people for who they are. Everyone deserves respect; if someone is not at your technical level, they still deserve respect.

If someone says something that is completely wrong from a technical perspective, do not assume that they have no idea what they are talking about. It could be that they are the exact case in which that technical scenario is appropriate for them. If they are correct, your attitude will be refreshing and you might learn something. If they are indeed wrong, ask them about a scenario in which their thinking falls apart, or otherwise guide them through learning why what they are saying is wrong.

2) Be nice. Don’t condescend.

3) Be helpful. “RTFM, n00b!” is not helpful, and certainly does not follow rule #2.

4) Don’t do #1-3 for women only. Don’t treat women nicely because they’re women, and be a jerk to men because they’re men. Being helpful is good for anyone, not just women.

5) Cooperate, do not compete. Whether you are co-workers, working together on a software project, or just in a conversation, the game of “one-upping” another is a lot less useful than working together.

6) When hiring or when in an interview, concentrate on skills, not knowledge. “Skills” refers to attributes such as their ability to listen, how judgmental they are about a legacy system, whether they are open to new ideas, whether they disdain anything that is not cutting edge, and even technical skills such as thinking patterns, research patterns, algorithms, etc.

If someone says “I don’t know” in an interview, ask them “how would you go about figuring it out?” If someone says “I think it’s x and y” ask “how would you confirm/test that?” If a backup failed, do they start the backup over or do they try to figure out why it failed?

Are they thorough? Do they follow through? It is a lot easier to teach knowledge than it is to teach something like “debugging skills”.

7) Specifically encourage people to speak up. Train yourself to NOTICE when folks are not speaking up, and ask them if they have any suggestions or ideas.

8) If you are running an IT conference, specifically ask qualified women you know to speak, not about “women in IT”. If you hear of an IT conference, tell specific women you know that you think they would be a great speaker. Get women to speak at local user groups to get practice in a less intimidating space.

Resources:
Read HOWTO Encourage Women in Linux

You Just Don’t Understand
The Male Mind at Work
How to succeed in business without a penis: a guide for working women (this is a humor book)

Join and/or send messages to Systers, the world’s largest e-mail list of women in computing and technology fields.

In a recent post, Baron’s advice is “do not change sort_buffer_size from the default.”

Baron did not explain the logic behind his reasoning; he handwaves that “people utterly ruin their server performance and stability with it,” but does not explain how changing the sort_buffer_size kills performance and stability. Regardless of how respected and knowledgeable the source, NEVER take any advice that tells you what to do or how to do it without understanding WHY.

This article will explain the “why” of Baron’s point, and it will also talk more about understanding why, an integral part of the “Battle Against Any Guess.” Baron’s recommendation to leave sort_buffer_size at the default is just as bad as all the advice given to change the sort_buffer_size, because all that advice (including Baron’s) does not explain the underlying causes.

First, I explain the sort_buffer_size issue. The sort buffer size, as the name implies, is a memory buffer used when ordering is needed (usually for GROUP BY and ORDER BY clauses, when the index used for the filter/join does not follow the GROUP/ORDER BY order). Increasing the sort_buffer_size means allowing more memory to be used for the sorting process.

Increasing the sort_buffer_size usually improves performance because more memory is used in sorting. It can be detrimental to performance because the full size of the sort buffer is allocated for each thread that needs to do a sort, even if that sort does not need a very large sort buffer.
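
To make that cost concrete, here is a quick back-of-the-envelope sketch in Python (the numbers are made up for illustration, not measured from any real server):

# hypothetical worst case: every connection runs a sort at the same time
connections = 400                    # a made-up max_connections
sort_buffer = 4 * 1024 * 1024        # a made-up 4MB sort_buffer_size
print "worst case: %d MB just for sort buffers" % (connections * sort_buffer / 1024 / 1024)
# prints: worst case: 1600 MB just for sort buffers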

A better optimization would be to change the schema and/or queries so that all that sorting is not necessary. Increasing the sort_buffer_size gives you a false sense of security that your server is performing better. Your server is performing the same tasks, only faster — the best optimization is to make the tasks smaller or eliminate some tasks. If you can have queries without so much sorting, that’s a much better optimization than changing sort_buffer_size.

That being said, increasing the sort_buffer_size is a perfectly acceptable stop-gap solution that can be implemented RIGHT NOW (it’s a dynamic variable), while you examine your queries by doing a query review with a tool such as mk-query-digest. This is indeed what Pythian does — and, by the way, not only do we recommend that course of action, but we explain it to you and help you find and optimize the queries in question.

That all assumes that having lots of sorts that require lots of memory is a bad thing. It may be that you have tuned your queries and schema such that you have eliminated as many sorts as you can, but some may remain. An intensive data mining server is a good example of a situation in which permanently increasing the sort_buffer_size may be the right solution.

Now that we have the specifics of this situation out of the way, let’s look at the Battle Against Any Guess. This is a movement against guessing games. Understanding what you are doing is essential; in the case of sort_buffer_size, you can believe that you know what you are doing by increasing sort_buffer_size. However, the real solution to the problem lies in changing the queries, not changing the memory patterns.

There is a 6-page description of the “Battle against any guess” in the Northern California Oracle User Group’s May Journal, starting on page 13. The examples are specific to Oracle, but the points made are sound even if you do not know Oracle well. For example:

Blindly implementing best practices is nothing different from guesswork; we are applying some past-proven solutions without measuring how they stand against our requirements, and without testing whether they bring us any closer to the targets we have. Industry has become so obsessed with best practices that we commonly see projects in which reviewing an environment for compliance with best practices is the ultimate goal.

One good reason you need to know *why* is also mentioned in the article: The second danger of best practices is that they easily become myths. The technology keeps improving and issues addressed by certain best practices might not be relevant anymore in the next software version.

So, even from respected folks like Baron or myself, do not take advice at face value. Ask why, understand why, and then think about whether there is another level. It is not always easy; often you think you understand but really you miss that other level – such as with sort_buffer_size.

In his post, Peter makes his case, but there are many negatives.

I have a hard time trying to figure out the deeper meaning behind Peter’s post, given that Percona writes a storage engine for MySQL, XtraDB. Does this mean that Percona will stop developing XtraDB? Does this mean that the Percona Server will diverge farther and farther away from MySQL so that they’re not compatible any more and migrating from MySQL to Percona Server is very difficult?

Or maybe it’s just that Peter is saying one thing and doing the opposite; which just seems wrong because that would be blatant hypocrisy on Percona’s part.

(This idea was a comment on the blog post but seems to be trapped in the spam filter, so I’m posting it; apologies if the comment comes through eventually….)

My own opinion of the issue: Peter is factually correct with what he says. However, it’s nice to have the framework and be allowed to use more than one storage engine, or use exclusively one storage engine that’s not MyISAM.

I have been talking with a group of folks who have been making a product that has lots of free functionality: seeing and managing the MySQL config file (/etc/my.cnf), seeing and managing accounts, a small dashboard of overall health graphs, and more.

With this free tool you can look at and manage local and remote databases. It supports ssh tunneling, including ssh using password-protected ssh keys. It’s pretty neat, and I have been working with the product manager to add features. I think this tool will become the de facto standard for centralized GUI administration of MySQL.

The tool is MySQL Workbench….Surprise! One of the best new features for the administrator is that you can now create an administration connection for an existing Workbench connection with a click of a button, instead of having to enter all that information again.

I use the “developer” version, 5.2.21. Note that the 5.1 version does not have administration capabilities.

A MySQL user group member saw that I use Poderosa, with which I can even connect directly to Cygwin with an icon.

But Poderosa is not the tool I wanted to mention….Another user group member mentioned a tool that makes a nicely tabbed browsing window, where you can open sessions by double-clicking the connections, which are now listed on the right-hand side.

See screenshot below:

I have not played with other features such as sending a command to multiple windows, but even just having this is a HUGE win.

So, I’ve started a new job as a Senior Database Engineer at Salesforce, and one of the services I help provide is adding users to MySQL. We have some nice chef recipes, so all I have to do is update a few files, including adding in the MySQL password hash.

Now, when I added myself, I just logged into MySQL and generated a password hash. But when my SRE (systems reliability engineer) colleague needed to generate a password, he did not have a MySQL system he could log in to.

The good news is that it’s easy to generate a MySQL password hash. The MySQL password hash is simply a SHA1 hash of a SHA1 hash, with a * prepended at the beginning. This means you do not need a MySQL database to create a MySQL password hash – all you need is a programming language that has a SHA1 function (well, and a concatenate function).

And I found it, of course, in this post at StackExchange. So you don’t have to click through, here is what it says – and I have tested all these methods and I get the same password hash from each. I have changed their example of “right” to “PASSWORD HERE” so it’s more readable and obvious where the password goes, in case you copy and paste from here.

Some one-liners:

MySQL (may require you to add -u(user) -p):

mysql -NBe "select password('PASSWORD HERE')"

Python:

python -c 'from hashlib import sha1; print "*" + sha1(sha1("PASSWORD HERE").digest()).hexdigest().upper()'

Perl:

perl -MDigest::SHA1=sha1_hex -MDigest::SHA1=sha1 -le 'print "*". uc sha1_hex(sha1("PASSWORD HERE"))'

PHP:

php -r 'echo "*" . strtoupper(sha1(sha1("PASSWORD HERE", TRUE))) . "\n";'

Hopefully these help you – they have enabled my colleagues to easily generate what’s needed without having to find (or create) a MySQL instance that they can already login to.
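
Note that the Python one-liner above is Python 2 syntax. Here is a minimal Python 3 sketch of the same double-SHA1 construction (my addition; it produces the same hash as the one-liners above):

import hashlib

def mysql_password_hash(password):
    # '*' plus the uppercase hex of SHA1 applied to the *binary* SHA1 of the password
    inner = hashlib.sha1(password.encode("utf-8")).digest()
    return "*" + hashlib.sha1(inner).hexdigest().upper()

print(mysql_password_hash("PASSWORD HERE"))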

The MySQL track at Kaleidoscope was put together mostly by Ronald, but I have lent some time to the coordination as well. It is a credit mostly to Ronald that we have been able to plan an entire 19-session conference track, complete with confirmed speakers, in less than a month. (You may notice the schedule does not have all 19 sessions full; we are just waiting for some more speakers to confirm details.)

Whether or not you made it to last month’s O’Reilly MySQL User Conference & Expo, and whether you are an expert or casual user, the sessions at Kaleidoscope will teach you new and exciting things. This is a credit to the planning we did (and again, Ronald spent the lion’s share of time on this) — we did not just want to re-do the same content from the April conference, and we wanted something that would be accessible to developers and DBAs who know what they’re doing when it comes to writing SQL queries, but may or may not know how MySQL itself works.

I will definitely be giving away copies of my book, The MySQL Administrator’s Bible, and Ronald will be giving away copies of his new book, Expert PHP and MySQL, so the conference is definitely not-to-be-missed!

It’s not too late to register for Kaleidoscope – be sure to use the discount code MYSQL to save $300 off your registration (assuming you are not a member of ODTUG).

By now you know that there is a MySQL track at Kaleidoscope, and I organized the schedule at the last minute (Ronald did a lot of the work!). It was difficult to fill a schedule with 19 sessions that are either 1 hour or 1.5 hours long, and to do it I ended up with three presentations of my own.

At each presentation I will be giving away a copy of The MySQL Administrator’s Bible, so be sure to show up! All MySQL track sessions are in Maryland C, and all times are Eastern.

On Monday, June 28th from 4 pm – 5:30 pm I will be presenting “What do you mean, SQL Syntax Error?”, a presentation about how MySQL’s SQL syntax extends and deviates from the ANSI/ISO SQL:2003 standard. There is an 80-page PDF accompaniment that will be given out for free during this session.

On Tuesday, June 29th from 11 am to 12 noon I will be presenting Importing and Exporting Data with MySQL, about the many tools to load and bulk load data, and how to export data for regular and bulk loads. I will also be going over which storage engines are particularly well-suited for bulk loading, and the caveats to watch out for. This session is useful for those who know MySQL as well as those asking the question, “What’s the equivalent of Oracle’s SQL Loader for MySQL?”

On Wednesday, June 30th from 8:30 am to 9:30 am I will be presenting Navigating MySQL Stored Procedures & Functions, Views and Triggers, which covers all the ways stored procedures, stored functions, views and triggers can be used, including a highlight of Oracle differences.

I hope to see you there!

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alphabetical by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, and it completely relates to DBA work:
works and plays well with others
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespectful
insensitive
incompetent
[my own addition – no follow-through, lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of a sysadmin’s skill set
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans, find one that fits how you think, they all work…..
– …that being said, some languages are more practical than others (ie, .NET probably is not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to send an output.

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps time management is the first skill to learn
– time management means you’ll have more time to do things, it doesn’t mean all work work work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Many don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s going into talking about Patterns in System Administration….

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write Python code, going back to it in 6 months you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, you use the “Shift” key a lot in Perl because you use $ % ( ) { } etc; in Python you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to indent the second line, because indentation is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is an automatic documentation string (a “docstring”). To print it, use the __doc__ attribute (a name with double leading and trailing underscores is called a “dunder”):

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no index 9. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Negative indexes count from the end of the list, so v[1:-1] does not include the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full slice syntax is [start:end:step]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made of a transformation, an iteration and optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A tuple is defined by its commas, not just the parentheses, so a one-item tuple is defined with a trailing comma:
x=(1,)
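
A quick interpreter check [my addition, not from the talk] shows why the comma matters:

>>> type((1))
<type 'int'>
>>> type((1,))
<type 'tuple'>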

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.
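
If you do need a stable order, sort the keys yourself [my addition, not from the talk]:

>>> for k in sorted(d.keys()):
...  print k, d[k]
...
room 1178
user jonesy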

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not interpolated in strings; use printf-style string formatting:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
    print "x == y"
for k, v in mydict.iteritems():
    if v is None:
        continue
    print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign. This is on purpose, it avoids bugs resulting from typing if x=y instead of if x==y.
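
For example [my addition, not from the talk], the typo is caught immediately:

>>> x, y = 1, 2
>>> if x = y:
  File "<stdin>", line 1
    if x = y:
         ^
SyntaxError: invalid syntax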

The construct has no place in production code anyway, since you give up catching any exceptions.

Python modules for sysadmins:
– sys
– os
– urllib/urllib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the interpreter prompt. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, please let me know any issues, as they may be typos….

[author’s note: personally, I use awk a bunch in MySQL DBA work, for tasks like scrubbing data from a production export for use in qa/dev, but usually have to resort to Perl for really complex stuff; now I know how to do more of it in awk.]

Basics:
By default, fields are separated by any number of spaces. The -F option to awk changes the separator on commandline.
Print the first field, fields are separated by a colon.
awk -F: '{print $1}' /etc/passwd

Print the first and fifth field:
awk -F: '{print $1,$5}' /etc/passwd

Can pattern match and use files, so you can replace:
grep foo /etc/passwd | awk -F: '{print $1,$5}'
with:
awk -F: '/foo/ {print $1,$5}' /etc/passwd

NF = built-in variable (no $) meaning “number of fields”
This will print the first and last fields of lines where the first field matches “foo”
awk -F: '$1 ~/foo/ {print $1,$NF}' /etc/passwd

NF = number of fields, ie, “7”
$NF = value of last field, ie “/bin/bash”
(similarly, NR is record number)


Awk makes assumptions about input, variables, and processing that you’d otherwise have to code yourself.

– “main loop” of input processing is done for you
– awk initializes variables for you, to 0
– input is viewed by awk as ‘records’ which are splittable into ‘fields’

This all makes a lot of operations very concise in awk, many things can be done w/ a one-liner that would otherwise require several lines of code.

awk key points:
– splits text into fields
– default delimiter is “any number of spaces”
– reference fields
– $0 is entire line
– create filters using ‘addresses’ which can be regexps (similar to sed)
– Turing-complete language
– has if, while, for, do-while, etc
– built-in math like exp, log, rand, sin, cos
– built-in string sub, split, index, toupper/lower

Patterns and actions
Pattern is first, then action(s)
Actions are enclosed in {}

only a pattern, no action:
'length>42'
but, the default action is to print the whole line, so this will actually do something — print lines where the length of the line is > 42. strings are just arrays in awk

only action, no pattern:
{print $2,$1}
do this to all lines of input

NR % 3 == 0
print every third line (the pattern is NR mod 3)

{print $1, $NF, $(NF-1)}
print the first field, last field, and 2nd to last field

built-in variables
NF, NR we’ve done
FS = field separator (can be regexp)
OFMT = output format for numbers (default %.6g)

Patterns
– used to filter lines processed by awk
– can be regexp
/^root/ is the pattern in the following
awk -F: '/^root/ {print $1,$NF}' /etc/passwd

– Patterns can use fields and relational operators
To print 1st, 4th and last field if value of 4th field >10:
awk -F: '$4 > 10 {print $1, $4, $NF}' /etc/passwd

awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}' /etc/passwd

Range patterns
sed-like addressing : you can have start and end addresses
awk 'NR==1,NR==3'
prints only first three lines of the file
You can use regular expressions in range patterns:
awk -F: '/^root/,/^daemon/ {print $1,$NF}' /etc/passwd
start printing at the line that starts with “root”, the last line that is processed is the line starting with “daemon”

Range pattern “gotcha” – can’t mix a range with other patterns:
To do “start at non-commented line where value of $4 is less than 10, end at the first line where value of $4 is greater than 10”

This does not work!
awk -F: '$0 !~ /^#/ $4 <= 10, $4 > 10' /etc/passwd

This is how to do it, {next} is an action that skips:
awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4}' /etc/passwd

Basic Aggregation
awk -F: '$3 > 100 {x+=1; print x}' /etc/passwd
This gives a line of output as each matching line is processed. This gives a running total of x.

awk -F: '$3 > 100 {x+=1} END {print x}' /etc/passwd
This processes the “{print x}” action only after the entire file has been processed. This gives only the final value of x.

Arrays:
Support for regular arrays
Technically multi-dimensional arrays are not supported, but array indexes are arbitrary strings, so you can make your own associative arrays.

Example:
awk -F, '{x[$1] = $2*($4 - $3)} END {for (key in x) {print key, x[key]}}' stocks.txt

The part before the END creates the associative array, the part after the END prints the array.

Extreme data munging:
awk -F, '{x[$1] = $2*($4 - $3)} END {for (z in x) {print z, x[z]}}' stocks.txt

ABC,100,12.14,19.12
FOO,100,24.01,17.45

output
ABC 698
FOO -656

For the line “ABC,100,12.14,19.12”
the function becomes

x[ABC] = 100 * (19.12 - 12.14) = 698

Aggregate across multiple variables:
awk -F, '{x[$1] = $2*($4 - $3); y+=x[$1]} END {for (z in x) {print z, x[z]} print "Net:" y}' stocks.txt

Note that y is a running *sum* (not a running count like before).

Now, the above is hard to read; this is much easier as a standalone awk script:

#!/usr/bin/awk -f

BEGIN { FS = "," }
{
  x[$1] = $2*($4 - $3)
  y += x[$1]
}
END {
  for (z in x) {
    print z, x[z]
  }  # end for loop
  print "Net:" y
}  # end END block

This was liveblogged, so please point out any issues, as they may be typos on my part….

OpenSQLCamp is less than 4 months away, and the ball is already rolling: we have made the reservation at MIT. Using MIT means that we will have *free* reliable wireless guest access and projectors.

OpenSQL Camp is a free unconference for people interested in open source databases (MySQL, SQLite, Postgres, Drizzle), including non-relational databases, database alternatives like NoSQL stores, and database tools such as Gearman. We are not focusing on any one project, and hope to see representatives from a variety of open source database projects attend. As usual I am one of the main organizers of Open SQL Camp (in previous years, Baron Schwartz, Selena Deckelmann and Eric Day have been main organizers too; this year Bradley Kuzsmaul is the other main organizer). The target audience is users and developers, but others are encouraged to attend too. There will be both presentations and hackathons, with plenty of opportunities to learn, contribute, and collaborate!

I have updated the main Boston 2010 page at http://opensqlcamp.org/Events/Boston2010/ with travel and logistics information, including links to:

Register — it’s free and easy, and you can always change your mind later!

Maybe you have an idea for a session you would like to see, or a session you would like to give? If so, you can note it on the sessions page. This will give everyone a sense of what type of presentations will be there. I have started by putting 2 sessions I am willing to give and a third at the bottom for one I’d like to see, to give everyone an idea of both types of descriptions.

Probably the most important link right now is the way we keep OpenSQLCamp free for all attendees – sponsor or donate to the conference! Any donation amount is accepted, and all donations are tax-exempt to the fullest extent of the law. Businesses and organizations will be listed as sponsors if they make a donation of $250 or more, and individuals will be listed as sponsors if they make a donation of $100 or more. More information on sponsor benefits, including where to send a graphic to, at the link.

There is a preliminary schedule; up until the conference itself, it will only show the agenda of the conference — how many rooms and what time the presentations are supposed to be. During and after the conference we will update this schedule page with the titles, presenters and links to any notes/videos/audio taken.

If you have any questions, please do not hesitate to ask on the mailing list or by posting a comment here.

On Monday, this web doctor his thoughts on Tuesday.

We have confirmed that there will be an entire MySQL track at Kaleidoscope! Because Kaleidoscope is less than 8 weeks away, dosage we could not go through a standard call for papers. Ronald and I have been working to come up with appropriate topics and speakers for an audience that uses MySQL but is probably more familiar with Oracle. We contacted folks we thought would be interested, pharmacy and who we thought could make it logistically, as the conference is in Washington, D.C.

We have (almost) finalized the list of speakers; the session abstracts will be finalized in the next few days. You can see the speakers at Kaleidoscope’s MySQL page, but I’ve also listed them below (alpha by last name):

Philip Antoniades, Sun/MySQL
Ronald Bradford, 42SQL
Sheeri K. Cabral, The Pythian Group
Laine Campbell, PalominoDB
Patrick Galbraith, Northscale
Sarah Novotny, Blue Gecko
Padraig O’Sullivan, Akiba Technologies Inc.
Jay Pipes, Rackspace Cloud
Dossy Shiobara, Panoptic.com
Matt Yonkovit, Percona

There are one or two more speakers we are waiting to hear back from. There will be 19 sessions, so some speakers will have more than one session.

I am very excited that MySQL has its own track at Kaleidoscope. In addition, Ronald and I will be able to attend our very first event as Oracle ACE Directors – the Sundown Sessions are a Birds-of-a-Feather-type discussion, with the Oracle ACE Directors being the panelists and the community asking questions. Immediately after the Sundown Sessions is a “Meet the Oracle ACE” event, the only part of the conference officially sponsored by Oracle.

I am attending the tomorrow.)

So I am in Seeking Senior and Beyond: The Tech Skills That Get You Promoted. The first part talks about the definition of what it means to be senior, nurse and it completely relates to DBA work:
works and plays well with other
understands “ability”
leads by example
lives to share knowledge
understands “Service”
thoughtful of the consequences of their actions
understands projects
cool under pressure

Good Qualities:
confident
empathetic
humane
personal
forthright
respectful
thorough

Bad Qualities:
disrespective
insensitive
incompetent
[my own addition – no follow through, dosage lack of attention to detail]

The Dice/Monster Factor – what do job sites see as important for a senior position?

They back up the SAGE 5-year experience requirement
Ability to code in newer languages (Ruby/Python) is more prevalent (perhaps cloud-induced?)

The cloud allows sysadmin tasks to be done by anyone…..so developers can do sysadmin work, and you end up seeing schizophrenic job descriptions such as

About the 5-year requirement:
– Senior after 5 years? What happens after 10 years?
– Most electricians, by comparison, haven’t even completed an *apprenticeship* in 5 years.

Senior Administrators Code
– not just 20-line shell scripts
– coding skills are part of a sysadmin skill
– ability to code competently *is* a factor that separates juniors from seniors
– hiring managers expect senior admins to be competent coders.

If you are not a coder
– pick a language, any language
– do not listen to fans; find one that fits how you think, they all work…
– …that being said, some languages are more practical than others (e.g., .NET is probably not the best language to learn if you are a Unix sysadmin).

Popular admin languages:
– Perl: classic admin scripting language. Learn at least the basics, because you will see it in any environment that has been around for more than 5 years.

– Ruby: object-oriented language for people who mostly like Perl (except for its OO implementation)

– Python: object-oriented language for people who mostly hate Perl, objects or no objects. For example, you don’t have to create a String object to send an output.
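
For instance (my own quick illustration, not from the session), you can print a string or a number directly, with no String object required:

>>> print 'hello'
hello
>>> print 42
42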

But what if you do not have time to learn how to program?

– senior admins are better at managing their time than junior admins, so perhaps start by managing your time
– time management means you’ll have more time to do things; it doesn’t mean all work, work, work.
– Read Time Management for System Administrators – there is Google Video of a presentation by the author, Tom Limoncelli.

Consider “The Cloud”
– starting to use developer APIs to perform sysadmin tasks, so learning programming is good.
– still growing, could supplant large portions of datacenter real estate
– a coder with sysadmin knowledge: Good
– a sysadmin with coding knowledge: Good
– a coder without sysadmin knowledge: OK
– a sysadmin with no coding interest/experience: Tough place to be in

Senior Admins Have Problems Too
Many don’t document or share knowledge
Many don’t do a good job keeping up with their craft
Cannot always be highlighted as an example of how to deal with clients
Often reinvent the wheel – also usually there is no repository
Often don’t progress beyond the “senior admin” role

….on the other hand…..
cynicism can be good…..

Advice:
learn from the good traits
observe how others respond to their bad traits
think about how you might improve upon that
strive to work and play well with others, even if you don’t have a mentor for good/bad examples.

Now he’s moving on to talk about Patterns in System Administration…

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write Python code, going back to it in 6 months you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (i.e., you use the “Shift” key a lot in Perl because of $ % ( ) { } etc.; in Python you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.
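
As a quick illustration of that last point (my example, not the presenter’s): plain function calls work fine on their own, and method syntax is there when you want it:

>>> len('sysadmin')
8
>>> 'sysadmin'.upper()
'SYSADMIN'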


Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!'
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
File "<stdin>", line 2
return mystr.swapcase()
^
IndentationError: expected an indented block

You need to indent the second line because whitespace indentation is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is an automatic documentation string (docstring). It is stored in the __doc__ attribute (the double underscore is also called a “dunder”), and you can print it:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and apply each function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no index 9 in the list. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Negative indices count from the end of the list, so v[1:-1] leaves off the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full slice syntax is [start:end:step]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make a list of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the list.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made from a transformation, an iteration, and an optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A tuple needs at least one comma; parentheses alone don’t make a tuple, so a one-item tuple is defined as
x=(1,)
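
A quick interpreter demonstration of both points (my own, not from the session):

>>> t = (1, 2, 3)
>>> t[0]
1
>>> t[0] = 99
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> type((1))
<type 'int'>
>>> type((1,))
<type 'tuple'>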

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.
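
If you need a predictable order, one way (my own sketch, not from the session) is to sort the keys as you loop:

>>> for key in sorted(d):
...  print "%s: %s" % (key, d[key])
...
room: 1178
user: jonesy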

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not interpolated in strings; Python uses printf-style string formatting:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
    print "x == y"
for k,v in mydict.iteritems():
    if v is None:
        continue
    print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression, because you can’t use an = sign there. This is on purpose; it avoids bugs that result from typing if x=y instead of if x==y.

The construct has no place in production code anyway, since you give up the chance to catch any exceptions.
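
For example (my own demonstration), the interpreter rejects the assignment outright:

>>> if x = 5:
  File "<stdin>", line 1
    if x = 5:
         ^
SyntaxError: invalid syntax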

Python modules for sysadmins (a small combined example follows the list):
– sys
– os
– urllib/urllib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd
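
Here is a small sketch (mine, not from the session) that combines a few of these: glob for wildcards, os for file metadata, and hashlib for checksums. The output, one md5/size/filename line per log file, will of course depend on the machine:

>>> import glob, os, hashlib
>>> for name in glob.glob('/var/log/*.log'):
...  data = open(name, 'rb').read()
...  print "%s  %10d  %s" % (hashlib.md5(data).hexdigest(), os.stat(name).st_size, name)
...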

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the interpreter prompt. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged; please let me know of any issues, as they may be typos…

[author’s note: personally, I use awk a bunch in MySQL DBA work,