Mozilla DB News, 14 Sept – GOOOOAAAALLLLLLs, and a lot of Bugzilla work

In 11 days I will be heading to NagiosWorld, and from there I go straight to MySQL Connect, and that is the end of September. So this week we have been focusing on wrapping up our 3rd quarter goals (and making new goals for the 4th quarter of 2012). We have also seemingly done a bunch of work on different aspects of the Bugzilla database clusters. This week, the database team has:

  • Built out a new Bugzilla production database cluster in a failover data center, including monitoring.
  • Analyzed a full day’s worth of general and slow query logs for each database in the Bugzilla cluster so we can make optimization recommendations. Optimization recommendations are a goal for the 4th quarter, so this is pre-work to give us a good baseline metric for that goal.
  • Successfully implemented consistency checks for Bugzilla using pt-table-checksum, and a Nagios check from PalominoDB that we updated (the patch has been sent back to PalominoDB).
  • Upgraded the Bugzilla staging cluster to MariaDB 5.5.
  • Converted all of the Affiliates tables to UTF-8 instead of latin1. character-set-server has been set to utf8 for a while, but there were several legacy tables causing problems.
  • Created a new read-only slave cluster for Addons, so we can use it for version checking. This will be useful in our eventual change to the Mozilla Marketplace, which will house apps for our mobile platform, Firefox OS.
  • Assisted other folks on the Systems team in their goal to not have any users on any machines that are not in puppet. The DB team only had a few non-standard users on a few machines, but we did our part. 1200 machines were audited and I’m proud of my Systems teammates for getting through it all – we now have no non-puppetized users!
  • Rebuilt and patched the database servers holding our internal Puppet Dashboard data. The servers were deprecated last month while we looked for a Puppet Dashboard replacement, but we realized that there was no other appropriate dashboard for puppet, so we are back to using Puppet Dashboard.
  • Converted one of our development database clusters to use innodb_file_per_table and put it into puppet (most of our clusters are innodb_file_per_table at this point).
  • Upgraded one of our multi-use database clusters to use Percona’s patched MySQL 5.1 and put it into puppet.
  • Fixed some slow query log copies that were using a root account so that they now use a less-privileged (but still in LDAP) account.
  • Made staging and dev databases and users for Input, Mozilla’s primary user feedback application.
  • Fixed a replication issue on one of our multi-use database clusters – someone was trying to add foreign keys that already existed on the slave.
  • Created a new database for a new imaging service on Buildbot, Mozilla’s continuous integration tool.
  • Finished updating the MySQL ACLs to include our failover load balancer, and to exclude the ACLs from the previous data center that we finished moving from in May. This was a q3 goal; Postgres ACLs still need to be looked at.
  • Pushed a large Mozilla rapid beta (aka “Mobeta”) update on our Postgres databases.
  • Helped debug an issue where the database behind basket, our newsletter subscription service, did not seem to be updating (it was a code issue, not a db issue).
  • Fixed an issue where a large transaction was causing our backup server to stop replicating a service. The transaction was more than 20G in size, but was on a database that was set with replicate-ignore-db, so it could be safely ignored.
  • Exported a list of Mozillian e-mails to help organize the community around Thunderbird.
  • Fixed an issue where replication on a backup server died due to “out of resources”.
  • Made a bug to check consistency for the Addons and Support database clusters using pt-table-checksum. This is a q4 goal.
  • Made bugs to get rid of MySQL 5.0 for a dozen clusters (yes, there are exactly 12 clusters still on MySQL 5.0 that we manage). 1 definitely can be decommissioned, 2 others may be decommissioned or may be upgraded to Percona’s patched MySQL 5.1, and the remaining 9 will be upgraded. These are q4 goals, so if all goes well, by the end of the year the databases that the DB team is responsible for will not have any MySQL 5.0 servers. (Next step: upgrade to MariaDB 5.5 in 2013!)
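
For anyone facing a latin1-to-UTF-8 cleanup like the Affiliates one above, here is a minimal Python sketch of how the list of conversion statements could be generated. It is not the script we ran. It assumes you have already pulled (table name, table collation) pairs from information_schema.TABLES, the function name and sample table names are made up for illustration, and it only builds the SQL text without connecting to anything. It also assumes the data in the latin1 tables really is latin1; double-encoded data needs a different approach.

```python
def utf8_conversion_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Return ALTER TABLE statements for tables not already in `charset`.

    `tables` is an iterable of (table_name, table_collation) pairs, for
    example TABLE_NAME and TABLE_COLLATION from information_schema.TABLES.
    """
    statements = []
    for name, table_collation in tables:
        # A collation name starts with its character set:
        # latin1_swedish_ci -> latin1, utf8_general_ci -> utf8.
        current = (table_collation or "").split("_", 1)[0]
        if current != charset:
            statements.append(
                f"ALTER TABLE `{name}` CONVERT TO CHARACTER SET {charset} "
                f"COLLATE {collation};"
            )
    return statements


if __name__ == "__main__":
    # Hypothetical table names, purely for illustration.
    sample = [("banners", "latin1_swedish_ci"), ("links", "utf8_general_ci")]
    for statement in utf8_conversion_statements(sample):
        print(statement)
```

Review the generated statements before running them: the sketch treats any character set other than the exact target as needing conversion, and CONVERT TO CHARACTER SET rewrites the whole table.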
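
The consistency checks mentioned above boil down to comparing the per-chunk checksums that pt-table-checksum records on each slave and alerting when they differ from the master's. Below is a rough Python sketch of that comparison. It is not the PalominoDB check and not our patched version; it assumes the rows have already been read from the checksums table (percona.checksums by default) into dictionaries keyed by the column names db, tbl, this_crc, this_cnt, master_crc and master_cnt, and the function names are invented for this example.

```python
OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def inconsistent_tables(rows):
    """Return sorted (db, tbl) pairs with at least one chunk that differs."""
    bad = set()
    for row in rows:
        # A chunk differs if the row count or the checksum on this server
        # does not match what the master recorded. A NULL (None) on only
        # one side also counts as a difference.
        differs = (
            row["this_cnt"] != row["master_cnt"]
            or row["this_crc"] != row["master_crc"]
        )
        if differs:
            bad.add((row["db"], row["tbl"]))
    return sorted(bad)


def nagios_result(rows):
    """Map checksum rows to a (exit_code, message) pair for Nagios."""
    bad = inconsistent_tables(rows)
    if bad:
        names = ", ".join(f"{db}.{tbl}" for db, tbl in bad)
        return CRITICAL, f"CRITICAL: {len(bad)} table(s) differ from master: {names}"
    return OK, "OK: all checksummed chunks match the master"
```

A real check would print the message and exit with the code; keeping the comparison separate from the database query makes it easy to test without a server.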
