Brief Outage for Phoenix Data Center Chassis

One of the chassis in the PHX1 datacenter experienced issues that took many services offline, including those on the generic web cluster, and degraded others for approximately half an hour. Fixing the issue took approximately 15 minutes. Services should be back to normal.

For reference, the following web services were either degraded or unavailable:

generic cluster (contains many web apps)

bouncer
elasticsearch
etherpad
graphite
hangprocessor
input
input-celery
openshift
plugins and plugins memcached
puppetmaster
rabbit
socorro memcache

If you have any questions or concerns please address them to helpdesk@mozilla.com.

Bugzilla Feeling Slow?

We have been experiencing intermittent Bugzilla slowness since Wednesday, June 12th 2013 at 5 pm UTC (10 am US/Pacific time). We have been working throughout the weekend to pinpoint the cause of this irregular, but noticeable, issue. The problem is performance only; there have been no reports and no evidence of data or functionality loss. We will release additional information as we have it.

Update 18 Jun 2013 18:40 UTC: The Phoenix chassis outage was completely unrelated to this Bugzilla slowness. Bugzilla is in a different data center, and it neither caused nor was directly affected by the chassis problem. The only effect the chassis problem had was to pull resources away from diagnosing and fixing the Bugzilla issue.

2013 SkySQL and MariaDB Solutions Videos Are Online!

I do not recall seeing an announcement about it, but I went looking for the videos today and lo and behold, they were up! Forgive me if I missed a post about it, but if you also missed it, here they are:

2013 SkySQL and MariaDB Solutions Day for the MySQL Database videos

A Different Spin On the max_allowed_packet Problem

Back in November, I reported a bug about a different type of max_allowed_packet problem.

See, an application had put data into the database, but could not retrieve it without getting a max_allowed_packet error. With the help of some really smart community folks (named Jesper Hansen, Brandon Johnson and Shane Bester), we determined that MySQL actually has 2 different max_allowed_packet settings: client and server.

When you set max_allowed_packet in an option file, you are changing the server variable if it is in [mysqld], and the client variable if it is in [client], [mysql], or the section for whichever client you use. As far as we can tell, there’s no way to actually view what the client variable is, as looking at both the session and global max_allowed_packet variables shows you the server variable.

If max_allowed_packet is not set by the client, it defaults to 16M. The proposed solution is to allow it to be increased for non-interactive clients, and the bug has been verified as a “feature request”, though it has not been implemented yet.
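
To make the client/server split concrete, here is a minimal Python sketch that reads option-file text and reports which side each max_allowed_packet line would apply to. The sample file contents and the function name are made up for illustration, and Python's configparser only approximates MySQL's option-file syntax (it does not handle directives such as !include), so treat this as a rough aid rather than a faithful parser.

```python
import configparser

# Hypothetical option file; the section names follow the post,
# the values are invented for illustration.
SAMPLE_MY_CNF = """
[mysqld]
max_allowed_packet = 64M

[client]
max_allowed_packet = 32M

[mysqldump]
quick
"""

# Assumption: only [mysqld] is treated as a server section here.
SERVER_SECTIONS = {"mysqld"}


def packet_settings(option_file_text):
    """Return {section: (side, value)} for sections that set max_allowed_packet."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(option_file_text)
    found = {}
    for section in parser.sections():
        if parser.has_option(section, "max_allowed_packet"):
            side = "server" if section in SERVER_SECTIONS else "client"
            found[section] = (side, parser.get(section, "max_allowed_packet"))
    return found


if __name__ == "__main__":
    for section, (side, value) in packet_settings(SAMPLE_MY_CNF).items():
        print(f"[{section}] sets the {side} value to {value}")
```

Run against a real option file, this only shows what is configured. It does not show what a running client actually uses, which is the gap described above.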

ulimits and upgrading from Oracle MySQL 5.0 to Percona patched MySQL 5.1

After upgrading to Percona’s patched MySQL 5.1*, end users were having connectivity problems, and reporting errors such as:

OperationalError: (2003, "Can't connect to MySQL server on 'db-amo-ro' (110)")

TimeoutError: Request timed out after 5.000000 seconds

OperationalError: (1135, "Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug")

We had these same problems a while back, before increasing ulimit settings in /etc/sysconfig/mysqld. Oracle’s MySQL startup script specifically sources this file:

[ -e /etc/sysconfig/$prog ] && . /etc/sysconfig/$prog

However, we saw these errors again when we upgraded to Percona’s MySQL 5.1. At first we thought that it was because Oracle’s startup script is /etc/init.d/mysqld and Percona’s is named /etc/init.d/mysql (so we would put ulimits in /etc/sysconfig/mysql). However, when we looked, we saw that Percona’s startup script does NOT source anything in /etc/sysconfig.

So then we put the following in /etc/security/limits.d/99-nproc-mysql.conf:
root soft nproc 32768
root hard nproc 65535

We restarted MySQL and all was good. Even though we are long past having this problem, I thought it was important enough to blog about.
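
When chasing this kind of problem, it can help to compare what the limits file asks for with what a process actually inherited. The Python sketch below does that under a few assumptions: the function name is hypothetical, only numeric nproc lines are handled, and the resource module reports the limits of the Python process itself, not of mysqld.

```python
# The two lines from the limits.d file shown above.
SAMPLE_LIMITS = """\
root soft nproc 32768
root hard nproc 65535
"""


def parse_nproc_limits(text, domain):
    """Return {"soft": int, "hard": int} for numeric nproc lines matching domain."""
    limits = {}
    for raw in text.splitlines():
        # Drop trailing comments, then split into: domain, type, item, value.
        fields = raw.split("#", 1)[0].split()
        if len(fields) != 4:
            continue
        line_domain, kind, item, value = fields
        if line_domain == domain and item == "nproc" and value.isdigit():
            limits[kind] = int(value)
    return limits


if __name__ == "__main__":
    import resource  # Unix-only standard library module

    configured = parse_nproc_limits(SAMPLE_LIMITS, "root")
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("configured for root:", configured)
    print("this process:", {"soft": soft, "hard": hard})
```

If the two disagree for the user that starts MySQL, the limits file is probably not being applied to that session.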

* We finished upgrading all of our servers to MySQL 5.1 at the end of 2012. We ran into this interesting snag that I wanted to blog about, even though we’re in the middle of upgrading to MySQL 5.5 right now (and by the end of the year, we will upgrade to MySQL 5.6 – the performance schema stuff is definitely something we want to utilize).

MySQL User Group Video – Determinism and Databases

The May Boston MySQL User Group featured John Hugg of VoltDB talking about determinism and databases. I have uploaded the hour-long video to YouTube, and those who are not exactly sure what “determinism and databases” means will learn a lot.

Enjoy!

(As always, videos are free on YouTube with no login or attempt to solicit your e-mail address or any other information)

Upgrading support.mozilla.org databases

A while ago (November 2012 to be exact), we upgraded the support.mozilla.org databases from Percona 5.1 to MariaDB 5.5 (the next step, happening soon, is upgrading them to Oracle’s MySQL 5.6). One of the engineers and I had a conversation where he mentioned that “one of our worst performing views on SUMO is doing waaaayyy better with the upgraded databases”, that it “seems more stable” and that “I stopped receiving ‘MySQL went away or disconnected emails’ which came in once in a while.”

It’s always nice to see upgrades actually making a difference. In our case we saw a lot less CPU wait, though that might also be partially due to tuning the memory settings on the machines and adding in another read slave to handle queries. As a result, network traffic throughput went from less than 1 Mb/sec to about 18 Mb/sec, because the machines were just handling more queries per second, period.

(I had this e-mail as a draft for a while and decided to clean it up and publish it now!)

Women in Science and Engineering (WISE) Computing Skills Boot Camp

Software Carpentry is running a 2-day software skills boot camp in Boston, for women in science, engineering, medicine, and related research areas. Registration is $20.

Boot camps alternate short tutorials with hands-on practical exercises. You are taught tools and concepts you can use immediately to increase your productivity and improve confidence in your results. Topics covered include the Unix shell, version control, basic Python programming, testing, and debugging — the core skills needed to write, test and manage research software.

This boot camp is open to women at all stages of their research careers, from graduate students, post-docs, and faculty to staff scientists at hospitals and in the public, private, and non-profit sectors.

Registration is $20; to sign up, or find out more, please visit the announcement at http://software-carpentry.org/blog/2013/04/announcing-wise-bootcamp.html. If you have questions, there is an e-mail link on the announcement page.

For those curious, they are using SQLite, not MySQL or PostgreSQL, and I will be helping out with the SQL parts. There are about 2 months left but the boot camp is about 2/3 full right now, so I wanted to spread the word to as many people as possible so they do not hear about it too late.

Percona Live Has No Code of Conduct

I am not at Percona Live this week because I opted to stay home after a crazy year of travel (41 talks in 11 different countries on 3 continents in the past year). However, I realized today that Percona Live has no Code of Conduct.

I will not be attending any Percona Live events until there is an acceptable Code of Conduct. MySQL is the world’s most popular open source database; the community deserves a Code of Conduct.

ETA: I have contacted Kortney, the conference organizer for Percona Live, and asked for a Code of Conduct to be put in place ASAP.

ETA: If you want to know why this is an issue, see http://adainitiative.org/what-we-do/conference-policies/

ETA: This is my personal statement, and not a statement of what any of my Mozilla colleagues may feel. Other colleagues, including employees under me, may choose to attend or even present at any events they wish. I personally do not feel comfortable at a conference with no Code of Conduct; this is not a reflection on the technical merits of any conference.

BBLISA Lightning Talks

At this month’s Back Bay LISA, there were lightning talks:

Back Bay LISA Lightning Talks
April 2013

  • Mentoring by Matt Finnigan
    (5:07)
    Matt Finnigan gave a talk discussing the LOPSA Mentorship program. If you aren’t familiar, the mentorship program is a free service offered by LOPSA, where any admin who needs help, either with a project or just general career guidance, can sign up to be connected to someone with experience in their target area. You need to be a LOPSA member in order to be a mentor, but being a protege is open to anyone, regardless of LOPSA membership.

  • Cooking by Adam Moskowitz
    (4:31)
    Adam Moskowitz gave a talk discussing cooking for system administrators. He appealed to our sense of making things as well as our need for healthy food and good value. Adam encouraged us to try cooking, and although most people thought it was expensive to properly outfit a kitchen, he reminded us that it was actually a fraction of the price of our new laptops, and the kitchen gear would last a lot longer.

  • Amazon SMS by KM Peterson
    (3:06)
    This talk is a result of KM Peterson’s search for a provider-agnostic method to send SMS messages that didn’t break the bank or involve maintaining an array of modems. He ended up setting up a script to talk to Amazon’s SMS service, and provided us example code in his slides.

  • SmartOS by Nahum Shalman
    (4:25)
    Nahum Shalman gave a really nice introduction to SmartOS, a derivative of OpenSolaris which is maintained by Joyent. Interestingly, the Linux-native KVM was ported to the SmartOS kernel, allowing creative and secure uses of jails and virtual sandboxes, all taking advantage of native ZFS, dtrace, and all kinds of delicious Solaris-y goodness.

  • MySQL and Puppet by Sheeri Cabral
    (5:04)
    Sheeri Cabral came from Mozilla to talk with us about how they’re deploying MySQL using Puppet. Her slides had example code, and she walked us through the abstracted object and up to the deployment on the actual nodes.

  • Secrets by KM Peterson
    (3:00)
    KM Peterson’s second talk was on Shamir’s Secret Sharing Scheme, aka ‘SSSS’. The idea behind this crypto tech is that you have a secret which you want to ensure can only be recovered by the collaboration of a minimum number of involved people – say three of your team of five. You encrypt the plaintext and generate as many keys as people you have, and tell the app how many should be required to release the information. To pull the data out, you provide any of the generated keys, as long as the number of different keys meets the minimum determined when the data was encrypted.

  • Stick Destroyer by John Jarvis
    (3:01)
    John Jarvis talked to us about a creative use for his Raspberry Pi – he securely erases flash media using Stick Destroyer. He rigged up a light so that you have a nice visual indicator of when the stick is being erased, and when it’s done.

  • Sensu by Pat Cable
    (3:26)
    Pat Cable showed up to talk about Sensu, a Ruby-based monitoring solution that uses AMQP queues to distribute tasks around a monitoring infrastructure that can scale out horizontally to monitor extremely large numbers of machines. It’s definitely a “next gen” monitoring solution that you should be aware of.

  • Sysadmins and Doctors by Matt Simmons
    (4:36)
    I got up in front of everyone and talked briefly about something that I’ve noticed: I see our profession splintering, but the splintered elements (such as network and storage administrators) aren’t actually specialties of “system administrators”. It’s much more like the specialized administrators are specialist doctors, and system administrators are general practitioners. The idea is still half baked, but that’s the fun of a lightning talk, right? I didn’t offer any answers, but I asked a lot of questions.
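
For readers curious about the threshold idea from the Secrets talk above (say, any three of five people), here is a minimal Python sketch of textbook Shamir secret sharing over a prime field. It is not the code of the tool shown in the talk: the function names, the choice of prime, and the restriction to integer secrets are all assumptions made for illustration. In this textbook form the secret is split into points on a random polynomial rather than encrypted under separate keys.

```python
import secrets

# Assumption: secrets are integers smaller than this Mersenne prime.
PRIME = 2**127 - 1


def split_secret(secret, shares, threshold):
    """Split an integer secret into `shares` points; any `threshold` recover it."""
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and shares")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in range [0, PRIME)")
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for coeff in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + coeff) % PRIME
        points.append((x, y))
    return points


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 from distinct points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    pieces = split_secret(123456789, shares=5, threshold=3)
    print(recover_secret(pieces[:3]) == 123456789)
```

Any three of the five points reconstruct the same polynomial, so the order and choice of people does not matter. With fewer than three points, the interpolation yields a value that reveals nothing useful about the secret.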

Enjoy!