My Thoughts About MySQL 5.6

Last week I posted a but that was before the schedule itself was up, so now I can present a list of session-by-session talks for developers who are building their schedules.

So here’s a guide to MySQL Connect for administrators, with times. These are handpicked based on what I think administrators would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
9-10:30 am
MySQL Connect Keynote: The State of the Dolphin by Tomas Ulin, VP, and Edward Screven, Chief Corporate Architect, both of Oracle. I am pretty excited to see where Oracle is taking MySQL next!

11:30 am – 12:30 pm
There is a session if you want to learn What’s New In MySQL 5.6. Everyone expects a new 5.6 release to be out by then (though we are all wondering whether it will be a DMR, beta, or release candidate), so this will be a great session for learning about any newly released features.

Olav Sandstå presents MySQL Optimizer Overview, an in-depth look at how the optimizer works so you have the knowledge to tune your server and queries.

There’s also Ronald Bradford’s session on Lessons from Managing 500+ MySQL Instances. Ronald always has great tips and tricks for making MySQL administration less painful and avoiding problems.

1:00 – 2:00 pm
If you are a beginner, you will want to attend the hands-on lab Getting Started with MySQL, presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements, and how to import data into and export it from the MySQL server. And remember, because this is a hands-on lab, you are doing all this in front of a computer. The lab runs from 1 to 3:30 pm, so there is plenty of time to learn and do a lot!

Even if you are not a beginner, you are sure to learn some great Replication Tips and Tricks from Mats Kindahl. Mats will present a bag of useful tips and tricks related to the MySQL 5.5 GA and MySQL 5.6 development milestone releases, including multisource replication, using logs for auditing, handling filtering, examining the binary log, using relay slaves, splitting the replication stream, and handling failover.

I am personally excited for Rick’s Rules of Thumb by Rick James. Rick is always a great speaker and I learn so much from him!

2:30 – 3:30 pm
Backups are the single most important maintenance tool for MySQL. Hema Sridharam and Svetlana Smirnova present Save your Data: How to Make MySQL Backups. There are several tools to perform MySQL backups, including mysqldump, Oracle’s MySQL Enterprise Backup, third-party applications, and OS methods.
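Of the tools listed, mysqldump is the easiest to illustrate. Below is a minimal sketch that only builds the command line for a typical logical backup. The file name and database names are hypothetical. Connection credentials are assumed to come from an option file such as ~/.my.cnf rather than the command line.

```python
import shlex
from typing import List, Optional


def mysqldump_argv(result_file: str, databases: Optional[List[str]] = None) -> List[str]:
    """Build the argument list for a logical backup with mysqldump.

    Connection options (--user, --host, ...) are deliberately left out; they are
    assumed to come from a MySQL option file such as ~/.my.cnf.
    """
    argv = [
        "mysqldump",
        # Consistent snapshot of InnoDB tables without locking them
        # (this does not make MyISAM tables consistent).
        "--single-transaction",
        # Also dump stored procedures/functions and scheduled events;
        # triggers are dumped by default.
        "--routines",
        "--events",
        f"--result-file={result_file}",
    ]
    if databases:
        argv += ["--databases", *databases]
    else:
        argv.append("--all-databases")
    return argv


if __name__ == "__main__":
    # Only prints the command; to take the backup, pass the list to
    # subprocess.run(argv, check=True).
    print(shlex.join(mysqldump_argv("backup.sql")))
```

A logical dump like this is simple and portable, but restoring it replays SQL. For large data sets, the physical tools in the list (MySQL Enterprise Backup or OS-level methods) are usually faster to restore.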

Henrik Ingo will speak about Evaluating MySQL High-Availability Alternatives, including replication, MySQL Cluster, DRBD, Tungsten and Galera. He will speak about the trade-offs of each method and why you might want to use each one, so you can decide what’s best for your environment.

And of course there’s Peter Zaitsev’s Optimizing MySQL Configuration, which is not to be missed!

4 – 5 pm
I am personally interested in Patrick Galbraith’s Database Resources On Demand, covering the concept of DBaaS, how an organization can use it, and what it means for DBAs in terms of management and for developers in terms of how they use database resources. Among other topics, it addresses how open source technologies such as OpenStack provide infrastructure that can be used for DBaaS.

Lately, solid-state disk drives have been getting a lot of attention. Vadim Tkachenko will speak about MySQL and Solid-State Drives: Usage and Tuning, covering SSD internals and how they affect database performance.

If you want to get your hands dirty with MySQL Cluster, join Santo Leto’s hands-on lab, Get Started with MySQL Cluster, where you will learn by doing: install, configure, administer, and access MySQL Cluster.

5:30 – 6:30 pm
If you want to know what replication features are coming up in MySQL 5.6, make sure to check out Lars Thalmann talking about Enabling the New Generation of Web and Cloud Services with MySQL 5.6 Replication. This session showcases the new replication features, including group commit and multithreaded slave for high performance, crash-safe slaves and failover utilities for high availability, global transaction identifiers and annotated row-based replication [RBR] for flexibility/usability, and event checksums for data integrity.

I have to promote my own session! I’m presenting Google-Hacking MySQL, which takes an in-depth look at using white-hat Google hacking techniques to show you what the “bad guys” can do. White-hat Google hacking is the good kind of hacking, where you have permission. You will learn about the following hacking strategies and how they are done: SQL injection, cross-site scripting (XSS), cross-site request forgery (CSRF), gateway vulnerabilities, and social engineering, all without violating Google’s terms of service.

If you want to migrate your systems to MySQL, there is a presentation by Sergio Andres De La Cruz Rodriguez about Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

6:30 – 8:30 pm – MySQL Connect Reception in the Continental Ballroom

And there are Birds of a Feather sessions (BoFs), too!

Sunday, September 30th
8:30 am – 9:30 am
MySQL Perspectives Keynote featuring Twitter’s DB Team manager Jeremy Cole, PayPal’s Chief Architect Daniel Austin, and Verizon Wireless’ IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well worth waking up early to get to the venue by 8:30 am.

There will be a special panel after that, but it has not been announced yet, so it looks like there is a hole in the schedule (but there is no hole!)

10:15 am – 11:15 am
Most of us have a shortage of DBAs. If you are in that situation, you probably have more money than time on your hands, so check out Rob Young’s talk on Optimizing Security, Performance, and Availability with MySQL Enterprise Edition. Yes, MySQL Enterprise Edition costs money, but it is much easier to buy it than to hire a top-notch DBA; they are rare in this world.

If you are a beginner DBA, attend the hands-on lab entitled Focus on MySQL Replication, taught by Sven Sandberg and Luis Soares. During this hands-on lab, you will learn how to get started, how replication works, and best practices and tools. You will also learn about architecture, advanced replication configurations, and some of the new features in the MySQL 5.6 development milestone releases. This session goes until 12:45 pm, so you have a good 2.5 hours for the hands-on work.

Calvin Sun talks about Better Availability with InnoDB Online Operations, including online operations for schema changes such as add index, drop foreign key, and rename column.

11:45 am – 12:45 pm
Linas Virbalas and Robert Hodges of Continuent talk about Replicating from MySQL to Oracle Database and Back Again.

If you are into MySQL Security, Joro Kodinov will present MySQL Security: Past and Present. Since the description includes MySQL 5.6 security features, I would have called it “Past, Present and Future”.

Thinking of deploying, or already deployed, Galera? Then do not miss Seppo Jaakola’s talk on Galera Cluster Best Practices.

1:15 – 2:15 pm
Inaam Rana presents InnoDB Performance Tuning, which includes the newer features in MySQL 5.5 and the upcoming features in MySQL 5.6, including unique InnoDB architectural elements for performance and how to tune InnoDB to achieve better performance.

There is a world of tools designed to help make MySQL administration easier, so check out Charles Bell’s hands-on lab about MySQL Utilities where you can experiment with the tools.

My particular favorite for this time slot is Oystein Gravlen’s Query Performance Comparison of MySQL 5.5 and MySQL 5.6. I cannot wait to see how much improvement there is, and why!

2:45 – 3:45 pm
Profiling with the Performance Schema, given by Mark Leith, will teach how to set up and use Performance Schema to perform everyday profiling and performance monitoring tasks, such as: finding problem queries; researching blocked hosts; profiling I/O usage; analyzing resource usage by schema, table, or user; or tracing a session to see exactly where it spends its time.
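As a taste of the first task on that list, finding problem queries, here is a minimal sketch. It assumes MySQL 5.6 or later with statement digests enabled (the statements_digest consumer in performance_schema.setup_consumers). The query ranks normalized statements by total latency, and the helper converts the picosecond timer values into seconds. How you run the query (any Python DB-API driver) is left open.

```python
from typing import Iterable, List, Tuple

# Top 10 normalized statements by total latency. Statement digests were added
# to the Performance Schema in MySQL 5.6.
TOP_STATEMENTS_SQL = """
SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10
"""

# Performance Schema timer columns are reported in picoseconds.
PICOSECONDS_PER_SECOND = 10**12


def summarize(rows: Iterable[Tuple[str, int, int]]) -> List[Tuple[str, int, float]]:
    """Turn (digest_text, count_star, sum_timer_wait) rows into
    (digest_text, executions, total_seconds) tuples."""
    return [(text, count, total / PICOSECONDS_PER_SECOND) for text, count, total in rows]
```

With a DB-API cursor, usage would look like `cursor.execute(TOP_STATEMENTS_SQL)` followed by `summarize(cursor.fetchall())`.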

Alexander Rubin will talk about critical performance tuning information during In-Depth Query Optimization for MySQL.

Personally I’m not a fan of use cases, which are sessions like “how X company does Y with MySQL”, but since I’m giving one entitled Database Scaling at Mozilla, I should probably promote it. These sessions are usually well-attended – I guess people want to see how the big players do things, even though those approaches are only appropriate for about 5% of the DBAs out there. Mozilla, however, has relatively small databases and high traffic, so our needs are similar to those of many more DBAs, and hopefully our solutions will work for them too.

4:15 – 5:15 pm
Grant McAlister of Amazon.com presents Durability Is Key: How to Protect Your Data from Corruption where he describes the differences between logical and physical corruption in MySQL and shows how to best protect your MySQL database from both types of corruption.

My former coworkers, Francisco Bordenave and Marco Tusa of Pythian, are presenting on Scaling MySQL with Multimaster Synchronous Replication, where they explain how they investigated and designed an architecture based on MySQL to support an application that served shops around the globe and to scale out and scale in, based on sales seasons.

Jonathon Coombes presents the hands-on lab MySQL Security: Authentication and Audit, which starts with an introduction to the authentication plug-in API and how it works, then tries out an example HTTP authentication plug-in. The lab takes you through setting up a Pluggable Authentication Module (PAM) plug-in to access the server OS user definitions. Then you will walk through the MySQL audit plug-in API and how it works, and experiment with the Oracle audit log plug-in and the various events it can log. Participants will build and experiment with their own plug-in that forwards MySQL events to the OS logging APIs (syslogd on Linux and Windows Event Log on Windows).

5:45 – 6:45 pm
Tokutek’s Bradley Kuszmaul defines big data as “several times as large as main memory”. If you have big data, check out his talk on Solving the Challenges of Big Databases with MySQL.

Luis Soares teaches about using replication for high availability in Scaling for the Web and Cloud with MySQL Replication.

Or get information about full-text search with Sphinx from the horse’s mouth: Andrew Aksyonoff talks about Full-Text Search with MySQL and Sphinx.

From 7 – 9 pm on Sunday, there is the Taylor Street Open House, which is JavaOne’s opening event and our closing event.

It’s going to be an amazing event with tons of technical content. I feel like I have written a lot here, but these are simply the sessions I’m having trouble choosing between, or wish I could go to. There are tons more sessions than what I’ve written about!

I was very excited about this session at MySQL Connect by Daniel Austin of PayPal, because the description was fantastic, but the talk itself could have fallen apart.

After seeing the keynote, I knew the talk would be fantastic. I was not disappointed.

Big myths about Big Data:
PayPal problem – “How do we manage reliable distribution of data across geographical distances?”

The first thing people think of when they think of “big data” is “NoSQL”. NoSQL provides a solution that relaxes many of the common RDBMS constraints: too slow, requiring complex data management such as Sarbanes-Oxley (SOX) compliance, costly to maintain, slow to change and adapt, and intolerant of CAP models.

NoSQL systems usually (not always) use non-relational models like key-value stores. They may be batched or streaming, and they are not necessarily distributed geographically (but they are at PayPal).

Big data myth #1 – Big data = NoSQL
Big data refers to a common set of problems – large volumes and high rates of change in data, data models, and presentation/output.
Often big data isn’t just big; it needs to be FAST too, for things like near real-time analytics or mapping complex structures.

3 kinds of big data systems:
1) Columnar key-value systems – Hadoop, HBase, Cassandra, PNUTS
2) Document-based – MongoDB, Terracotta
3) Graph-based – Voldemort, FlockDB. These are the more interesting ones; the other 2 are a bit more “brute force”, according to Daniel.

[Slide: big data hype]

The CAP theorem (Daniel Abadi added latency to it in his PACELC formulation)
The nice sound byte is:
“You can’t really trade availability for consistency, because if it’s not available you have no idea whether it’s consistent or not.”

Do you need a big data system?
What’s your problem – one of 2:
1) I have too much data and it’s coming in too fast to handle with any RDBMS
(e.g. sensor data)

2) I have a lot of data distributed geographically and need to be able to read and write from anywhere in near real-time. (PayPal’s problem)

If you have one of those 2 problems, you may have a problem that can be solved with NoSQL.

Myth: Big Data and NoSQL are new ideas. They are not: DNS, created in 1983, was the first and most successful such system [Sheeri says: memcached is NoSQL too, a key/value store]:

[Slide: big data/NoSQL is not new]

YESQL: A counter example. The mission – develop a globally distributed db for user-related data. Here are the constraints:
– Must not fail
– Must not lose data (it’s your MONEY!!)
– Must support transactions
– Must support (some) SQL
– Must write/read a 32-bit integer globally in 1000 ms (1 sec)
– Max data volume: 100 TB
– Must scale linearly with costs

Speed constraints:
Max light-speed travel time across the Earth’s surface – 67 ms. Target – data available worldwide in 1000 ms.
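The 67 ms figure is the one-way light travel time between two antipodal points: half the Earth's circumference divided by the speed of light. Here is a quick check. The circumference and the typical fiber refractive index of about 1.47 are my assumptions, not numbers from the talk. Real network routes are longer than the great circle, so real latencies are higher still.

```python
# Assumed constants (not from the talk):
EARTH_CIRCUMFERENCE_KM = 40_075.0   # equatorial circumference
SPEED_OF_LIGHT_KM_S = 299_792.458   # in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # typical value for silica optical fiber


def one_way_ms(distance_km: float, speed_km_s: float) -> float:
    """One-way propagation delay in milliseconds."""
    return distance_km / speed_km_s * 1000


# Two points on the surface can be at most half the circumference apart.
antipodal_km = EARTH_CIRCUMFERENCE_KM / 2
vacuum_ms = one_way_ms(antipodal_km, SPEED_OF_LIGHT_KM_S)
fiber_ms = one_way_ms(antipodal_km, SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX)

print(f"antipodal, vacuum: {vacuum_ms:.1f} ms")
print(f"antipodal, fiber:  {fiber_ms:.1f} ms")
```

Even the fiber figure leaves plenty of headroom under the 1000 ms target. This is consistent with the sub-350 ms global replication result reported in the talk.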

They chose to use MySQL Cluster because:
– True HA by design
– …with fast recovery
– Supports (some) transactions
– Relational model
– In-memory architecture, which translates to high performance
– Disk storage available for non-indexed data
– APIs to make things easier. Can’t just use ODBC or JDBC for this, need high performance APIs.

There are cons to MySQL cluster:
– some semantic limitations on fields (already lifted, but weren’t when PayPal was looking for a solution)
– Size constraints (about 2 TB) – back when Cluster couldn’t do 64-bit; this is resolved now.
– Hardware constraints
– Higher cost/byte
– Requires reasonable data partitioning
– Higher complexity

They use circular replication/failover with cluster. They have 4 nodes, talking to each other, keeping themselves in sync. If node C fails, node B can talk to node D – that’s what this pic shows:

[Picture: circular replication/failover with MySQL Cluster]

When C comes back up you have to move it back to the *end* of the replication flow chain so it can catch up.
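The failover pattern in the picture can be modeled as an ordered ring in which each node replicates from its predecessor. This is a toy sketch, not PayPal's tooling. The node names are hypothetical, and "move it back to the end of the chain" is interpreted here as appending the recovered node after the current last node, so that it catches up from that node's stream.

```python
from typing import List, Tuple


class ReplicationRing:
    """Toy model of circular replication: each node replicates from the node
    before it, and the last node feeds the first."""

    def __init__(self, nodes: List[str]) -> None:
        self.order = list(nodes)

    def links(self) -> List[Tuple[str, str]]:
        """Return (source, replica) pairs, including the wrap-around link."""
        n = len(self.order)
        return [(self.order[i], self.order[(i + 1) % n]) for i in range(n)]

    def fail(self, node: str) -> None:
        """Remove a failed node; its predecessor now feeds its successor."""
        self.order.remove(node)

    def rejoin(self, node: str) -> None:
        """Put a recovered node back at the end of the chain, after the
        current last node, so it can catch up from that node's stream."""
        self.order.append(node)


ring = ReplicationRing(["A", "B", "C", "D"])
ring.fail("C")         # B now talks to D
print(ring.links())    # [('A', 'B'), ('B', 'D'), ('D', 'A')]
ring.rejoin("C")
print(ring.links())    # [('A', 'B'), ('B', 'D'), ('D', 'C'), ('C', 'A')]
```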

Availability defined – availability of the entire system is:
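The slide's definition is not reproduced here. The following is generic background only: textbook series/parallel reliability math, not necessarily what Daniel showed. It assumes independent failures. Components that must all be up multiply their availabilities, while redundant copies fail only if every copy fails. The numbers below are hypothetical.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component must be up: A = a1 * a2 * ... * an."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one redundant copy must be up: A = 1 - (1 - a1)(1 - a2)...(1 - an)."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical numbers: a 99%-available replica in each of two zones,
# behind a 99.9%-available front end.
replicated_data = parallel(0.99, 0.99)         # about 0.9999
whole_system = series(0.999, replicated_data)  # about 0.9989
```

The independence assumption is the reason rules like "at least one replica for every availability zone" and "no shared nodes" matter: correlated failures erode the gain from redundancy.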

They built this in Amazon Web Services (AWS).
– Why AWS? Cheap and easy infrastructure-in-a-box – or so they thought!
Services used:
– EC2, CentOS 5.3, small instances for the management (mgm) and query nodes, XL instances for data (4×4 with 24G each; each “tile” is 96G RAM)
– Elastic IPs/ELB
– EBS volumes (they used to have to use dd to move images from one AWS data center to another)
– S3
– CloudWatch for monitoring

Architectural tiles – developed in a paper with Donald Knuth. Picture on this slide:
– Never separate NDB and SQL
– 2 NDB (aka data) nodes and 1 SQL node for every management node
– For scaling, bring up a new tile, not just a new machine – they use a RightScale template
– Failover first to the nearest availability zone, then to the nearest data center
– At least one replica for every availability zone
– No shared nodes
– Some management nodes are redundant, that’s OK
– AWS is network-bound at 250 Mb per second!
– Need specific ACL across availability zone boundaries
– AZs are not uniform
– No GSLB – global server load balancing
– Dynamic IPs
– ELB sticky sessions are unreliable – this is fixed now in AWS
– con: have to upgrade the whole tile at once
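The tile rules map naturally onto a MySQL Cluster config.ini. Below is a minimal sketch that renders one, assuming a tile is 2 data nodes, 1 SQL node, and 1 management node (the 2:1:1 ratio above). The host names and the DataMemory size are hypothetical. The section names ([ndbd default], [ndb_mgmd], [ndbd], [mysqld]) and the NoOfReplicas, DataMemory, and HostName parameters are standard config.ini settings. Node IDs are left for the management server to assign.

```python
from typing import List


def tile_config(data_hosts: List[str], sql_hosts: List[str], mgm_hosts: List[str],
                data_memory: str = "16G", replicas: int = 2) -> str:
    """Render a config.ini for one hypothetical tile, enforcing the
    2 data nodes : 1 SQL node : 1 management node ratio."""
    if len(data_hosts) != 2 * len(sql_hosts):
        raise ValueError("expected 2 data nodes per SQL node")
    if len(mgm_hosts) != len(sql_hosts):
        raise ValueError("expected 1 management node per SQL node")
    if len(data_hosts) % replicas != 0:
        # Data nodes are grouped into node groups of NoOfReplicas nodes each.
        raise ValueError("number of data nodes must be a multiple of NoOfReplicas")
    lines = [
        "[ndbd default]",
        f"NoOfReplicas={replicas}",
        f"DataMemory={data_memory}",
    ]
    for host in mgm_hosts:
        lines += ["", "[ndb_mgmd]", f"HostName={host}"]
    for host in data_hosts:
        lines += ["", "[ndbd]", f"HostName={host}"]
    for host in sql_hosts:
        lines += ["", "[mysqld]", f"HostName={host}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(tile_config(["data1.example", "data2.example"], ["sql1.example"], ["mgm1.example"]))
```

With NoOfReplicas=2 and two data nodes, the tile forms a single node group in which each data node holds a full copy of the tile's data. Scaling then means adding another tile rather than a lone machine, as the slide recommends.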

Other tech considered:
– Paxos – elegant-but-complex consensus-based messaging protocol. Used in Google Megastore, Bing metadata
– Java Query caching – queries as serialized objects – but not yet working
– Multiple ring architectures, but those are even more complicated.

System r/w performance:
– 23 & 256 byte char fields
– reads/writes/query speed vs. volume
– data replication speeds

Results:
– global replication in under 350 ms
– 256 bytes read in under 1000 ms worldwide.

Data models and query optimization
– network latency (obvious issue)
– data model requires all segments present in each geo-region
– parameterized (linked) joins – the adaptive query localization (SIP) technique from Clustra (see Frazer Clement’s blog for details)
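Pushing joins down to the data nodes is meant to cut network round trips between the SQL node and the data nodes. This matters most when latency is the bottleneck. Below is a deliberately simplified count, with hypothetical row counts and latencies. Real NDB batches its lookups, so the naive case is better in practice than shown here.

```python
def nested_loop_round_trips(outer_rows: int) -> int:
    """Join run on the SQL node: one request to read the outer table, then one
    lookup request per outer row (ignoring the batching NDB really does)."""
    return 1 + outer_rows


def pushed_join_round_trips() -> int:
    """Join pushed down to the data nodes: simplified here to one exchange."""
    return 1


def network_ms(round_trips: int, rtt_ms: float) -> float:
    """Time spent purely waiting on the network."""
    return round_trips * rtt_ms


# Hypothetical numbers: 1,000 outer rows, 0.5 ms round trip inside a data center.
local_naive = network_ms(nested_loop_round_trips(1000), 0.5)   # 500.5 ms
local_pushed = network_ms(pushed_join_round_trips(), 0.5)      # 0.5 ms
```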

They went around the International Date Line the wrong way at first… commit ordering matters!
The order in which you do writes vs. reads is important! Writes don’t always happen at the same time you start them.
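The notes don't say exactly what went wrong at the date line. One common way ordering goes wrong across time zones is comparing local wall-clock timestamps. Below is a minimal illustration with hypothetical times. It uses fixed UTC offsets so that it needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets (no DST handling) keep the example self-contained.
TOKYO = timezone(timedelta(hours=9))          # UTC+9
SAN_JOSE_PDT = timezone(timedelta(hours=-7))  # UTC-7 (Pacific Daylight Time)

# Hypothetical events: a write in Tokyo, then a read in San Jose 30 minutes later.
write_at = datetime(2012, 9, 30, 9, 0, tzinfo=TOKYO)          # 2012-09-30 00:00 UTC
read_at = datetime(2012, 9, 29, 17, 30, tzinfo=SAN_JOSE_PDT)  # 2012-09-30 00:30 UTC

# Comparing local wall-clock readings (time zone dropped) gets the order backwards:
# the read's local date is a day "earlier".
naive_says_read_first = read_at.replace(tzinfo=None) < write_at.replace(tzinfo=None)

# Comparing aware datetimes, which Python normalizes to UTC, gets it right.
read_is_after_write = read_at > write_at

print(naive_says_read_first, read_is_after_write)  # True True
```

Ordering events by UTC, or by a single logical sequence such as commit order, avoids this class of bug.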

Be careful:
– with “eventual consistency”-related concepts
– ACID/CAP are not really as well-defined considering how often we invoke them
– MySQL Cluster is good because it has real HA and real SQL. It has notable limits around fields and datatypes, but it successfully competes with NoSQL for many use cases, and is often better
– NoSQL has relatively low maturity, MySQL Cluster is much more mature.
– Don’t be a victim of Technological Fashion!

Future directions:
– Alternatives using Pacemaker, Heartbeat (using InnoDB, Yves Trudeau at Percona)
– Implement the memcached plugin – add simple connection-based persistence to preserve connections during failover
– better monitoring
– better data node distribution

Summing up on “YESQL 0.85”:
– it works, better than expected!
– very fast
– very reliable
– reduced complexity since 0.7
– AWS poses challenges that private data centers might not have

Only use big data solutions when you have REAL big data problems. Not all big data solutions are created equal. Which tradeoffs are important: consistency, fault tolerance, etc.?
– You can achieve high performance and availability without giving up on relational models…

Maynard Keynes on “NoSQL Databases”:
In the long run, we are all dead (eventually consistent).

Last week I posted a but that was before the schedule itself was up, hospital so now I can present a list of session-by-session talks for developers who are building their schedules.

So here’s a guide to MySQL Connect for administrators, this with times. Note that these are handpicked from what I think administrators would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
9-10:30 am
MySQL Connect Keynote: The State of the Dolphin by Tomas Ulin, VP and Edward Screven, Chief Corporate Architect, both of Oracle. I am pretty excited to see where Oracle is taking MySQL next!

11:30 am – 12:30 pm
There is a session if you want to learn What’s New In MySQL 5.6. Everyone thinks that there will be a new 5.6 release out (though we are all wondering if it will be a DMR, beta or release candidate release), so this will be a great session to go to, to learn about any new features released.

MySQL Optimizer Overview by Olav Sandstå. Get in depth as to how the optimizer works, so you have the knowledge to tune your server and queries.

There’s also Ronald Bradford’s session on Lessons from Managing 500+ MySQL Instances. Ronald always has great tips and tricks to make administering MySQL less painful and avoid problems.

1:00 – 2:00 pm
If you are a beginner, you will want to attend the Hands-on Lab Getting started with MySQL presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements and how to import data into and export it from the MySQL server. And remember, you are doing this all in front of a computer, because this is a hands-on lab. This hands on lab runs from 1-3:30 pm, so there is plenty of time to learn and do a lot!

Even if you are not a beginner, you are sure to learn some great Replication Tips and Tricks from Mats Kindahl. Mats will present a bag of useful tips and tricks related to the MySQL 5.5 GA and MySQL 5.6 development milestone releases, including multisource replication, using logs for auditing, handling filtering, examining the binary log, using relay slaves, splitting the replication stream, and handling failover.

I am personally excited for Rick’s Rules of Thumb by Rick James. Rick is always a great speaker and I learn so much from him!

2:30 – 3:30 pm
Backups are the single most important maintenance tool for MySQL. Hema Sridharam and Svetlana Smirnova present Save your Data: How to Make MySQL Backups. There are several tools to perform MySQL backups, including mysqldump, Oracle’s MySQL Enterprise Backup, third-party applications, and OS methods.

Henrik Ingo will speak about Evaluating MySQL High-Availability Alternatives, including replication, MySQL Cluster, DRBD, Tungsten and Galera. He will speak about the trade-offs of each method and why you might want to use each one, so you can decide what’s best for your environment.

And of course there’s Peter Zaitsev’s Optimizing MySQL Configuration, which is not to be missed!

4 – 5 pm
I am personally interested in Patrick Galbraith’s Database Resources On Demand, covering the concept of DBaaS, how an organization can use it, and what it means for DBAs (in terms of management) and for developers (in terms of how they use database resources). Among other topics, it addresses how open source technologies such as OpenStack provide an infrastructure that can be used for DBaaS.

Lately, solid-state disk drives have been getting a lot of attention. Vadim Tkachenko will speak about MySQL and Solid-State Drives: Usage and Tuning, covering SSD internals and how they affect database performance.

If you want to get your hands dirty with MySQL Cluster, join Santo Leto’s hands-on lab Get Started with MySQL Cluster, where you will learn by doing: install, configure, administer, and access MySQL Cluster.

5:30 – 6:30 pm
If you want to know what replication features are coming up in MySQL 5.6, make sure to check out Lars Thalmann talking about Enabling the New Generation of Web and Cloud Services with MySQL 5.6 Replication. This session showcases the new replication features, including group commit and multithreaded slave for high performance, crash-safe slaves and failover utilities for high availability, global transaction identifiers and annotated row-based replication [RBR] for flexibility/usability, and event checksums for data integrity.

I have to promote my own session! I’m presenting Google-Hacking MySQL, which takes an in-depth look at using white-hat Google hacking techniques to show you what the “bad guys” can do. White-hat Google hacking is the good kind of hacking, where you have permission. You will learn about the following hacking strategies and how they are done: SQL injection, cross-site scripting (XSS), cross-site request forgery (CSRF), gateway vulnerabilities, and social engineering, all without violating Google’s terms of service.

If you want to migrate your systems to MySQL, there is a presentation by Sergio Andres De La Cruz Rodriguez about Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

6:30 – 8:30 pm – MySQL Connect Reception in the Continental Ballroom

And there are Birds of a Feather sessions (BoFs) too!

Sunday, September 30th
8:30 am – 9:30 am
MySQL Perspectives Keynote featuring Twitter’s DB Team manager Jeremy Cole, PayPal’s Chief Architect Daniel Austin, and Verizon Wireless’ IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well worth waking up early to get to the venue by 8:30 am.

There will be a special panel after that, but it has not been announced yet, so there appears to be a gap in the schedule (but there is no gap!).

10:15 am – 11:15 am
Most of us have a shortage of DBAs. If you are in that situation, you probably have more money than time on your hands, so check out Rob Young’s talk on Optimizing Security, Performance, and Availability with MySQL Enterprise Edition. Yes, MySQL Enterprise Edition costs money, but it is much easier to buy it than to hire a top-notch DBA, and top-notch DBAs are rare in this world.

If you are a beginner DBA, attend the hands-on lab Focus on MySQL Replication, taught by Sven Sandberg and Luis Soares. During this lab, you will learn how to get started, how replication works, and the best practices and tools. You will also learn about the architecture, advanced replication configurations and some of the new features in the MySQL 5.6 development milestone releases. This session goes until 12:45, so you have a good 2.5 hours for the hands-on work.

Calvin Sun talks about Better Availability with InnoDB Online Operations, including online operations for schema changes such as add index, drop foreign key, and rename column.

11:45 am – 12:45 pm
Linas Virbalas and Robert Hodges of Continuent talk about Replicating from MySQL to Oracle Database and Back Again.

If you are into MySQL Security, Joro Kodinov will present MySQL Security: Past and Present. Since the description includes MySQL 5.6 security features, I would have called it “Past, Present and Future”.

Thinking of deploying, or already deployed, Galera? Then do not miss Seppo Jaakola’s talk on Galera Cluster Best Practices.

1:15 – 2:15 pm
Inaam Rana presents InnoDB Performance Tuning, which covers the newer features in MySQL 5.5 and the upcoming features in MySQL 5.6, the unique InnoDB architectural elements that affect performance, and how to tune InnoDB to achieve better performance.

There is a world of tools designed to help make MySQL administration easier, so check out Charles Bell’s hands-on lab about MySQL Utilities where you can experiment with the tools.

My particular favorite for this time slot is Oystein Gravlen’s Query Performance Comparison of MySQL 5.5 and MySQL 5.6. I cannot wait to see how much improvement there is, and why!

2:45 – 3:45 pm
Profiling with the Performance Schema, given by Mark Leith, will teach how to set up and use Performance Schema to perform everyday profiling and performance monitoring tasks, such as: finding problem queries; researching blocked hosts; profiling I/O usage; analyzing resource usage by schema, table, or user; or tracing a session to see exactly where it spends its time.

Alexander Rubin will cover critical performance tuning information in In-Depth Query Optimization for MySQL.

Personally, I’m not a fan of use cases, which are sessions like “how X company does Y with MySQL”, but since I’m giving one entitled Database Scaling at Mozilla, I should probably promote it. I will note that these sessions are usually well-attended – I guess people want to see how the big players do things, even though those solutions are only appropriate for about 5% of the DBAs out there. Mozilla, however, has relatively small databases and high traffic, so our needs are closer to those of many more DBAs, and hopefully our solutions will work for them.

4:15 – 5:15 pm
Grant McAlister of Amazon.com presents Durability Is Key: How to Protect Your Data from Corruption where he describes the differences between logical and physical corruption in MySQL and shows how to best protect your MySQL database from both types of corruption.

My former coworkers, Francisco Bordenave and Marco Tusa of Pythian, are presenting on Scaling MySQL with Multimaster Synchronous Replication, where they explain how they investigated and designed an architecture based on MySQL to support an application that served shops around the globe and to scale out and scale in, based on sales seasons.

Jonathon Coombes presents the hands-on lab MySQL Security: Authentication and Audit, which starts with an introduction to the authentication plug-in API and how it works, then tries out an example HTTP authentication plug-in. The lab takes you through setting up a Pluggable Authentication Module (PAM) plug-in to access the server OS user definitions. Then you will walk through the MySQL audit plug-in API and how it works, and experiment with the Oracle audit log plug-in and the various events it can log. Participants will build and experiment with their own plug-in that forwards MySQL events to the OS logging APIs (syslogd on Linux and the Event Log on Windows).

5:45 – 6:45 pm
Tokutek’s Bradley Kuszmaul defines big data as “several times as large as main memory”. If you have big data, check out his talk on Solving the Challenges of Big Databases with MySQL.

Luis Soares covers using replication for high availability in Scaling for the Web and Cloud with MySQL Replication.

Or get information about full-text search with Sphinx straight from the horse’s mouth – Andrew Aksyonoff talks about Full-Text Search with MySQL and Sphinx.

From 7 – 9 pm on Sunday, there is the Taylor Street Open House, which is JavaOne’s opening event and our closing event.

It’s going to be an amazing event with tons of technical content. I feel like I have written a lot here, but these are simply the sessions I’m having trouble choosing between, or wish I could go to. There are tons more sessions than what I’ve written about!

I was very excited about this session at MySQL Connect by Daniel Austin of PayPal, because the description was fantastic, but the talk itself could still have fallen apart.

After seeing the keynote, I knew the talk would be fantastic. I was not disappointed.

Big myths about Big Data:
PayPal’s problem – “How do we manage reliable distribution of data across geographical distances?”

The first thing people think of when they think of “big data” is “NoSQL”. NoSQL provides a solution that relaxes many of the common RDBMS constraints. The usual complaints about RDBMSs are that they are too slow, require complex data management (as with Sarbanes-Oxley (SOX)), are costly to maintain, are slow to change and adapt, and are intolerant of CAP models.

NoSQL systems usually (but not always) use non-relational models such as key-value stores. They may be batched or streaming, and they are not necessarily distributed geographically (but they are at PayPal).

Big data myth #1 – Big data = NoSQL
Big data refers to a common set of problems: large volumes, and high rates of change in data, data models, and presentation and output.
Often big data isn’t just big; it needs to be FAST too, for things like near real-time analytics or mapping complex structures.

3 kinds of big data systems:
1) Columnar key-value systems – Hadoop, HBase, Cassandra, PNUTS
2) Document-based – MongoDB, Terracotta
3) Graph-based – Voldemort, FlockDB. These are the more interesting ones; the other 2 are a bit more “brute force”, according to Daniel.

[Slide: big data hype]

The CAP theorem (Daniel Abadi added latency)
The nice sound bite is:
“You can’t really trade availability for consistency, because if it’s not available you have no idea if it’s consistent or not.”

Do you need a big data system?
What’s your problem – one of 2:
1) I have too much data and it’s coming in too fast to handle with any RDBMS
(e.g. sensor data)

2) I have a lot of data distributed geographically and need to be able to read and write from anywhere in near real-time. (PayPal’s problem)

If you have one of those 2 problems, you may have a problem that can be solved with NoSQL.

Myth: Big Data and NoSQL are new ideas. They are not: DNS was the first and most successful such system, created in 1983 [Sheeri says: memcached is NoSQL too – a key/value store]:

[Slide: big data/NoSQL is not new]

YESQL: A counterexample. The mission – develop a globally distributed db for user-related data. Here are the constraints:
– Must not fail
– Must not lose data (it’s your MONEY!!)
– Must support transactions
– Must support (some) SQL
– Must write/read a 32-bit integer globally in 1000 ms (1 sec)
– Max data volume: 100 TB
– Must scale linearly with costs

Speed constraints:
Max light-speed latency between two points on Earth’s surface – 67 ms. Target – data available worldwide in 1000 ms
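Where does 67 ms come from? Here is a quick back-of-the-envelope check in Python. It assumes a spherical Earth with a mean radius of 6,371 km and light travelling at its vacuum speed along the surface to the antipodal point; neither figure comes from the talk.

```python
import math

# Assumptions (not from the talk): spherical Earth, vacuum speed of light.
EARTH_MEAN_RADIUS_KM = 6371.0
SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def max_surface_latency_ms() -> float:
    """One-way light travel time, in ms, between two antipodal points on the surface."""
    # The longest great-circle path on a sphere is half the circumference: pi * r.
    half_circumference_km = math.pi * EARTH_MEAN_RADIUS_KM
    return half_circumference_km / SPEED_OF_LIGHT_KM_PER_S * 1000.0


if __name__ == "__main__":
    latency_ms = max_surface_latency_ms()
    print(f"Worst-case one-way latency: {latency_ms:.1f} ms")
    print(f"The 1000 ms target allows about {1000.0 / latency_ms:.0f} such trips")
```

Real networks are slower than this bound, since light in optical fiber travels at roughly two-thirds of its vacuum speed and routes are rarely great circles, so 67 ms is a hard floor rather than an estimate of actual latency.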

They chose to use MySQL Cluster because:
– True HA by design
– …with fast recovery
– Supports (some) transactions
– Relational model
– In-memory architecture, which translates to high performance
– Disk storage available for non-indexed data
– APIs to make things easier. Can’t just use ODBC or JDBC for this, need high performance APIs.

There are cons to MySQL cluster:
– some semantic limitations on fields (already lifted, but weren’t when PayPal was looking for a solution)
– Size constraints (about 2 TB) – back when Cluster couldn’t do 64-bit, so this is resolved now.
– Hardware constraints
– Higher cost/byte
– Requires reasonable data partitioning
– Higher complexity

They use circular replication/failover with MySQL Cluster. They have 4 nodes talking to each other, keeping themselves in sync. If node C fails, node B can talk to node D – that’s what this pic shows:

[Picture: circular replication/failover with MySQL Cluster]

When C comes back up you have to move it back to the *end* of the replication flow chain so it can catch up.
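To make the failover description concrete, here is a toy Python model of a four-node replication ring. It is only a sketch of the topology changes described above, not of how MySQL Cluster replication is actually configured; the class and method names are hypothetical, and I am assuming “the end of the chain” means the position just before the ring wraps back to the first node.

```python
class ReplicationRing:
    """Toy model of circular replication: each node replicates from the node before it."""

    def __init__(self, nodes):
        # Ring order, e.g. ["A", "B", "C", "D"] means A -> B -> C -> D -> A.
        self.nodes = list(nodes)

    def links(self):
        """Return (master, slave) pairs; the last node feeds the first to close the ring."""
        n = len(self.nodes)
        return [(self.nodes[i], self.nodes[(i + 1) % n]) for i in range(n)]

    def fail(self, node):
        """Remove a failed node; its predecessor now feeds its successor directly."""
        self.nodes.remove(node)

    def rejoin(self, node):
        """Put a recovered node back at the end of the chain so it can catch up."""
        self.nodes.append(node)


if __name__ == "__main__":
    ring = ReplicationRing(["A", "B", "C", "D"])
    ring.fail("C")
    print(ring.links())  # B now feeds D directly
    ring.rejoin("C")
    print(ring.links())  # C re-enters after D and feeds A
```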

Availability defined – availability of the entire system is:

They built this in Amazon Web Services (AWS).
– Why AWS? Cheap and easy infrastructure-in-a-box – or so they thought!
Services used:
– EC2 with CentOS 5.3: small instances for the management (mgm) and query nodes, XL instances for data (4×4 with 24 GB each; each “tile” is 96 GB RAM)
– Elastic IPs/ELB
– EBS volumes (they used to have to use dd to move images from one AWS data center to another)
– S3
– CloudWatch for monitoring

Architectural tiles – developed in a paper with Donald Knuth. The slide covered:
– Never separate NDB and SQL
– 2 NDB (aka data) nodes for every 1 SQL node and 1 management node
– For scaling, bring up a new tile, not just a new machine – they use a RightScale template
– Failover first to the nearest availability zone, then to the nearest data center
– At least one replica for every availability zone
– No shared nodes
– Some management nodes are redundant, that’s OK
– AWS is network-bound at 250 Mb per second!
– Need specific ACL across availability zone boundaries
– AZs are not uniform
– No GSLB (global server load balancing)
– Dynamic IPs
– ELB sticky sessions are unreliable – this is fixed now in AWS
– Con: you have to upgrade the whole tile at once
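The “scale by tiles, not machines” rule is easy to express in code. This Python sketch uses the 2:1:1 ratio of data, SQL and management nodes from the slide; the `Tile` type and the helper functions are hypothetical, and real capacity planning would also have to account for memory per data node and replicas per availability zone.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """One deployable unit: NDB and SQL nodes are never separated (ratio from the slide)."""
    data_nodes: int = 2
    sql_nodes: int = 1
    mgm_nodes: int = 1


def tiles_needed(required_data_nodes: int, tile: Tile = Tile()) -> int:
    """Round up to whole tiles: you add a tile, never a lone machine."""
    return math.ceil(required_data_nodes / tile.data_nodes)


def node_counts(num_tiles: int, tile: Tile = Tile()) -> dict:
    """Total nodes of each type for a deployment of num_tiles tiles."""
    return {
        "data": num_tiles * tile.data_nodes,
        "sql": num_tiles * tile.sql_nodes,
        "mgm": num_tiles * tile.mgm_nodes,
    }


if __name__ == "__main__":
    tiles = tiles_needed(5)  # 5 data nodes' worth of capacity needs 3 tiles
    print(tiles, node_counts(tiles))
```

Rounding up is the price of the tile approach: a little more capacity costs a whole tile.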

Other tech considered:
– Paxos – elegant-but-complex consensus-based messaging protocol. Used in Google Megastore, Bing metadata
– Java Query caching – queries as serialized objects – but not yet working
– Multiple ring architectures, but those are even more complicated.

System r/w performance:
– 23 & 256 byte char fields
– reads/writes/query speed vs. volume
– data replication speeds

Results:
– global replication in under 350 ms
– 256 bytes read in under 1000 ms worldwide.

Data models and query optimization
– network latency (obvious issue)
– data model requires all segments present in each geo-region
– parameterized (linked) joins: the adaptive query localization (SIP) technique from Clustra (see Frazer Clement’s blog for details)

They went around the International Date Line the wrong way at first… commit ordering matters!
The order in which you do writes vs. reads is important! Writes don’t always take effect at the same time you start them.
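The talk did not go into the details of the date-line mistake, so the following is only my illustration of the general class of bug, not PayPal’s actual code: if writes are ordered by local wall-clock time, a write made in Honolulu can appear to come before an earlier write made in Tokyo. The two timestamps are made up.

```python
from datetime import datetime, timedelta, timezone

TOKYO = timezone(timedelta(hours=9))
HONOLULU = timezone(timedelta(hours=-10))

# Hypothetical writes: the Tokyo one really happens first (00:00 UTC vs 00:30 UTC).
writes = [
    ("tokyo_write", datetime(2012, 9, 30, 9, 0, tzinfo=TOKYO)),
    ("honolulu_write", datetime(2012, 9, 29, 14, 30, tzinfo=HONOLULU)),
]


def order_by_local_clock(events):
    """Wrong: sorts on the local calendar date and time, discarding the UTC offset."""
    return [name for name, ts in sorted(events, key=lambda e: e[1].replace(tzinfo=None))]


def order_by_utc(events):
    """Right: sorts on the absolute instant, normalized to UTC."""
    return [name for name, ts in sorted(events, key=lambda e: e[1].astimezone(timezone.utc))]


if __name__ == "__main__":
    print(order_by_local_clock(writes))  # ['honolulu_write', 'tokyo_write']
    print(order_by_utc(writes))          # ['tokyo_write', 'honolulu_write']
```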

Be careful:
– with “eventual consistency”-related concepts
– ACID/CAP are not really as well-defined as how often we invoke them would suggest
– MySQL Cluster is good, because it has real HA and real SQL. It has notable limits around fields and datatypes, but it successfully competes with NoSQL for many use cases, and is often better
– NoSQL has relatively low maturity; MySQL Cluster is much more mature.
– Don’t be a victim of Technological Fashion!

Future directions:
– Alternatives using Pacemaker, Heartbeat (using InnoDB, Yves Trudeau at Percona)
– Implement the memcached plugin – add simple connection-based persistence to preserve connections during failover
– Better monitoring
– Better data node distribution

Summing up on “YESQL 0.85”:
– it works, better than expected!
– very fast
– very reliable
– reduced complexity since 0.7
– AWS poses challenges that private data centers might not have

Only use big data solutions when you have a REAL big data problem. Not all big data solutions are created equal. Decide which trade-offs are important: consistency, fault tolerance, etc.
– You can achieve high performance and availability without giving up on relational models…

Maynard Keynes on “NoSQL Databases”:
“In the long run, we are all dead (eventually consistent).”

If you are reading this blog post, you are probably not at MySQL Connect. You may have heard about today’s new release, MySQL 5.6.7. This is a release candidate quality release, and if Oracle treats MySQL like the rest of its software, that means there will very likely be a 5.6 GA by the end of 2012.

That all being said, is MySQL 5.6 worth upgrading to, once it’s GA? Probably the most compelling reason to upgrade is InnoDB online DDL – including online add/drop indexes (including foreign keys) and online add/drop/rename columns.

There are some great InnoDB performance enhancements, which you can read about if you are inclined to look further into it. Those are interesting, but it’s hard to say how much improvement any one organization will get until they actually test their system. So I won’t go into it too much until I have had time to see if Mozilla would benefit from it. Similarly, the fact that MySQL can now support parallel threading up to 48 cores is also great – Oracle tested on a 96-core server and got 48 cores working in parallel.

EXPLAIN, one of the most commonly used SQL extensions, has gotten lots of new features. In MySQL 5.6 you can now use EXPLAIN on SELECT, UPDATE and DELETE queries. There is also a visual EXPLAIN output, and the output can be produced in JSON format. Here is a simple example of the new syntax and format:

mysql> EXPLAIN FORMAT=JSON DELETE FROM dup_index WHERE id=1\G
*************************** 1. row ***************************
EXPLAIN: {
   "query_block": {
     "select_id": 1,
     "table": {
       "delete": true,
       "table_name": "dup_index",
       "access_type": "range",
       "possible_keys": [
         "id",
         "id_2"
       ],
       "key": "id",
       "key_length": "5",
       "rows": 1,
       "filtered": 100,
       "attached_condition": "(`version`.`dup_index`.`id` = 1)"
     }
   }
}
1 row in set (0.00 sec)
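Because the JSON output is machine-readable, it is easy to post-process. Here is a small Python sketch that loads the plan above (pasted in as a string) and pulls out the fields I usually look at first. It only handles a single-table plan like this one; plans for joins nest their tables differently, so the hypothetical `summarize` helper would need extending for those.

```python
import json

# The EXPLAIN FORMAT=JSON document shown above, pasted in as a string.
EXPLAIN_JSON = """
{
  "query_block": {
    "select_id": 1,
    "table": {
      "delete": true,
      "table_name": "dup_index",
      "access_type": "range",
      "possible_keys": ["id", "id_2"],
      "key": "id",
      "key_length": "5",
      "rows": 1,
      "filtered": 100,
      "attached_condition": "(`version`.`dup_index`.`id` = 1)"
    }
  }
}
"""


def summarize(plan_text: str) -> dict:
    """Extract the headline fields from a single-table JSON plan."""
    table = json.loads(plan_text)["query_block"]["table"]
    chosen = table.get("key")
    return {
        "table": table["table_name"],
        "access_type": table["access_type"],
        "key": chosen,
        "rows": table["rows"],
        # Candidate indexes the optimizer considered but did not use.
        "unused_keys": [k for k in table.get("possible_keys", []) if k != chosen],
    }


if __name__ == "__main__":
    print(summarize(EXPLAIN_JSON))
```

Here the unused `id_2` is a hint of the duplicate index discussed further down.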

Personally, I am pretty excited about the new security features of MySQL 5.6. The biggest one, which is a pretty big change to watch out for when upgrading, is that secure_auth defaults to on, unless you specify skip-secure-auth in the configuration. This means that when you upgrade, any user in the old password format (the password hash is 16 characters) will be blocked.
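Before upgrading, it is worth finding the accounts that secure_auth will lock out. Old-format hashes are 16 hex characters, while the 4.1+ format is 41 characters starting with `*`. The Python sketch below classifies hashes you have already fetched (for example with `SELECT user, host, password FROM mysql.user`); the sample rows and helper names are hypothetical.

```python
import re

OLD_HASH = re.compile(r"[0-9A-Fa-f]{16}")  # pre-4.1 format: 16 hex characters
NEW_HASH = re.compile(r"\*[0-9A-F]{40}")   # 4.1+ format: '*' plus 40 uppercase hex characters


def hash_format(password_hash: str) -> str:
    """Classify a mysql.user password hash by its format."""
    if password_hash == "":
        return "empty"
    if NEW_HASH.fullmatch(password_hash):
        return "new"
    if OLD_HASH.fullmatch(password_hash):
        return "old"
    return "unknown"


def blocked_by_secure_auth(rows):
    """Return 'user'@'host' for every account still using an old-format hash."""
    return [f"'{user}'@'{host}'" for user, host, pw in rows if hash_format(pw) == "old"]


if __name__ == "__main__":
    sample_rows = [  # hypothetical accounts
        ("legacy_app", "%", "6f8c114b58f2ce9e"),
        ("dba", "localhost", "*" + "1A" * 20),
    ]
    print(blocked_by_secure_auth(sample_rows))  # ["'legacy_app'@'%'"]
```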

Other security features have to do with passwords – in MySQL 5.6 you can force a user to do a password change the next time they login (great for first-time logins, and no other commands will work until the password is changed), you can set a password expiration, and you can set a password strength that has to be met.
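The password strength check comes from the validate_password plugin. As far as I know, its default MEDIUM policy requires at least 8 characters, including at least one digit, one lowercase and one uppercase letter, and one special character, but please check the manual for your version. Here is a rough Python approximation you could use to pre-screen passwords; it is not the plugin’s code, and it ignores the STRONG policy’s dictionary check.

```python
def meets_medium_policy(password: str, min_length: int = 8, mixed_case_count: int = 1,
                        number_count: int = 1, special_char_count: int = 1) -> bool:
    """Approximate validate_password's MEDIUM policy (defaults assumed, see above)."""
    lowercase = sum(c.islower() for c in password)
    uppercase = sum(c.isupper() for c in password)
    digits = sum(c.isdigit() for c in password)
    specials = sum(not c.isalnum() for c in password)  # anything that is not a letter or digit
    return (
        len(password) >= min_length
        and lowercase >= mixed_case_count
        and uppercase >= mixed_case_count
        and digits >= number_count
        and specials >= special_char_count
    )


if __name__ == "__main__":
    for candidate in ("password", "Pa0!", "Passw0rd!"):
        print(candidate, meets_medium_policy(candidate))
```

The keyword arguments loosely mirror the plugin’s validate_password_length, validate_password_mixed_case_count, validate_password_number_count and validate_password_special_char_count settings.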

MySQL will also warn you when you set a replication password without using SSL, or when it is stored in cleartext. For example, setting the replication user name and password the usual way, with CHANGE MASTER TO, will generate the following 2 notes:

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1759
Message: Sending passwords in plain text without SSL/TLS is extremely insecure.
*************************** 2. row ***************************
Level: Note
Code: 1760
Message: Storing MySQL user name or password information in the master.info repository is not secure and is therefore not recommended. Please see the MySQL Manual for more about this issue and possible alternatives.
2 rows in set (0.00 sec)

In MySQL 5.6, you can now store replication information in a table, not just in master.info.

I am also excited about having checksums in replication. Using pt-table-checksum can get tedious, and it only finds inconsistencies after the fact; it doesn’t prevent inconsistencies or raise an error at the moment they occur.
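The idea behind replication event checksums is simple: the writer stores a checksum with each event, and the reader recomputes it, so corruption is caught when the event is read rather than discovered later. MySQL 5.6 controls this with the binlog_checksum variable (CRC32). The Python sketch below only illustrates the principle with `zlib.crc32`; the `Event` layout is hypothetical, not the real binary log format. Note that this detects damaged events, which is a different check from pt-table-checksum’s comparison of table contents.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical log event: a payload plus the CRC32 computed when it was written."""
    payload: bytes
    checksum: int


def write_event(payload: bytes) -> Event:
    """Writer side: attach a CRC32 of the payload."""
    return Event(payload, zlib.crc32(payload))


def read_event(event: Event) -> bytes:
    """Reader side: recompute the CRC32 and refuse to apply a corrupted event."""
    if zlib.crc32(event.payload) != event.checksum:
        raise ValueError("event checksum mismatch: refusing to apply corrupted event")
    return event.payload


if __name__ == "__main__":
    good = write_event(b"UPDATE t SET c = 1 WHERE id = 7")
    print(read_event(good))
    damaged = Event(b"UPDATE t SET c = 9 WHERE id = 7", good.checksum)  # one byte changed
    try:
        read_event(damaged)
    except ValueError as err:
        print(err)
```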

Another really nice replication change is that you can control row-based binary logging (with the binlog_row_image variable) so that it logs only the columns needed to identify the row plus the changed columns, not the entire changed row. This can greatly reduce row-based replication overhead, especially for wide rows.
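Here is a rough Python model of what ends up in the binary log for a single-row UPDATE under binlog_row_image=full versus minimal, assuming the table has a primary key. The function and the sample row are hypothetical; in minimal mode the before image carries only the columns needed to identify the row, and the after image carries only the columns the statement set. There is also a noblob setting, which I have left out.

```python
def row_images(row_before: dict, assignments: dict, primary_key: list, mode: str):
    """Return (before_image, after_image) for an UPDATE that applies `assignments`."""
    row_after = {**row_before, **assignments}
    if mode == "full":
        # Every column, before and after.
        return dict(row_before), row_after
    if mode == "minimal":
        # Primary key columns identify the row; only the assigned columns are logged after.
        return {col: row_before[col] for col in primary_key}, dict(assignments)
    raise ValueError(f"unsupported mode: {mode!r}")


def logged_columns(images) -> int:
    """Count the column values written across both images."""
    before, after = images
    return len(before) + len(after)


if __name__ == "__main__":
    row = {"id": 42, "email": "a@example.com", "bio": "x" * 2000, "visits": 5}
    for mode in ("full", "minimal"):
        print(mode, logged_columns(row_images(row, {"visits": 6}, ["id"], mode)))
```

For this hypothetical row, full mode logs 8 column values (including the 2,000-character `bio` twice) while minimal mode logs 2.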

There are some nice little touches that show that Oracle is going in the right direction with MySQL – for example, in MySQL 5.6, innodb_file_per_table is enabled by default. And there is a new feature that warns you with a “note” if you create a duplicate index:

mysql> ALTER TABLE dup_index ADD INDEX(id);
Query OK, 0 rows affected, 1 warning (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 1

mysql> show warnings\G
*************************** 1. row ***************************
Level: Note
Code: 1831
Message: Duplicate index 'id_2' defined on the table 'version.dup_index'. This is deprecated and will be disallowed in a future release.
1 row in set (0.00 sec)

This note only appears if you make an index with the same fields as another index; if you create an index that’s a prefix subset of another index, there is no warning (e.g. if you have an index on (a,b) and create an index on (a), there is no warning). Still, it is a good step in the right direction.
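If you want the prefix check too, tools like pt-duplicate-key-checker from Percona Toolkit already do this. The Python sketch below just shows the logic: an exact column match is what MySQL 5.6 notes, while a left-prefix match is redundant but silent. It works on index definitions you have already collected (for example from SHOW INDEX); the function name and input shape are hypothetical, and it ignores uniqueness and index type, which a real checker must consider.

```python
from typing import Dict, List, Tuple

Columns = Tuple[str, ...]


def index_findings(indexes: Dict[str, Columns]) -> List[Tuple[str, str, str]]:
    """Compare every pair of indexes; return (kind, redundant_index, covering_index)."""
    findings = []
    names = list(indexes)  # definition order: later indexes are the "new" ones
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a, b = indexes[first], indexes[second]
            if a == b:
                # Same columns: MySQL 5.6 emits note 1831 for this case.
                findings.append(("duplicate", second, first))
            elif b[:len(a)] == a:
                # first is a left prefix of second: redundant, but no note.
                findings.append(("redundant_prefix", first, second))
            elif a[:len(b)] == b:
                findings.append(("redundant_prefix", second, first))
    return findings


if __name__ == "__main__":
    print(index_findings({"id": ("id",), "id_2": ("id",)}))
    print(index_findings({"ab": ("a", "b"), "a": ("a",)}))
```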

By default, sql_mode is no longer blank:
mysql> show variables like 'sql_mode'\G
*************************** 1. row ***************************
Variable_name: sql_mode
Value: NO_ENGINE_SUBSTITUTION
1 row in set (0.00 sec)

If you use statements like UPDATE...LIMIT x, you may be filling up your error logs with messages that the statement is “unsafe”. There is now a warning suppression system: after 50 such warnings in 50 seconds, the warnings are aggregated into a single “X warnings in Y seconds” message.
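The aggregation is essentially a rate limiter on log lines. Here is a Python sketch of that behaviour under my own simplifying assumptions: a fixed window that starts with the first warning, and a summary line written when the next window opens. The class is hypothetical, MySQL’s actual mechanism and message text differ, and the injectable clock is only there so the sketch can be tested.

```python
import time


class WarningAggregator:
    """Log the first `limit` warnings in each `window`-second window, then count the rest."""

    def __init__(self, limit=50, window=50.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.window_start = None
        self.in_window = 0
        self.suppressed = 0
        self.log = []

    def warn(self, message):
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.window:
            self._summarize(now)  # close the previous window, if any
            self.window_start, self.in_window = now, 0
        self.in_window += 1
        if self.in_window <= self.limit:
            self.log.append(message)
        else:
            self.suppressed += 1

    def _summarize(self, now):
        """Replace the suppressed warnings with one 'X warnings in Y seconds' line."""
        if self.suppressed:
            elapsed = now - self.window_start
            self.log.append(f"{self.suppressed} warnings suppressed in {elapsed:.0f} seconds")
            self.suppressed = 0


if __name__ == "__main__":
    fake_now = [0.0]
    agg = WarningAggregator(clock=lambda: fake_now[0])
    for i in range(60):
        agg.warn(f"Unsafe statement {i}")
    fake_now[0] = 55.0
    agg.warn("Unsafe statement 60")
    print(len(agg.log), agg.log[50])  # 52 10 warnings suppressed in 55 seconds
```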

Other neat features I think I will make use of are:
sync_binlog is less resource-intensive
transportable tablespaces
being able to specify locations for .ibd files
multiple InnoDB buffer pools

All in all, MySQL 5.6 is a release to look forward to. I have not covered every change in MySQL 5.6, only the major ones that I am looking forward to. Others may have different priorities and reasons for wanting to move to MySQL 5.6. You can see the full MySQL 5.6.7 changelog, or read about the major changes in MySQL 5.6.
