Liveblogging at OSCON Data: Drizzle, Virtualizing and Scaling MySQL for the Future

Brian Aker presents “Drizzle, Virtualizing and Scaling MySQL for the Future” at OSCon Data 2011

http://drizzle.org

irc.freenode.net #drizzle

http://blog.krow.net

@brianaker

2005 MySQL 5.0 released – web developers wanted tons of features that were not in the release (making replication better for instance)

2008 Sun buys MySQL

2008 MySQL 6.0 is forked to become Drizzle

2009 Oracle buys Sun

2010 Drizzle developers leave Oracle

2011 First GA release, Drizzle7

MySQL’s Architecture – monolithic kernel, not very modular, lots of interdependence.

Drizzle has a microkernel, which includes a listener, parser, optimizer, executioner, storage system, logging/error reporting.

Drizzle can accept SQL and http blog streaming, and memcached and gearman can easily talk to Drizzle.

Drizzle has tried to have no “gotchas”

– If you make a new field with NOT NULL, MySQL makes new values NULL.  Drizzle does not do this.

– No hacky ALTER TABLE

– Real datetime (64 bit), including microseconds

– IPV6 (apparently this is a strong reason for people switching, to support IPV6)

– No updates that complete halfway

– Default character set is UTF-8, default collation is utf8-general (charset in latin1 by default in MySQL, collation is latin1_swedish_ci – “case insensitive” by default)

Replication

– In MySQL, replication is kind of hacky [this is my summary and opinion, but it’s basically what Brian said]

– Drizzle is Google Protocol Buffer Based

– Replicates row transformations

– Integrates with RabbitMQ, Cassandra, Memcached, Gearman — right now.

DML and MySQL binary logs analog:

– DML is stored transactionally by delta in Drizzle

– InnoDB is already logging, no need to add another log for the binary log.  So it just logs DML to the transaction log.

LibDrizzle

– supports Drizzle, MySQL, SQLite

– Asynchronous

– BSD, so Lawyer-free

What else?

– No cost authentication (pam, ldap, htaccess, …)

– Table functions (new data dictionary, including performance and thread information).  INFORMATION_SCHEMA in Drizzle is *exactly* what’s specified in the SQL standard.

– Data types – native type for UUID, boolean, all known types (except SET, because it’s broken by design)

– Assertions are in Drizzle, you can ask what the type of the result of combining multiple data types will be.

– About 80 conflicts in the Drizzle parser as opposed to about 200 in the MySQL parser

Roadmap – Drizzle7+

– Replication – faster than MySQL and also allows multiple masters.

Virtualization:

Virtualizing a database gives you about a 40% performance hit.  How can costs be cut?  In MySQL 5.0 the Instance Manager was created to solve that but it hasn’t really been worked on.  Drizzle has worked on virtualizing databases internally within Drizzle.

– So drizzle now has catalogs.  

– One catalog has its own set of users, its own schema with tables, etc.

– A catalog is its own sandbox; there is no syntax that allows you to connect from one catalog to another, so there’s no security problems.  

– Cuts the 30/40% hit from virtualizing

– Single instance maintenance – only 1 OS and 1 database to configure, unlike VMs

    – Currently only one database configuration so there’s one global config for shared memory such as innodb buffer pool, but that will change in the future.

– Still allows for I/O spread on SAN/NAS

 

In Drizzle 7.1 – Percona’s xtrabackup supports Drizzle, and ships with drizzle.  xtrabackup supports full and partial backups, no locking, single solution for point-in-time recovery in a single solution.  Because transaction log is stored in database, replication is automatically consistent with the database.  Currently does not do incremental backups with the transaction logs but that’s in the future.

DBQP:

– consolidates standard testing tasks, server/test management, reporting, REGRESSION TESTING

– extended architecture allows for complex testing scenarios

– pluggable – supports new testing tools

– randgen, sql-bench, crashme, sysbench, standard drizzle-test-run suite

– Keeping tools and testing configurations in-tree facilitates testing for everyone

– supported by SkySQL

 

Dynamic SQL/execute()

– New UTF-8 parser

– Being extended to allow for plugging in application servers.

 

>120 developers since day 1

avg 26-36 per month that commit

 

Bugs database – http://bugs.launchpad.net/drizzle

Brian Aker presents “Drizzle, Virtualizing and Scaling MySQL for the Future” at OSCon Data 2011

http://drizzle.org

irc.freenode.net #drizzle

http://blog.krow.net

@brianaker

2005 MySQL 5.0 released – web developers wanted tons of features that were not in the release (making replication better for instance)

2008 Sun buys MySQL

2008 MySQL 6.0 is forked to become Drizzle

2009 Oracle buys Sun

2010 Drizzle developers leave Oracle

2011 First GA release, Drizzle7

MySQL’s Architecture – monolithic kernel, not very modular, lots of interdependence.

Drizzle has a microkernel, which includes a listener, parser, optimizer, executioner, storage system, logging/error reporting.

Drizzle can accept SQL and http blog streaming, and memcached and gearman can easily talk to Drizzle.

Drizzle has tried to have no “gotchas”

– If you make a new field with NOT NULL, MySQL makes new values NULL.  Drizzle does not do this.

– No hacky ALTER TABLE

– Real datetime (64 bit), including microseconds

– IPV6 (apparently this is a strong reason for people switching, to support IPV6)

– No updates that complete halfway

– Default character set is UTF-8, default collation is utf8-general (charset in latin1 by default in MySQL, collation is latin1_swedish_ci – “case insensitive” by default)

Replication

– In MySQL, replication is kind of hacky [this is my summary and opinion, but it’s basically what Brian said]

– Drizzle is Google Protocol Buffer Based

– Replicates row transformations

– Integrates with RabbitMQ, Cassandra, Memcached, Gearman — right now.

DML and MySQL binary logs analog:

– DML is stored transactionally by delta in Drizzle

– InnoDB is already logging, no need to add another log for the binary log.  So it just logs DML to the transaction log.

LibDrizzle

– supports Drizzle, MySQL, SQLite

– Asynchronous

– BSD, so Lawyer-free

What else?

– No cost authentication (pam, ldap, htaccess, …)

– Table functions (new data dictionary, including performance and thread information).  INFORMATION_SCHEMA in Drizzle is *exactly* what’s specified in the SQL standard.

– Data types – native type for UUID, boolean, all known types (except SET, because it’s broken by design)

– Assertions are in Drizzle, you can ask what the type of the result of combining multiple data types will be.

– About 80 conflicts in the Drizzle parser as opposed to about 200 in the MySQL parser

Roadmap – Drizzle7+

– Replication – faster than MySQL and also allows multiple masters.

Virtualization:

Virtualizing a database gives you about a 40% performance hit.  How can costs be cut?  In MySQL 5.0 the Instance Manager was created to solve that but it hasn’t really been worked on.  Drizzle has worked on virtualizing databases internally within Drizzle.

– So drizzle now has catalogs.  

– One catalog has its own set of users, its own schema with tables, etc.

– A catalog is its own sandbox; there is no syntax that allows you to connect from one catalog to another, so there’s no security problems.  

– Cuts the 30/40% hit from virtualizing

– Single instance maintenance – only 1 OS and 1 database to configure, unlike VMs

    – Currently only one database configuration so there’s one global config for shared memory such as innodb buffer pool, but that will change in the future.

– Still allows for I/O spread on SAN/NAS

 

In Drizzle 7.1 – Percona’s xtrabackup supports Drizzle, and ships with drizzle.  xtrabackup supports full and partial backups, no locking, single solution for point-in-time recovery in a single solution.  Because transaction log is stored in database, replication is automatically consistent with the database.  Currently does not do incremental backups with the transaction logs but that’s in the future.

DBQP:

– consolidates standard testing tasks, server/test management, reporting, REGRESSION TESTING

– extended architecture allows for complex testing scenarios

– pluggable – supports new testing tools

– randgen, sql-bench, crashme, sysbench, standard drizzle-test-run suite

– Keeping tools and testing configurations in-tree facilitates testing for everyone

– supported by SkySQL

 

Dynamic SQL/execute()

– New UTF-8 parser

– Being extended to allow for plugging in application servers.

 

>120 developers since day 1

avg 26-36 per month that commit

 

Bugs database – http://bugs.launchpad.net/drizzle