Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
I was told that teams had to have a physical instantiation of a mascot, so I said, “maybe I’ll knit something.” Well, I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.
Talk by Roland Mallmann

MaxDB is older than I am: it was started in 1977 at the University of Berlin, and it is owned by SAP today. It’s open source under the GPL, or available under a commercial license from SAP or MySQL AB.

Why MaxDB is so great:
Low cost of ownership
Few config parameters
No size estimates needed for individual db objects

No reorg — space management is done automatically: space that is no longer needed is returned immediately to the db, and the ratio of occupied data to free space (holes) is kept as high as possible. This is done by the Converter, which maps logical pages to physical pages on disk and handles I/O and space management.

Space management done automatically
No reorganization is needed (ie, OPTIMIZE TABLE)
Gaps are not allowed, therefore updates and deletes are in place, and sorts happen AFTER an insertion.
Space freed is immediately returned to DB
Done by the Converter, which maps logical pages to physical pages on disk.
Data is stored in B* trees (“b star trees”) for almost all objects (tables, indexes, secondary indexes, BLOBs)

Concurrent asynchronous I/O
Manages free blocks
Auto balancing of disk I/O
Savepoints
Backup Integration (including incremental)
Segmentation of the data cache
Changes are flushed to disk on a 10-minute cycle
Flushing data pages to disk is spread out over the 10 minutes

Online Backup and Restore
Consistent backups, no need to apply logs
Savepoint issued before db backup, savepoint includes undo information for remaining open transactions.
Can do incremental, full data, or log backup
Can restore from a medium, from the backup history, or to a point in time.

Snapshots
Can make complete database backup
Can make a snapshot for replication
Can make incremental on master and restore snapshot on replication as a backup strategy (as long as there isn’t a newer snapshot, because then incremental backup logs are reset)

Standby Database
A standby is made possible using log shipping.
Master and slave share backup media (shared disk)
Init once with complete master backup
Redo available logs

In case of emergency: back up the last log piece from the master in case it hasn’t been shipped, redo all ‘open’ log backups (there should be none), redo the final piece, and start the slave — it’s now the master!

Synchronization Manager
no permanent attention required
unattended desktop/laptop installation and operation

database snapshot functionality!

Some of these may be conflicting, or not applicable to everyone.

1) think horizontal — everything, not just the web servers. Micro-optimizations are boring, as are other details
2) benchmarking techniques: not “how fast” but “how many”. Test force, not speed.
3) bigger and faster vertical scaling is the enemy.
4) horizontal scaling = add another box
5) implementation: scale your system a few times, but scale your ARCHITECTURE dozens or hundreds of times.
6) start from the beginning with architecture implementation.
7) don’t have “The server” for anything
8) stateless good, stateful bad
9) “shared nothing” good
10) don’t keep state within app server
11) caching good.
12) generate static pages periodically; works well when you don’t have millions of pages or frequent changes.
13) cache full output in application
14) include cookies in the “cache key” so diff browsers can get diff info too
15) use cache when this, not when that
16) use regexp to insert customized content into the cached page
17) set Expires header to control cache times, or rewrite rule to generate page if the cached file does not exist (rails does this)
18) if content is dynamic this does not work, but great for caching “dynamic” images
19) partial pages — pre-generate static page snippets, have a handler just assemble the pieces.
20) cache little snippets, ie sidebar
21) don’t spend more time managing the cache than you save
22) cache data that’s too slow to query, fetch, calc.
23) generate page from cached data
24) use the same data to generate api responses
25) moves load to web servers
26) start with things you hit all the time
27) if you don’t use it, don’t cache it, check db logs
28) don’t depend on MySQL Query cache unless it actually helps
29) local file system not so good because you copy page for every server
30) use process memory, not shared
31) mysql cache table — id is the “cache key”, type is the “namespace”, metadata for things like headers for cached http responses; a purge_key makes it easier to delete data from the cache (index it, too; primary key on (id, type), an index on the expire field) — see the sketch after this list
32) why 31 fails: how do you load balance? what if the mysql server dies? then there’s no cache
33) but you can use mysql scaling techniques to deal, like dual-master replication
34) use memcached, like lj, slashdot, wikipedia — memory based, Linux 2.6 (epoll) or FreeBSD (kqueue), low overhead for lots of cxns, no master, simple!
35) how to scale the db horizontally, use MySQL, use replication to share the load, write to one master, read from many slaves, good for heavy read apps (or insert delayed, if you don’t need to write right away) — check out “High Performance MySQL”
36) relay slave replication if too much bandwidth on the master, use a replication slave to replicate to other slaves.
37) writing does not scale with replication — all servers need to do the same writes. 5.1’s row-level replication might help.
38) so partition the data, divide and conquer. separate cluster for different data sets
39) if you can’t divide, use flexible partitioning: a global server keeps track of which “cluster” has what info. auto_increment columns only in the “global master”. Aggressively cache “global master” data.
40) If you use a master-master setup like 39, then you don’t have replication slaves, no latency from commit to data being available. if you are careful you can write to both masters. Make each user always use the same master, so primary keys won’t be messed up. If one master fails, use the other one.
41) don’t be afraid of the data duplication monster. use summary tables, to avoid things like COUNT(*) and GROUP BY. do it once, put result into a table — do this periodically, or do it when the data is inserted. Or data affecting a “user” and a “group” goes into both the “user” and “group” partitions (clusters). so it’s duplicating data.
42) but you can go further, and use summary dbs! copy data into special dbs optimized for special queries, ie FULLTEXT searches, anything spanning more than one or all clusters, different dbs for different latency requirements (ie RSS feeds from a replicated slave db — RSS feeds can be late).
43) save data to multiple “partitions” like the application doing manual replication — app writes to 2 places OR last_updated and deleted columns, use triggers to add to “replication_queue” table, background program to copy data based on queue table or last_updated column
44) if you’re running oracle, move read operations to MySQL with this manual replication idea. Good way to sneak MySQL into an oracle shop.
45) make everything repeatable, build summary and load scripts so they can restart or run again — also have one trusted data place, so summaries and copies can be (re)created from there.
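
To make tip 31 concrete, here is a minimal sketch of what such a cache table might look like (the table and column names are my own illustration, not from the talk):

    CREATE TABLE cache (
      id         VARBINARY(64)  NOT NULL,  -- the "cache key"
      type       VARBINARY(32)  NOT NULL,  -- the "namespace"
      metadata   BLOB,                     -- e.g. headers for cached http responses
      content    MEDIUMBLOB,               -- the cached payload itself
      purge_key  VARBINARY(64),            -- makes related entries easy to delete
      expire     DATETIME       NOT NULL,
      PRIMARY KEY (id, type),
      KEY expire_idx (expire),
      KEY purge_idx (purge_key)
    ) ENGINE=InnoDB;

    -- expire old entries periodically
    DELETE FROM cache WHERE expire < NOW();
    -- invalidate everything related to one object via the purge key
    DELETE FROM cache WHERE purge_key = 'user:1234';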

BREATHE! HALFWAY THERE!!

46) use innodb because it’s more robust. except for big read-only tables, high volume streaming tables (logging), locked tables or INSERT DELAYED, specialized engines for special needs, and more engines in the future — but for now, InnoDB
47) Multiple MySQL instances — run diff instances for diff workloads, even if they share the same server. moving to separate hardware is easier, of course. optimize the server instance for the workload. easy to set up with the instance manager or mysqld_multi, and there are init scripts that support the instance manager.
48) asynchronous data loading when you can — if you’re updating counts or loading logs, send updates through Spread (or whatever messaging system) to a daemon that loads the data. Don’t update for each request (ie, counts); do it every 1000 updates, or every few minutes. This helps if the db loses its net connection — the frontend keeps running! or if you want to lock tables, etc.
49) preload, dump and process — let the servers pre-process, as much as possible. dump never changing data structures to js files for the client to cache (postal data maybe), or dump to memory, or use SQLite, or BerkeleyDB and rsync to each webserver, or mysql replica on webserver
50) stored procedures are dangerous because they’re not horizontal, more work than just adding a webserver– only use if it saves the db work (ie send 5 rows to app instead of 5,000 and parsing in app)
51) reconsider persistent db connections: each one requires a thread (= memory), all httpd processes talk to all dbs, lots of caching might mean you don’t need the main db, and mysql connections are fast — so why not just reopen?
52) innodb_file_per_table, so OPTIMIZE TABLE clears unused space. innodb_buffer_pool_size set to 80% of total mem (dedicated mysql server). innodb_flush_log_at_trx_commit, innodb_log_file_size
53) have metadata in the db, store images in the filesystem — but then how do you replicate? or store images in myisam tables, split up so tables don’t get bigger than 4G, so if one gets corrupted there are fewer problems. a metadata table might specify which table an image is in. include the last modified date in the metadata, and use it in URLs to optimize caching, ie with squid: /images/$timestamp/$id.jpg
54) do everything in unicode
55) UTC for everything
56) STRICT_TRANS_TABLES so MySQL is picky about bad input and does not just turn it into NULL or zero.
57) Don’t overwork the DB — dbs don’t easily scale like web servers
58) STATELESS. don’t make cookie ids easy to guess, or sequential, etc. don’t save state on one server only, save it on every one. put the data in the db, don’t put it in the cookie — that duplicates efforts. important data goes into the db, so it gets saved; unimportant transient data goes in memcache; SMALL data goes in the cookie. a shopping cart would go in the db, background color goes in the cookie, and last viewed items go in memcache
59) to make cookies safer, use checksums and timestamps to validate cookies. Encryption usually a waste of cycles.
60) use resources wisely. balance how you use hardware — use memory to save I/O or CPU, don’t swap memory to disk EVER.
61) do the work in parallel — split work into smaller pieces and run on different boxes. send sub-requests off as soon as possible and do other stuff in the meantime.
62) light processes for light tasks — thin proxy servers as “network buffers” go between the user and your heavier backend application. Use httpd with mod_proxy, mod_backhand. the proxy does the ‘net work, and fewer httpd processes are needed to do the real work; this saves memory and db connections. proxies can also serve static files and cache responses. Avoid starting the main app as root. Load balancing, and very important if your background processes are “heavy”. Very EASY to set up a light process. ProxyPreserveHost On in apache 2
63) job queues — use queues, AJAX can make this easy. the webserver submits a job to a database “queue”, the first available worker picks up the first job, and sends the result back to the queue (see the sketch after this list). or use Gearman, Spread, MQ/Java Messaging Service(?)
64) log http requests to a database! log all 4xx and 5xx requests, great to see which requests are slow or fast. but only log 1-2% of all requests. Time::HiRes in Perl, microseconds from gettimeofday system call.
65) get good deals on servers http://www.siliconmechanics.com, server vendor of lj and others.
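
As a rough sketch of the database “queue” in tip 63 (the table and column names are mine, and a dedicated system like Gearman may be a better fit):

    CREATE TABLE job_queue (
      id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
      job_type    VARCHAR(32)  NOT NULL,
      payload     BLOB,
      claimed_by  INT UNSIGNED NULL,       -- NULL means the job is still available
      result      BLOB,
      created_at  DATETIME     NOT NULL,
      PRIMARY KEY (id),
      KEY claim_idx (claimed_by, id)
    ) ENGINE=InnoDB;

    -- the webserver submits a job
    INSERT INTO job_queue (job_type, payload, created_at)
    VALUES ('resize_image', 'image:42', NOW());

    -- a worker claims the oldest unclaimed job, then reads it back
    UPDATE job_queue SET claimed_by = CONNECTION_ID()
     WHERE claimed_by IS NULL ORDER BY id LIMIT 1;
    SELECT id, job_type, payload FROM job_queue
     WHERE claimed_by = CONNECTION_ID();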

IN SUMMARY: HORIZONTAL GOOD, VERTICAL BAD

for jobs: ask@develooper.com (jobs, moonlighters, perl/mysql etc)
slides will be up at http://develooper.com/talks/
Phew! That was a lot of fast typing (60 words per minute, baby!). Ask is smart, but QUICK!!!! His slides will be VERY useful when they appear. He said there were 53 tips, but I numbered each new line (and not smartly with OL and LI) and I have more than that…

This post dedicated to Edwin DeSouza.

Un-tuned SQL or stored procedures often fail to scale as table volumes increase; inefficiency increases exponentially with size.

Tune SQL/stored procedures and then buy new hardware.

use EXPLAIN to help optimize queries. Also use the slow query log.

EXPLAIN EXTENDED shows the SQL that was actually used — i.e., the optimizer may rewrite the query — so it’s a neat tool.
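
For example (the table and column here are just placeholders):

    -- what access plan will MySQL use?
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- what query did the optimizer actually run?
    EXPLAIN EXTENDED SELECT * FROM orders WHERE customer_id = 42;
    SHOW WARNINGS;  -- prints the rewritten query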

you can always give optimizer hints, but they’re not recommended — keep checking them as your app grows — STRAIGHT_JOIN, FORCE INDEX, USE INDEX, and one other one.
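
For reference, the hint syntax looks like this (table and index names are made up):

    -- force the join order to be exactly as written
    SELECT STRAIGHT_JOIN c.name, o.total
      FROM customers c JOIN orders o ON o.customer_id = c.id;

    -- force or suggest a particular index
    SELECT * FROM orders FORCE INDEX (idx_customer) WHERE customer_id = 42;
    SELECT * FROM orders USE INDEX (idx_customer) WHERE customer_id = 42;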

SHOW STATUS gives you status variables. Innodb_buffer_pool_read_requests and Innodb_data_read will show how much data is being read from the buffer pool vs. from the data files on disk.
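
A quick way to look at just those counters (a high ratio of buffer pool read requests to reads that actually hit disk means the buffer pool is doing its job):

    SHOW STATUS LIKE 'Innodb_buffer_pool_read%';
    SHOW STATUS LIKE 'Innodb_data_read';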

An index isn’t always used: if a query matches more than 20% or so of the rows, MySQL will do a full table scan instead. There’s usually a range where MySQL chooses a full table scan when an index is more appropriate, or vice versa, and that’s when you’d use hints. Hey, nobody’s perfect!

Think indexes for: joins between tables of non-trivial size, and subqueries ([NOT] EXISTS, [NOT] IN) in the WHERE clause. Use an index to avoid a sort, and use “covering” indexes.

Establish the best set of multi-column indexes along with single-column indexes.
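
A small illustration of a covering, multi-column index (the schema is hypothetical):

    CREATE INDEX idx_cust_date_total ON orders (customer_id, order_date, total);

    -- the index "covers" this query: MySQL can answer it from the index alone
    -- (EXPLAIN shows "Using index"), and the sort on order_date is avoided too
    SELECT order_date, total
      FROM orders
     WHERE customer_id = 42
     ORDER BY order_date;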

Derived tables (subqueries in the FROM clause) can’t use an index. VIEWs with UNION or GROUP BY also can’t use an index — all of these use the TEMPTABLE view algorithm (a temp table is created, and then read from).

Sorts can be improved by increasing memory (sort_buffer_size) or using an index.

Use procedures to:

  • Avoid self joins
  • Avoid correlated updates (subqueries accessing the same data) — a sketch follows this list
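
A sketch of the correlated-update case: the single-statement version re-runs the subquery for every row, while a procedure can walk a grouped result once (all table and column names are illustrative, and whether the cursor version actually wins depends on the data):

    -- correlated update: the subquery runs once per row of accounts
    UPDATE accounts
       SET last_order_total = (SELECT MAX(o.total)
                                 FROM orders o
                                WHERE o.account_id = accounts.id);

    -- procedural alternative: compute the aggregates once, then apply them
    DELIMITER //
    CREATE PROCEDURE refresh_last_order_totals()
    BEGIN
      DECLARE done INT DEFAULT 0;
      DECLARE v_account INT;
      DECLARE v_total DECIMAL(10,2);
      DECLARE cur CURSOR FOR
        SELECT account_id, MAX(total) FROM orders GROUP BY account_id;
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

      OPEN cur;
      fetch_loop: LOOP
        FETCH cur INTO v_account, v_total;
        IF done THEN
          LEAVE fetch_loop;
        END IF;
        UPDATE accounts SET last_order_total = v_total WHERE id = v_account;
      END LOOP;
      CLOSE cur;
    END //
    DELIMITER ;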

It’s the performance of the SQL within a stored routine that dominates the routine’s performance. Once the SQL is tuned, optimize the routine itself using traditional techniques (see the sketch after this list):

  • only put what’s needed in a loop
  • stop testing when you know the answer
  • order tests by most likely first
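
A tiny illustration of “stop testing when you know the answer” inside a routine (purely illustrative):

    DELIMITER //
    CREATE PROCEDURE first_square_over(IN p_limit INT, OUT p_answer INT)
    BEGIN
      DECLARE i INT DEFAULT 0;
      SET p_answer = NULL;
      search_loop: LOOP
        SET i = i + 1;
        IF i > 1000 THEN
          LEAVE search_loop;       -- give up: nothing found
        END IF;
        IF i * i > p_limit THEN
          SET p_answer = i;
          LEAVE search_loop;       -- stop as soon as the answer is known
        END IF;
      END LOOP;
    END //
    DELIMITER ;

    CALL first_square_over(500, @n);
    SELECT @n;  -- 23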

Recursion:

  • only allowed in procedures, not functions
  • depth controlled by max_sp_recursion_depth
  • iterative alternatives are almost always faster and more scalable (see the sketch below)
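
For reference, recursion is capped by max_sp_recursion_depth, which defaults to 0 (this toy example is mine):

    SET max_sp_recursion_depth = 10;  -- allow up to 10 nested calls

    DELIMITER //
    CREATE PROCEDURE countdown(IN n INT)
    BEGIN
      IF n > 0 THEN
        SELECT n;
        CALL countdown(n - 1);  -- recursive call, limited by the setting above
      END IF;
    END //
    DELIMITER ;

    CALL countdown(5);
    -- a simple WHILE loop inside one procedure does the same work
    -- without the per-call overhead or the depth limit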

TRIGGERS
There is non-trivial overhead (at least 12%) for even the simplest trigger. No trigger should EVER contain expensive SQL, because triggers run once for each row.
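
For scale, even a trigger as small as this one runs once per inserted row, so the cost is paid on every row of a bulk INSERT (the table and column are hypothetical):

    CREATE TRIGGER orders_before_insert
    BEFORE INSERT ON orders
    FOR EACH ROW
      SET NEW.created_at = NOW();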

Quest free software for MySQL — http://www.quest.com/mysql/
So last night, during a break in the quiz show (where Prokrasti Nation had a good showing, as did the other teams — Recreational Evil, Peeps, and Safe Hex), we bid on the T-shirt that had the signatures of all the speakers at the conference. All the proceeds were to go to the EFF, so it’s a good cause.

They announced it was cash only, so I looked in my wallet. $33. Well, the bidding quickly went over that, and when it reached about $100 they said it didn’t have to be cash only. Around $300 Brian Aker said that they’d give whoever won credits in a new command, SHOW CONTRIBUTORS. Well, when they said that I knew I HAD to have my name in the source code.

I mean, dude, my NAME in the SOURCE CODE!!! But then again, this is an open source application, I could just spend some time and write a patch.

I’ve been saving for my wedding next June (14 months away) so when I bid $500, I said, “hey, I don’t need flowers for my wedding.” (My entire wedding budget is $5,000, so spending 10% of that on my name in the source code was, I felt, worth it.)

The bidding stalled at $775, so I asked, “Will MySQL match what is raised?” And indeed, if the bidding reached $1,000 then MySQL would donate $800. So then Boyd Hemphill (wearing the “practice safe hex” T-shirt) walked up to the front, plunked down $20 and said, “I’m giving cash to help make up the $225 difference. Who else will help?”

And people started giving cash, and the bidding increased. I bid $900, and Ronald Bradford bid $1,000. That was the top bid, so he won the T-shirt, but the MySQL folks were nice enough to say if I donated the $900 I was willing to, I’d also get my name in the SHOW CONTRIBUTORS function. So I did!

And that is how it happened.

In other news:

40% of the people who took an exam on Tuesday passed. That means 60% failed — which is a lot, although it was mentioned that probably many people took the tutorials, got the free exam, and just tried it, not caring if they failed or not because it was free.

I passed both certification exams, so now I’m MySQL certified! And I stumped Brian Aker with a question about what rpl_recovery_rank was, and won an ipod nano!
Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
I was told that teams had to have a physical instantiation of a mascot, treatment so I said, hygiene “maybe I’ll knit something.” Well, pfizer I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.
talk by Roland Mallmann

MaxDB is older than I am, site in 1977 started at University of Berlin. Owned by SAP today. Today it’s open source under GPL, purchase or commercial license from SAP or MySQL AB.

Why Max DB is so great:
Low cost of ownership
Few config parameters
no size estimates for indvidual db objects

no reorg — space management done automatically — space no longer needed is returned immediately to the db, data occupied vs. free (holes) ration is highest as possible. This is done by matching logical pages to physical on disk with the Converter, and I/O and space management.

Space management done automatically
No reorganization is needed (ie, OPTIMIZE TABLE)
Gaps are not allowed, therefore updates and deletes are in place, and sorts happen AFTER an insertion.
Space freed is immediately returned to DB
Done by Converter, matches logical pages to physical disk.
Data is stored in B* Trees (b star tree) for almost all objects (Tables, indexes, secondary indexes, BLOBs)

Concurrent asynchronous I/O
Manages free blocks
Auto balancing of disk I/O
Savepoints
Backup Integration (including incremental)
Segmentation of the data cache
A 10 minutes cycle of changes flushed to disk
Flushing data pages to disk is spread out over the 10 minutes

Online Backup and Restore
Consistent backups, no need to apply logs
Savepoint issued before db backup, savepoint includes undo information for remaining open transactions.
Can do incremental, full data, or log backup
can restore, restore from a medium, or backup from history, or backup to a point in time.

Snapshots
Can make complete database backup
Can make a snapshot for replication
Can make incremental on master and restore snapshot on replication as a backup strategy (as long as there isn’t a newer snapshot, because then incremental backup logs are reset)

Standby Database
A standby is made possible using log shipping.
Master and slave share backup media (shared disk)
Init once with complete master backup
Redo available logs

In case of emergency: start slave, back up last log piece from master in case it hasn’t been shipped. Redo all ‘open’ log backups (should be none), redo final piece, start slave, it’s now the master!

Synchronization Manager
no permanent attention required
unattended desktop/laptop installation and operation

database snapshot functionality!

Some of these may be conflicting, therapy not applicable to everyone.

1) think horizontal — everything, patient not just the web servers. Micro optimizations are boring, as or other details
2) benchmarking techniques;. Not “how fast” but “how many”. test force, not speed.
3) bigger and faster vertical scaling is the enemy.
4) horizontal scaling = add another box
5) implementation, scale your system a few times, but scale your ARCHITECTUREa dozens or hundreds of time.
6) start from the beginning with architecture implementation.
7) don’t have “The server” for anything
8) stateless good, stateful bad
9) “shared nothing” good
10) don’t keep state within app server
11) caching good.
12) generate static pages periodically, works well for not millions of pages or changes.
13) cache full output in application
14) include cookies in the “cache key” so diff browsers can get diff info too
15) use cache when this, not when that
16) use regexp to insert customized content into the cahed page
17) set Expires header to control cache times, or rewrite rule to generate page if the cached file does not exist (rails does this)
18) if content is dynamic this does not work, but great for caching “dynamic” images
19) parial pages — pre-generate static page snippets, have handler just assemble pieces.
20) cache little snippets, ie sidebar
21) don’t spend more time managing the cadche than you sav
22) cache data that’s too slow to query, fetch, calc.
23) generate page from cached data
24) use same data to generate api responss
25) moves load to web servers
26) start with things you hit all the time
27) if you don’t use it, don’t cache it, check db logs
28) don’t depend on MySQL Query cache unless it actually helps
29) local file system not so good because you copy page for every server
30) use process memory, not shared
31) mysql cache table — id is the “cache key” type is the “namespace”, metadata for things like headers for cached http responses; purge_key to make it easier to delete data from cache (make it an index, too, primary index on id,type, expire index on expire field) fields
32) why 31 fails, how do you load balance, what if mysql server died, now no cache
33) but you can use mysql scaling techniques to deal, like dual-master replication
34) use memcached, like lj, slashdot, wikipedia — memory based, linux 2.6(epoll) or FreeBsD(kqueue), low overhead for lots of cxns, no master, simple!
35) how to scale the db horizontally, use MySQL, use replication to share the load, write to one master, read from many slaves, good for heavy read apps (or insert delayed, if you don’t need to write right away) — check out “High Performance MySQL”
36) relay slave replication if too much bandwidth on the master, use a replication slave to replicate to other slaves.
37) writing does not scale with replication — all servers need to do the same writes. 5.1’s row-level replication might help.
38) so partition the data, divide and conquer. separate cluster for different data sets
39) if you can’t divide, use flexible partitioning, global server keeps track for which “cluster” has what info. auto_increment columns only in the “global master”. Aggressively cache “global master” data.
40) If you use a master-master setup like 39, then you don’t have replication slaves, no latency from commit to data being available. if you are careful you can write to both masters. Make each user always use the same master, so primary keys won’t be messed up. If one master fails, use the other one.
41) don’t be afraid of the data duplication monster. use summary tables, to avoid things like COUNT(*) and GROUP BY. do it once, put result into a table — do this periodically, or do it when the data is inserted. Or data affecting a “user” and a “group” goes into both the “user” and “group” partitions (clusters). so it’s duplicating data.
42) but you can go further, and use summary dbs! copy data into special dbs optimized for special queries, ie FULLTEXT searches, anything spanning more than one or all clusters, different dbs for different latency requirements, ie RSS feeds from a replicated slave db — RSS feeds can be late).
43) save data to multiple “partitions” like the application doing manual replication — app writes to 2 places OR last_updated and deleted columns, use triggers to add to “replication_queue” table, background program to copy data based on queue table or last_updated column
44) if you’re running oracle, move read operations to MySQL with this manual replication idea. Good way to sneak MySQL into an oracle shop.
45) make everything repeatable, build summary and load scripts so they can restart or run again — also have one trusted eata place, so summaries and copies can be (re)created from there.

BREATHE! HALFWAY THERE!!

46) use InnoDB because it’s more robust. Exceptions: big read-only tables, high-volume streaming tables (logging), locked tables or INSERT DELAYED, specialized engines for special needs, and more engines in the future — but for now, InnoDB.
47) Multiple MySQL instances — run different instances for different workloads, even if they share the same server. Moving to separate hardware later is easier, of course, and you can optimize each server instance for its workload. Easy to set up with the instance manager or mysqld_multi, and there are init scripts that support the instance manager.
48) asynchronous data loading when you can — if you’re updating counts or loading logs, send updates through Spread (or whatever messaging system) to a daemon that loads the data. Don’t update for each request (ie, counts); do it every 1000 updates, or every few minutes. This helps if the db loses its net connection — the frontend keeps running! Also useful if you want to lock tables, etc.
49) preload, dump and process — let the servers pre-process as much as possible. Dump never-changing data structures to js files for the client to cache (postal data, maybe), or dump to memory, or use SQLite, or BerkeleyDB and rsync to each webserver, or a mysql replica on each webserver.
50) stored procedures are dangerous because they’re not horizontal — more work for the db than just adding a webserver. Only use them if they save the db work (ie, send 5 rows to the app instead of 5,000 to be parsed in the app).
51) reconsider persistent db connections: each one requires a thread = memory, all httpd processes talk to all dbs, lots of caching might mean you don’t need the main db much, and mysql connections are fast — so why not just reopen?
52) innodb_file_per_table, so OPTIMIZE TABLE clears unused space. innodb_buffer_pool_size set to 80% of total memory (dedicated mysql server). Also look at innodb_flush_log_at_trx_commit and innodb_log_file_size. (See the settings check after this list.)
53) keep metadata in the db and store images in the filesystem — but then how do you replicate? Or store images in myisam tables, split up so tables don’t get bigger than 4G, so if one gets corrupt there are fewer problems. The metadata table can specify which table an image is in. Include the last-modified date in the metadata, and use it in URLs to optimize caching, ie with squid: /images/$timestamp/$id.jpg
54) do everything in Unicode
55) UTC for everything
56) STRICT_TRANS_TABLES so MySQL is picky about bad input and does not just turn it into NULL or zero. (Tips 54–56 are sketched in the session-settings example after this list.)
57) Don’t overwork the DB — dbs don’t easily scale like web servers
58) STATELESS. Don’t make cookie IDs easy to guess, or sequential, etc. Don’t save state on one server only — save it on every one, or better, put the data in the db rather than the cookie (keeping both duplicates effort). Important data goes in the db, so it gets saved; unimportant transient data goes in memcache; only SMALL data goes in the cookie. A shopping cart would go in the db, a background color goes in a cookie, and last-viewed items go in memcache.
59) to make cookies safer, use checksums and timestamps to validate cookies. Encryption usually a waste of cycles.
60) use resources wisely. balance how you use hardware — use memory to save I/O or CPU, don’t swap memory to disk EVER.
61) do the work in parallel — split work into smaller pieces and run on different boxes. send sub-requests off as soon as possible and do other stuff in the meantime.
62) light processes for light tasks — thin proxy servers act as “network buffers” between the user and your heavier backend application. Use httpd with mod_proxy or mod_backhand. The proxy does the ‘net work, and fewer httpd processes are needed to do the real work; this saves memory and db connections. Proxies can also serve static files and cache responses. Avoid starting the main app as root. Also gives you load balancing, which is very important if your backend processes are “heavy”. Very EASY to set up a light process: ProxyPreserveHost On in apache 2.
63) job queues — use queues; AJAX can make this easy. The webserver submits a job to a database “queue”, the first available worker picks up the first job and sends the result back to the queue. Or use Gearman, Spread, MQ/Java Messaging Service(?)
64) log http requests to a database! log all 4xx and 5xx requests, great to see which requests are slow or fast. but only log 1-2% of all requests. Time::HiRes in Perl, microseconds from gettimeofday system call.
65) get good deals on servers http://www.siliconmechanics.com, server vendor of lj and others.
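Here is a minimal sketch of the summary-table idea from tip 41; the table and column names (hits_log, daily_hits) are made up for illustration:

  -- One row per page view in the raw table; instead of running
  -- COUNT(*) ... GROUP BY against it on every request, roll the
  -- counts up into a small summary table.
  CREATE TABLE daily_hits (
    day     DATE NOT NULL,
    page_id INT  NOT NULL,
    hits    INT  NOT NULL,
    PRIMARY KEY (day, page_id)
  );

  -- Refresh today's numbers periodically (cron) or on insert.
  REPLACE INTO daily_hits (day, page_id, hits)
  SELECT DATE(viewed_at), page_id, COUNT(*)
  FROM   hits_log
  WHERE  viewed_at >= CURDATE()
  GROUP  BY DATE(viewed_at), page_id;

  -- The application reads the cheap, pre-aggregated table.
  SELECT hits FROM daily_hits WHERE day = CURDATE() AND page_id = 42;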
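And a quick way to check the InnoDB settings from tip 52 (they are normally set in my.cnf and need a restart to change; the 80% figure is the talk's rule of thumb for a dedicated MySQL box, not a universal value):

  SHOW VARIABLES LIKE 'innodb_file_per_table';          -- lets OPTIMIZE TABLE reclaim space
  SHOW VARIABLES LIKE 'innodb_buffer_pool_size';        -- ~80% of RAM on a dedicated server
  SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'; -- durability vs. speed trade-off
  SHOW VARIABLES LIKE 'innodb_log_file_size';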
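Finally, the session settings behind tips 54–56, as a sketch (in practice you would put the equivalent options in my.cnf so every connection gets them):

  SET NAMES utf8;                         -- tip 54: do everything in Unicode
  SET time_zone = '+00:00';               -- tip 55: store and compare times in UTC
  SET sql_mode = 'STRICT_TRANS_TABLES';   -- tip 56: reject bad input instead of coercing it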

IN SUMMARY: HORIZONTAL GOOD, VERTICAL BAD

for jobs: ask@develooper.com (jobs, moonlighters, perl/mysql etc)
slides will be up at http://develooper.com/talks/
Phew! That was a lot of fast typing (60 words per minute, baby!). Ask is smart, but QUICK!!!! His slides will be VERY useful when they appear. He said there were 53 tips, but I numbered each new line (and not smartly with OL and LI) and I have more than that…

This post dedicated to Edwin DeSouza.

Un-tuned SQL or stored procedures often fail to scale as table volumes increase; inefficiency increases exponentially with size.

Tune SQL/stored procedures and then buy new hardware.

use EXPLAIN to help optimize queries. Also use the slow query log.

EXPLAIN EXTENDED shows the SQL that was actually used — ie, the optimizer may rewrite the query — so it’s a neat tool.
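For example (the table and subquery here are hypothetical), EXPLAIN EXTENDED followed by SHOW WARNINGS prints the query as the optimizer actually rewrote it:

  EXPLAIN EXTENDED
  SELECT * FROM orders
  WHERE  customer_id IN (SELECT id FROM customers WHERE country = 'US');
  SHOW WARNINGS;   -- the Note row contains the rewritten query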

you can always give optimizer hints, but they’re not recommended — keep checking them as your app grows — STRAIGHT_JOIN, FORCE INDEX, USE INDEX, and one other one.
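The hint syntax looks like this (idx_customer is a hypothetical index name, and as noted above, hints need to be re-checked as the data grows):

  -- Force a particular index:
  SELECT * FROM orders FORCE INDEX (idx_customer) WHERE customer_id = 42;

  -- Force the join order to match the order the tables are written in:
  SELECT STRAIGHT_JOIN o.*, c.name
  FROM   orders o
  JOIN   customers c ON c.id = o.customer_id;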

SHOW STATUS gives you status variables. Innodb_buffer_pool_read_requests and Innodb_data_read will show how much data is being read from the buffer pool vs. from disk.
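A rough sketch of how you might read those counters: comparing logical buffer-pool reads with the reads that had to hit disk gives an idea of the cache hit rate.

  SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical reads (served from the buffer pool)
  SHOW STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that had to go to disk
  SHOW STATUS LIKE 'Innodb_data_read';                  -- bytes of data read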

An index isn’t always used: if a query matches more than 20% or so of the rows, MySQL will use a full table scan. There’s usually a range where MySQL will choose a full table scan when an index is more appropriate, or vice versa, so that’s when you’d use hints. Hey, nobody’s perfect!

Think indexes — for joining tables of non-trivial size and for subqueries ([NOT] EXISTS, [NOT] IN) in the WHERE clause. Use an index to avoid a sort, and use “covering” indexes.

Establish the best set of multi-column indexes along with single-column indexes.
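A small sketch of both ideas with hypothetical names: a multi-column index that also “covers” a query, so the table rows are never touched.

  -- Leftmost-prefix rule: this index also serves queries that filter
  -- on customer_id alone.
  CREATE INDEX idx_cust_date_total ON orders (customer_id, order_date, total);

  -- Covering query: every referenced column is in the index, so EXPLAIN
  -- shows "Using index"; the index also avoids a filesort for the ORDER BY.
  SELECT order_date, total
  FROM   orders
  WHERE  customer_id = 42
  ORDER  BY order_date;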

Derived tables (subqueries in the FROM clause) can’t use an index. VIEWs with UNION or GROUP BY also can’t use an index — all of these use the TEMPTABLE view algorithm (a temp table is created, and then reads come from the temp table).

Sorts can be improved by increasing memory (sort_buffer_size) or using an index.

Use procedures to:

  • Avoid self joins
  • Avoid correlated updates (subqueries accessing the same data)

It is the SQL within a stored routine that dominates the routine’s performance. Once the SQL is tuned, optimize the routine itself using traditional techniques (a sketch follows this list):

  • only put what’s needed in a loop
  • stop testing when you know the answer
  • order tests by most likely first
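A minimal sketch of “stop testing when you know the answer” inside a routine; all table, column, and procedure names here are hypothetical:

  -- Find the order at which a customer's cumulative spend first exceeds
  -- p_limit, scanning in date order and leaving the loop as soon as the
  -- answer is known instead of fetching every row.
  DELIMITER //
  CREATE PROCEDURE first_order_over_limit(IN p_customer INT, IN p_limit DECIMAL(10,2),
                                          OUT p_order_id INT)
  BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_id INT;
    DECLARE v_total DECIMAL(10,2);
    DECLARE running DECIMAL(10,2) DEFAULT 0;
    DECLARE cur CURSOR FOR
      SELECT id, total FROM orders WHERE customer_id = p_customer ORDER BY order_date;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    SET p_order_id = NULL;
    OPEN cur;
    scan: LOOP
      FETCH cur INTO v_id, v_total;
      IF done = 1 THEN
        LEAVE scan;               -- no more rows; the answer stays NULL
      END IF;
      SET running = running + v_total;
      IF running > p_limit THEN
        SET p_order_id = v_id;    -- we know the answer...
        LEAVE scan;               -- ...so stop looping right away
      END IF;
    END LOOP;
    CLOSE cur;
  END//
  DELIMITER ;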

Recursion:

  • only allowed in procedures, not functions
  • depth controlled by max_sp_recursion_depth
  • iterative alternatives are almost always faster and more scalable

TRIGGERS
Triggers add non-trivial overhead (12% at least) to even the simplest statement. No trigger should EVER contain expensive SQL, because triggers run for each row.

Quest free software for MySQL — http://www.quest.com/mysql/
So last night, during a break in the quiz show (where Prokrasti Nation had a good showing, as did the other teams — Recreational Evil, Peeps, and Safe Hex) we bid on the T-shirt that had the signatures of all the speakers at the conference. All the proceeds were to go to the EFF, so it’s a good cause.

They announced it was cash only, so I looked in my wallet. $33. Well, the bidding quickly went over that, and when it reached about $100 they said it didn’t have to be cash only. Around $300 Brian Aker said that they’d give whoever won credits in a new command, SHOW CONTRIBUTORS. Well, when they said that I knew I HAD to have my name in the source code.

I mean, dude, my NAME in the SOURCE CODE!!! But then again, this is an open source application, I could just spend some time and write a patch.

I’ve been saving for my wedding next June (14 months away) so when I bid $500, I said, “hey, I don’t need flowers for my wedding.” (My entire wedding budget is $5,000, so spending 10% of that on my name in the source code was, I felt, worth it.)

The bidding stalled at $775, so I asked, “Will MySQL match what is raised?” And indeed, if the bidding reached $1,000 then MySQL would donate $800. So then Boyd Hemphill (wearing the “practice safe hex” T-shirt) walked up to the front, plunked down $20 and said, “I’m giving cash to help make up the $225 difference. Who else will help?”

And people started giving cash, and the bidding increased. I bid $900, and Ronald Bradford bid $1,000. That was the top bid, so he won the T-shirt, but the MySQL folks were nice enough to say if I donated the $900 I was willing to, I’d also get my name in the SHOW CONTRIBUTORS function. So I did!

And that is how it happened.

In other news:

40% of the people who took an exam on Tuesday passed. That means 60% failed — which is a lot, although it was mentioned that probably many people took the tutorials, got the free exam, and just tried it, not caring if they failed or not because it was free.

I passed both certification exams, so now I’m MySQL certified! And I stumped Brian Aker with a question about what rpl_recovery_rank was, and won an ipod nano!
by Mitch Kapor

Wikipedia uses MySQL as their backend. Wikipedia is known among geeks, but hasn’t quite hit society at large, though it probably will soon. What lessons can we learn from Wikipedia? People who hear about the concept of Wikipedia say “It can’t possibly work — an encyclopedia written by volunteers, that is completely open?”

But we know it does. It’s increasingly becoming the web page of choice for a wide range of factual topics. [as a side note, I go there when I hear of something and know nothing about it, and I get a good overview]

The mainstream media has been skeptical about Wikipedia, and runs stories about it. People sometimes put untrue facts into Wikipedia, so the mainstream media seizes on that as proof that Wikipedia does not work. But we know that that is the exception, not the rule.

So how and why does it work, if it’s so counter-intuitive? Most people have erroneous assumptions about how the world is, so that’s why people think it won’t work.

For example — “anyone can edit any article at any time” — this feels dangerous and uncontrolled to people. Why would you trust an encyclopedia like that? But the radical openness actually helps. Wikipedia is “more open” than open source, because there’s less technological barrier to entry, and there is less wait (no compiling) — a change is in right away.

Myths:
Someone has to be in charge. Why is authority required to guarantee quality? Open source developers know that there isn’t a single authority that checks everything, etc. Many people, for instance, volunteer to sysadmin Wikipedia — there’s no schedule, and yet every problem gets resolved. So even the operations side of Wikipedia is freeform. Let’s say that again: even the operations of Wikipedia are done with a “do what you can, when you can” mentality.

How can you trust information without experts? Who’s the certified authority? But you know what? Not everyone who is in authority makes the right decisions, or does the right thing; not all experts have the right answers. Do we fear that the “radical openness” will lead to anarchy and chaos in society itself?

Maybe this is an opportunity?

There are lots of mistakes. Sure there are, but the next day, they are fixed! You can’t say that for a printed encyclopedia. When problems are brought to light, it’s an opportunity to change, not something to be chagrined about.

Wikipedia beat news organizations by HOURS when Cardinal Ratzinger was named the new pope. This is because many articles had been put up before the fact about who might be named pope and what their qualifications were, so when the new pope was named, a 2-sentence edit to the page was all that was needed.

The Wikipedia coverage of Hurricane Katrina in New Orleans, Louisiana, US was one of the best, because it lets anyone edit anything.

In 1978, the Apple II came out with 32K of memory. But the idea was to change who uses the computer. Non-technical people could interact with complex machinery like a computer. That changed the way the world used computing machines.

In 1982, Lotus 1-2-3 was a tool implemented that unified how business was done. In 1992, UUNET came out (one of the first internet ISP’s, for businesses only). This started the idea that every business might want to connect to the internet. And there were people who thought global connectivity was too radical and would never work and would not be important.

In 1995 Real Networks streaming media — radio over the internet. In 2005, Mozilla/Firefox got big, and 2006 is Wikipedia’s year. Not to become rich, but that “collaborative knowledge will produce works of incredible economic value”.

What has to happen for Wikipedia to succeed?
It’s the community that makes it strong. The many people who are active editors are “the soul of Wikipedia.”

The vision of Wikipedia: “A free encyclopedia of the world’s knowledge, for the world’s people, in their own languages.” The vision came first! Basically folks looked for experts and tried to figure it out, and finally just opened it up, and that was when the magic happened.

Wikipedian editors got to know each other in person at the first gathering in Frankfurt, Germany (the 2nd one will be in Cambridge, MA, USA!)

People are in Wikipedia because they WANT to be, not because they have to be. There’s no monetary incentive. This conflicts with our stereotype that nobody willingly does work; we all do it because our bosses tell us to; because we need the paycheck; etc. “Moral leadership by example” — the opposite of marshalling the troops. There are only 2 paid employees of Wikipedia! (“They don’t teach this at Harvard Business School, as far as I can tell”)

The communities have leadership and values, they’re just not handed down from above.

Values:
Be nice! Be respectful. Make your opponent’s case for him or her. Let them know you understand their side, you just don’t agree with it. The right thing to do is to find the parts of the opponent’s point of view you agree with and edit them in yourself. It’s not about “I’m better than you,” it’s about “I have good points, so do you, here’s what I agree with.” (this does not work all the time, but it does work most of the time)

NPOV is the single most common acronym. Wikipedians believe in the “neutral point of view”

Practices: how Wikipedians do things. Do not just criticize; improve. If you can, fix it; if you can’t, go to the Talk page and say, “I’m not sure this is right, for these reasons, but I don’t know how to fix it.”

Real-time peer review: many people get a feed of changed articles, so it is self-policing, like a neighborhood crime watch making sure pages don’t get vandalized.

Dispute resolution practices and policies

All of these have to work — the technology isn’t hard, it’s the VALUES that have to work.

The challenge of alien invaders: for instance, political candidates’ offices spun their candidates’ entries to make them look better, but they were tracked by IP and held accountable for their actions. So the history is VERY useful. This will only get worse as Wikipedia becomes more and more popular.

Wikipedia’s technology, apparently, isn’t that great. Not the MySQL backend, but the code. It needs to get better for Wikipedia to be more successful.

Business opportunities: Wikia is a roll-your-own Wikipedia — ie, for the “ultimate X site” to have more in-depth information about topics. This is a for-profit startup.

“We are imperfect, we are always going to screw up.” Kapor does not believe in “technologist’s rapture” — that is, that one answer is the end-all, be-all. There will always be room for improvement.


My comments:

Think of the Mining Company, which became About.com. That had volunteer ‘experts’ and failed.

Look at MySQL itself — it thrives because of community.

Personally, I recommend James Surowiecki’s “The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations”.
Brian Aker — Director of Architecture

MySQL Server
SQL-based, aiming to be SQL:2003 compliant. A stable, scalable, easy to use, modular, high-performance RDBMS.

Client Library Support
Libmysql c-library (think OCI)
JDBC — type IV
ODBC
Perl DBI (DBD::mysql)
PHP (built in)
ADO.Net, OleDB, Ruby, Erlang, Eiffel, Smalltalk and more provided by third parties.

So great because you can connect in many ways to different storage engines
Architecture

sql/ is the kernel
mysys/ is the portable runtime library. MySQL is ported to 52 platforms (was 53, but Brian deleted OS/2 last week 🙂 ). The portable runtime library wraps calls like pwrite and unlinking/renaming files so that the operating system does not matter.
mysql-test/ is for your test cases — run “mysql-test-run --record”. You can take SQL from web applications, put it in test files, and run it against an upgraded MySQL to see if the new version breaks the code.

Definitions:
Storage Engine — code that stores data. Common API, so storage engine engineers don’t need to know SQL.
Handler: an instance of a class. It controls the storage engine — the handler instantiates an object for fetching data from the db (ie, a handler is an instance of a table, etc.).
Handlerton: the structure. The storage engine needs to know about things like a db being created or a transaction being committed, so this is how the server talks to the storage engine. Not complete in 5.0, but complete in 5.1 — no need to hack the handler.cc code to put in your own hooks.

What you will need:
All code is written in simplified C++
An example storage engine (there’s one provided with MySQL). You can create something you need just by changing the skeleton (see the small SQL sketch after this list).
Your own ideas
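From the SQL side, a tiny hedged illustration (assuming the server was built with the EXAMPLE engine compiled in): once an engine is registered, it shows up like any other engine.

  SHOW ENGINES;                                  -- the new engine should appear in the list
  CREATE TABLE t_demo (i INT) ENGINE = EXAMPLE;  -- the skeleton engine accepts the DDL
  -- The EXAMPLE engine stores no rows; it exists precisely to be copied
  -- and changed, which is why it is the suggested starting point above.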

Is this done?
sure — Friendster, Google, Yahoo, Solid, Oracle.

Server’s kernel.

All database changes go to parser, then rewriter, then optimizer, then handler, then storage engine. (DML goes to query cache first, DDL skips that part.)

So what is a storage engine? “Data formats on disk” or “on the web” etc.

MySQL server instance talks to handlerton or handler itself.

[there was create handlerton code & explanation here]

Jeremy Zawodny’s “Writing a Simple Storage Engine”

Storage Engine Methods
storage/example is the directory to find it in 5.1;
in 5.0, it’s in the sql directory.

Table control
::create()
::open()
::close()
::delete()
(need to open, create, close and drop tables)

great examples!

Scan Path:
Locks -> Info (tell us about metadata of table, if no records, no need to go on. If few, don’t bother with indexes. If lots, use that info later in cost-based optimizer) -> Read Init -> Read Rows -> Cleanup

Trace of calls:
ha_example::store_lock
ha_example::external_lock (used to call flock() )
ha_example::info (all information from SHOW STATUS comes from this)
ha_example::rnd_init (can tell it if it’s going to fetch random or sequential blocks)
ha_example::extra Cash record in HA_rrnd()
ha_example::rnd_next
ha_example::rnd_next
ha_example::rnd_next
ha_example::extra End cacheing of records (def)
ha_example::external_lock
ha_example::extra Reset database to after open

only myisam uses all these extra cache records, because Monty wrote it. InnoDB uses about 6 of them.

Check out push-down system for transactional engines.

Deleting a row needs improvement to the interface.

More — transaction methods (simple one in FEDERATED), bulk load methods, defrag methods, and more — read handler.h and documentation.

A lot can be done in autoconf.

sql/Makefile.am — add your include and source file
sql/handler.h — register your handler
sql/mysql_priv.h — set up your variable for SHOW VARIABLES
sql/handler.cc — add yourself to the handler create list
sql/mysqld.cc — set up your variable for SHOW

grep for “example” in the files, because that word is only used for the storage engine example.

More info:
sql/ha_example.h or .cc
Look at docs on mysql.com
forums.mysql.com
lists.mysql.com (internals)
MySQL Network (business opportunities available)
40) If you use a master-master setup like 39, then you don’t have replication slaves, no latency from commit to data being available. if you are careful you can write to both masters. Make each user always use the same master, so primary keys won’t be messed up. If one master fails, use the other one.
41) don’t be afraid of the data duplication monster. use summary tables, to avoid things like COUNT(*) and GROUP BY. do it once, put result into a table — do this periodically, or do it when the data is inserted. Or data affecting a “user” and a “group” goes into both the “user” and “group” partitions (clusters). so it’s duplicating data.
42) but you can go further, and use summary dbs! copy data into special dbs optimized for special queries, ie FULLTEXT searches, anything spanning more than one or all clusters, different dbs for different latency requirements, ie RSS feeds from a replicated slave db — RSS feeds can be late).
43) save data to multiple “partitions” like the application doing manual replication — app writes to 2 places OR last_updated and deleted columns, use triggers to add to “replication_queue” table, background program to copy data based on queue table or last_updated column
44) if you’re running oracle, move read operations to MySQL with this manual replication idea. Good way to sneak MySQL into an oracle shop.
45) make everything repeatable, build summary and load scripts so they can restart or run again — also have one trusted eata place, so summaries and copies can be (re)created from there.

BREATHE! HALFWAY THERE!!

46) use innodb because it’s more robust. except for big read-only tables, high volume streaming tables (logging), lcoked tables or INSERT DELAYED, specialized engines for special needs, and more engines in the future — but for now, InnoDB
47) Multiple MySQL instances — run diff instances for diff workloads, even if they share the same server. moving to separate hardware is easier, of course. optimize the server instance for the workload. e4asy to set up with instance manager or mysqld_multi, and there are init scripts that support the instance manager.
48) asynchronous data loading when you can — if you’re updating counts or loading logs, send updates through Spread (or whatever messaging something) to a daemon loading data. Don’t update for each request (ie, counts), do it every 1000 updates, or every few minutes. This helps if db loses net connection, the frontend keeps running! or if you want to lock tables, etc.
49) preload, dump and process — let the servers pre-process, as much as possible. dump never changing data structures to js files for the client to cache (postal data maybe), or dump to memory, or use SQLite, or BerkeleyDB and rsync to each webserver, or mysql replica on webserver
50) stored procedures are dangerous because they’re not horizontal, more work than just adding a webserver– only use if it saves the db work (ie send 5 rows to app instead of 5,000 and parsing in app)
51) reconsider persistent db connections because it requires a thread = memory, all httpd processes talk to all dbs, lots of caching might mean you don’t need main db, mysql cxns are fast so why not just reopen?
52) innodb_file_per_table, so OPTIMIZE TABLE clears unused space. innodb_buffer_pool_soze set to 80% of total mem (dedicated mysql server). innodb_flush_log_at_trx_commit, innodb_log_file_size
53) have metadata in db, store images in filesystem, but then how do you replicate? or store images in myisam tables, split up so tables don’t get bigger than 4G, so if gets corrupt fewer problems. metadata table might specify what table it’s in. include last modified date in metadata, and use in URLs to optimize caching, ie with squid: /images/$timestamp/$id.jpg
54) do everything in unicode
55) UTC for everything
56) STRICT_TRANS_TABLE so MySQL is picky about bad input and does not just turn it to NULL or zero.
57) Don’t overwork the DB — dbs don’t easily scale like web servers
58) STATELESS. don’t make cookie id’s easy to guess, or sequential, etc. don’t save state on one server only, save it on every one. put the data in the db, don’t put it in the cookie, that duplicates efforts. important data into db, so it gets saved, unimportant transient data puts in memcache, SMALL data in cookie. a shopping cart would go in db, background color goes in cookie, and last viewed items go in memcache
59) to make cookies safer, use checksums and timestamps to validate cookies. Encryption usually a waste of cycles.
60) use resources wisely. balance how you use hardware — use memory to save I/O or CPU, don’t swap memory to disk EVER.
61) do the work in parallel — split work into smaller pieces and run on different boxes. send sub-requests off as soon as possible and do other stuff in the meantime.
62) light processes for light tasks — thin proxy servers for “network buffers”, goes between the user and your heavier backend application. Use httpd with mod_proxy, mod_backhand. the proxy does the ‘net work, and fewer httpd processes are needed to do the real work, this saves memory and db connections. proxies can also server static files and cache responses. Avoid starting main app as root. Load balancing, and very important if your background processes are “heavy”. Very EASY to set up a light process. ProxyPreserveHostOn in apache 2
63) job queues — use queues, AJAX can make this easy. webserver submits job to database “queue”, first avail worker picks up first job, and sends result to queue. or ue gearman, Spread, MQ/Java Messaging Service(?)
64) log http requests to a database! log all 4xx and 5xx requests, great to see which requests are slow or fast. but only log 1-2% of all requests. Time::HiRes in Perl, microseconds from gettimeofday system call.
65) get good deals on servers http://www.siliconmechanics.com, server vendor of lj and others.

IN SUMMARY: HORIZONTAL GOOD, VERTICAL BAD

for jobs: ask@develooper.com (jobs, moonlighters, perl/mysql etc)
slides will be up at http://develooper.com/talks/
Phew! That was a lot of fast typing (60 words per minute, baby!). Ask is smart, but QUICK!!!! His slides will be VERY useful when they appear. He said there were 53 tips, but I numbered each new line (and not smartly with OL and LI) and I have more than that…

This post dedicated to Edwin DeSouza.

Un-tuned SQL or stored procedures often fail to scale as table volumes increase, plague inefficiency increases exponentially with size.

Tune SQL/stored procedures and then buy new hardware.

use EXPLAIN to help optimize queries. Also use the slow query log.

EXPLAIN EXTENDED shows sql that was actually used — ie, optimizer may rewrite query, so it’s a neat tool.

you can always give optimizer hints, but they’re not recommended — keep checking them as your app grows — STRAIGHT_JOIN, FORCE INDEX, USE INDEX, and one other one.

SHOW STATUS gives you status variables. innodb_buffer_pool_read_requests and innodb_data_read will show how much data is being read from the buffer pool vs. data.

Index isn’t always used, if more than 20% or so of rows, MySQL will use a full table scan. There’s usually a range where MySQL will choose a full table scan when an index is more appropriate, or vice versa, so that’s when you’d use hints. Hey, nobody’s perfect!

think indexes — joining tables of non-trivial size Subqueries ( [NOT] EXISTS, [NOT] IN) in WHERE clause. Use index to avoid a sort, use “covering” indexes.

Establish the best set of multi-column indexes along with singular indexes.

Derived tables (subqueries in FROM cause) can’t use an index. VIEWs with UNION or GROUP BY also can’t use index — all these use TEMPTABLE view algorithm. (temp table created, and then reads from temp table).

Sorts can be improved by increasing memory (sort_buffer_size) or using an index.

Use procedures to:

  • Avoid self joins
  • Correlated updates (subqueries accessing same data)

Performance of SQL within a stored routine that dominates the performance. When SQL is tuned, optimize the routine using traditional techniques:

  • only put what’s needed in a loop
  • stop testing when you know the answer
  • order tests by most likely first

Recursion:

  • only allowed in procedures, not functions
  • depth controlled by max_sp_recursion_depth
  • iterative alternatives are almost always faster and scaleable

TRIGGERS
non-trivial (12% at least) to even simplest trigger. No trigger should EVER contain expensive SQL, because they are done for each row.

Quest free software for MySQL — http://www.quest.com/mysql/
So last night, viagra approved during a break in the quiz show (where Prokrasti Nation had a good showing, case as did the other teams — Recreational Evil, neuropathist Peeps, and Safe Hex) we bid on the T-shirt that had the signatures of all the speakers at the conference. All the proceeds were to go to the EFF, so it’s a good cause.

They announced it was cash only, so I looked in my wallet. $33. Well, the bidding quickly went over that, and when it reached about $100 they said it didn’t have to be cash only. Around $300 Brian Aker said that they’d give whoever won credits in a new command, SHOW CONTRIBUTORS. Well, when they said that I knew I HAD to have my name in the source code.

I mean, dude, my NAME in the SOURCE CODE!!! But then again, this is an open source application, I could just spend some time and write a patch.

I’ve been saving for my wedding next June (14 months away) so when I bid $500, I said, “hey, I don’t need flowers for my wedding.” (My entire wedding budget is $5,000, so spending 10% of that on my name in the source code was, I felt, worth it.)

The bidding stalled at $775, so I asked, “Will MySQL match what is raised?” And indeed, if the bidding reached $1,000 then MySQL would donate $800. So then Boyd Hemphill (wearing the “practice safe hex” T-shirt) walked up to the front, plunked down $20 and said, “I’m giving cash to help make up the $225 difference. Who else will help?”

And people started giving cash, and the bidding increased. I bid $900, and Ronald Bradford bid $1,000. That was the top bid, so he won the T-shirt, but the MySQL folks were nice enough to say if I donated the $900 I was willing to, I’d also get my name in the SHOW CONTRIBUTORS function. So I did!

And that is how it happened.

In other news:

40% of the people who took an exam on Tuesday passed. That means 60% failed — which is a lot, although it was mentioned that probably many people took the tutorials, got the free exam, and just tried it, not caring if they failed or not because it was free.

I passed both certification exams, so now I'm MySQL certified! And I stumped Brian Aker with a question about what rpl_recovery_rank was, and won an iPod nano!
by Mitch Kapor

Wikipedia uses MySQL as its backend. Wikipedia is known among geeks, but hasn't quite hit society at large, though it probably will soon. What lessons can we learn from Wikipedia? People who hear about the concept of Wikipedia say, “It can't possibly work — an encyclopedia written by volunteers, that is completely open?”

But we know it does. It’s increasingly becoming the web page of choice for a wide range of factual topics. [as a side note, I go there when I hear of something and know nothing about it, and I get a good overview]

The mainstream media has been skeptical about Wikipedia and makes stories about it. People sometimes put untrue facts into Wikipedia, so the mainstream media seizes on that as proof that Wikipedia does not work. But we know that is the exception, not the rule.

So how and why does it work, if it’s so counter-intuitive? Most people have erroneous assumptions about how the world is, so that’s why people think it won’t work.

For example — “anyone can edit any article at any time” — this feels dangerous and uncontrolled to people. Why would you trust an encyclopedia like that? But the radical openness actually helps. Wikipedia is “more open” than open source, because there’s less technological barrier to entry, and there is less wait (no compiling) — a change is in right away.

Myths:
Someone has to be in charge. Why is authority required to guarantee quality? Open source developers know that there isn't a single authority that checks everything, etc. Many people, for instance, volunteer to sysadmin Wikipedia — there's no schedule, and yet every problem gets resolved. So even the operations of Wikipedia are freeform. Let's say that again: even the operations of Wikipedia are done in a “do what you can, when you can” mentality.

How can you trust information without experts? Who’s the certified authority? But you know what? Not everyone who is in authority makes the right decisions, or does the right thing; not all experts have the right answers. Do we fear that the “radical openness” will lead to anarchy and chaos in society itself?

Maybe this is an opportunity?

There are lots of mistakes. Sure there are, but the next day, they are fixed! You can’t say that for a printed encyclopedia. When problems are brought to light, it’s an opportunity to change, not something to be chagrined about.

Wikipedia beat news organizations by HOURS when Cardinal Ratzinger was named the new pope. This is because there were many articles put up before the fact about who might be named pope and what their qualifications were, so when the new pope was named, a 2-sentence edit to the page was all that was needed.

The Wikipedia coverage of Hurricane Katrina in New Orleans, Louisiana, US was one of the best, because it lets anyone edit anything.

In 1978, the Apple II came out with 32K of memory. But the idea was to change who uses the computer. Non-technical people could interact with complex machinery like a computer. That changed the way the world used computing machines.

In 1982, Lotus 1-2-3 was a tool that unified how business was done. In 1992, UUNET came out (one of the first internet ISPs, for businesses only). This started the idea that every business might want to connect to the internet. And there were people who thought global connectivity was too radical, would never work, and would not be important.

In 1995 Real Networks streaming media — radio over the internet. In 2005, Mozilla/Firefox got big, and 2006 is Wikipedia’s year. Not to become rich, but that “collaborative knowledge will produce works of incredible economic value”.

What has to happen for Wikipedia to succeed?
It’s the community that makes it strong. The many people who are active editors are “the soul of Wikipedia.”

The vision of Wikipedia “A free encyclopedia of the world’s knowledge, for the world’s people, in their own languages.” The vision came first! Basically folks looked for experts and tried to figure it out, and finally just opened it up, and that was when the magic happened.

Wikipedian editors knew each other at the first gathering in Frankfurt, Germany (the 2nd one will be in Cambridge, MA USA!)

People are in Wikipedia because they WANT to be, not because they have to be. There’s no monetary incentive. This conflicts with our stereotype that nobody willingly does work; we all do it because our bosses tell us to; because we need the paycheck; etc. “Moral leadership by example” — the opposite of marshalling the troops. There are only 2 paid employees of Wikipedia! (“They don’t teach this at Harvard Business School, as far as I can tell”)

The communities have leadership and values, they’re just not handed down from above.

Values:
Be nice! Be respectful. Make your opponent's case for him or her. Let them know you understand their side, you just don't agree with it. The right thing to do is to find the parts of the opponent's point of view you agree with and edit them in yourself. It's not about “I'm better than you,” it's about “I have good points, so do you, here's what I agree with.” (This does not work all the time, but it does work most of the time.)

NPOV is the single most common acronym. Wikipedians believe in the “neutral point of view”

Practices How Wikipedians do things — do not just criticize, improve. If you can, fix it, if you can’t, go to the Talk page and say, “I’m not sure this is right, for these reasons, but I don’t know how to fix it.”

Real-time peer review Many people get a feed of changed articles and so it is self-policing, like a neighborhood crime watch to make sure pages don’t get vandalized.

Dispute resolution practices and policies

All of these have to work — the technology isn’t hard, it’s the VALUES that have to work.

The challenge of alien invaders For instance, political candidates’ offices spun their candidates’ entries to make them look better, but they were tracked by IP and held accountable for their actions. So the history is VERY useful. This will only get worse, the more and more popular Wikipedia becomes.

Wikipedia's technology, apparently, isn't that great. Not the MySQL backend, but the code. It needs to get better for Wikipedia to be more successful.

Business opportunities Wikia is a roll-your-own Wikipedia — ie, for the “ultimate X site” to have more in-depth information about topics. This is a for-profit startup.

“We are imperfect, we are always going to screw up.” Kapor does not believe in “technologist’s rapture” — that is, that one answer is the end-all, be-all. There will always be room for improvement.


My comments:

Think of the Mining Company, and About.com. That had volunteer ‘experts’ and failed.

Look at MySQL itself — it thrives because of community.

Personally, I recommend James Surowiecki’s “The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations”.
Brian Aker — Director of Architecture

MySQL Server
SQL-based, aiming to be SQL:2003 compliant. Stable, web scalable, easy to use, modular, high-performance RDBMS.

Client Library Support
Libmysql c-library (think OCI)
JDBC — type IV
ODBC
Perl DBI (DBD::mysql)
PHP (built in)
ADO.Net, OleDB, Ruby, Erlang, Eiffel, Smalltalk and more provided by third parties.

So great because you can connect in many ways to different storage engines
Architecture

sql/ is for kernel
mysys/ is the portable runtime library. MySQL is ported to 52 platforms
(was 53, but Brian deleted OS/2 last week 🙂). The portable runtime library wraps calls like pwrite and unlinking/renaming files so that the operating system does not matter.
mysql-test/ is for your test cases — run “mysql-test-run --record”. You can take SQL from web applications, put it in test files, and run it against an upgraded mysql to see if the new version breaks the code.

Definitions:
Storage Engine — code that stores data. Common API, so storage engine engineers don’t need to know SQL.
Handler: an instance of a class that controls the storage engine. A handler instantiates an object for fetching data in the db; i.e., a handler is an instance of a table, etc.
Handlerton: the structure. The storage engine needs to know about things like a db being created or a transaction being committed, so this is how we talk to the storage engine. Not complete in 5.0, but complete in 5.1 — no need to hack the handler.cc code to put in your own hooks.

What you will need:
All code is written in simplified C++
An example storage engine (there’s one provided with MySQL). You can create something you need just by changing the skeleton.
Your own ideas

Is this done?
sure — Friendster, Google, Yahoo, Solid, Oracle.

Server’s kernel.

All database changes go to parser, then rewriter, then optimizer, then handler, then storage engine. (DML goes to query cache first, DDL skips that part.)

So what is a storage engine? “Data formats on disk” or “on the web” etc.

MySQL server instance talks to handlerton or handler itself.

[there was create handlerton code & explanation here]

Jeremy Zawodny’s “Writing a Simple Storage Engine”

Storage Engine Methods
In 5.1, it's in the storage/example directory;
in 5.0, it's in the sql directory.

Table control
::create()
::open()
::close()
::delete()
(need to open, create, close and drop tables)

great examples!

Scan Path:
Locks -> Info (tells us about the table's metadata: if there are no records, no need to go on; if there are few, don't bother with indexes; if there are lots, use that info later in the cost-based optimizer) -> Read Init -> Read Rows -> Cleanup

Trace of calls:
ha_example::store_lock
ha_example::external_lock (used to call flock() )
ha_example::info (all information from SHOW STATUS comes from this)
ha_example::rnd_init (can tell it if it’s going to fetch random or sequential blocks)
ha_example::extra (cache record in HA_rrnd())
ha_example::rnd_next
ha_example::rnd_next
ha_example::rnd_next
ha_example::extra (end caching of records (def))
ha_example::external_lock
ha_example::extra (reset database to after open)

Only MyISAM uses all of these extra cache calls, because Monty wrote it. InnoDB uses about 6 of them.

Check out push-down system for transactional engines.

Deleting a row needs an improvement to the interface.

More — transaction methods (simple one in FEDERATED), bulk load methods, defrag methods, and more — read handler.h and documentation.

Lot that can be done in autoconf.

sql/Makefile.am — add your include and source file
sql/handler.h — register your handler
sql/mysql_priv.h — set up your variable for SHOW VARIABLES
sql/handler.cc — add yourself to the handler create list
sql/mysqld.cc — set up your variable for SHOW

Grep for “example” in the files, because that word is only used for the storage engine example.

More info:
sql/ha_example.h or .cc
Look at docs on mysql.com
forums.mysql.com
lists.mysql.com (internals)
MySQL Network (business opportunities available)
Sergei Golubchik

The Plugin API is new in MySQL 5.1, so you can plug in your own API commands.

Built-in versioning
Easy to maintain and distribute
Generic — allows you to load any functionality into mysqld

Plugins can add new status variables for SHOW STATUS
For the future, plugins will allow you to add new commandline options, new server variables, and new SQL keywords.

Plugin administration:
INSTALL PLUGIN foo SONAME 'bar.so'
UNINSTALL PLUGIN foo
SHOW PLUGINS
INFORMATION_SCHEMA.PLUGINS
--plugin-dir=/path/to/dir

Plugin types:
Storage Engines
Fulltext Search Parsers
code changes text before it goes to the FULLTEXT data
can be used to search non-plaintext data formats, such as pdf, doc, mp3
can be used to parse Chinese and Japanese text.

Plugin types future:
UDFs — versioning, security, ease of use
Language modules for stored procedures (Perl, PHP…)
Pluggable Authentication (ie, LDAP)
Fulltext Search Engines (replace the whole thing!)
Maybe new SQL commands?

A fulltext parser plugin has to take the object, extract text, split it into words, and postprocess; then the words are stored in the index. Currently the extract step does nothing, because fulltext is used on strings only right now, and postprocessing just prunes out words < min length or > max length.

So, here is a small parser plugin that allows external files to be indexed — you give MySQL the path to the file.

Make a new directory (say, from_file) and copy the template files for the fulltext plugin into it.

mv plugin_example.c from_file.c
In Makefile.am, change the libdir and the SOURCES.
Change the AC_INIT line in configure.in to use from_file.

look for mysql_declare_plugin — it contains
type of plugin
descriptor (what’s different for different types)
name
author
description
init function (on load)
de-init function (on unload)
version
status variables

Chapter 28.2 in MySQL 5.1 documentation has a complete walk-through of all the structures.

Run automake with your new .am file, then make install, and then load the plugin.

on http://forge.mysql.com you can find plugins at Database software -> MySQL specific -> Plugins

SHOW PLUGINS shows name, status (ie, active), type (Storage Engine or fulltext parser) and library (filename, if blank, it’s built in).

To use,
CREATE TABLE t1 (file text, fulltext(file) with parser from_file);
insert into t1 values('/etc/passwd'),('/etc/services');
select * from t1; shows we have filenames only in the table.
select * from t1 where match file against ('root'); will give the result of the filename.

If you try to uninstall a plugin that an open table is using, the plugin will have a status of “deleted,” but the open table will still use it. After a FLUSH TABLES or a closed connection, though, your table is invalid. 🙂 So be careful when uninstalling plugins: find the tables using them FIRST, drop those tables, and then uninstall.
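
One way to check first (a hedged suggestion; t1 and from_file are just the example from above): SHOW CREATE TABLE should list the WITH PARSER clause on the FULLTEXT key, so you can spot tables still tied to the plugin before you uninstall it.

SHOW CREATE TABLE t1\G
-- look for something like: FULLTEXT KEY `file` (`file`) /*!50100 WITH PARSER `from_file` */
-- drop or ALTER those tables first, then run UNINSTALL PLUGIN from_file;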

This plugin does not re-load the data in the file every time a query runs. It should be able to handle either load_file() contents or a filename. If you change the file and need to reindex, you have to do a REPAIR TABLE.

——————————
I loved having the example, and this stuff seems so easy to just implement. Sure, it’s the featureset itself that’s difficult, but …
Jim Starkey

Falcon is based on the netfrastructure db engine
Netfrastructure has been deployed in mission critical apps for >4 years.
Extended and integrated into mysql environment.

What Falcon is NOT:
InnoDB clone
Firebird (open source derivative of Interbase db that Jim wrote years ago)
Firebird clone
Standalone Database Management System (it was, inside of the Netfrastructure engine)
Netfrastructure (netfra is much more, with a JVM and search, though these features may roll out later)

What Jim’s learned in 20 years
Disks are slower relative to CPU and memory than they were 25 years ago.
MVCC = Multi-generational concurrency control (how Jim named it, but someone changed it to “version”)
Putting record versions on disk is problematic
Web applications are better and for the future (religion) [I agree, though, for portability]
People have more important things to do than tune databases

Claim: Falcon is the engine design for the next 20 years.
Goals:
Exploit large memory for more than just a bigger cache
Use threads and processors for data migration
Eliminate tradeoffs, minimize tuning
Scale gracefully under very heavy loads

Basic Architectural Model:
Incomplete in-memory db with backfill from disk
2 caches: 1) traditional LRU page cache for disk and 2) larger row cache with age group scavenging
Serial log for single write group commits — single write-ahead log.
Multi-version in memory, single version on disk
all transaction states are in memory with automatic overflow to disk
Data and indexes are 1 file plus log files (MySQL does not do this, but most other db servers do)
future: blob repositories (put them in different area of db); multiple page spaces

Basic model is MVCC. It will be extended for relaxed consistency (but why would you want to do that?!?!), and will be extended for serializable.

Index Implementation:
Btree index with prefix compression — no difference in performance with primary or secondary indexes.
No data except key in index
2-stage index retrievals — index scan generates row bitmap, so you can get from 2 indexes before going to rows; records are fetched from disk in physical row order

Data Flow:
Uncommitted row data is staged in memory
On commit, the transaction copies row data to the serial log, which is written to disk (not committed until the OS says the page was written)
Post-commit, a dedicated thread copies row data from the serial log to data pages
Page cache periodically flushed to disk (except blob data)
BLOB data is queued for write at blob creation, backed out on rollback — otherwise it wastes time putting it into log.

Data Reliability:
Physical structure protected by “careful write” — the db on disk is ALWAYS valid and consistent; a page is written BEFORE the pointer to it is made. So if worst comes to worst, you have an orphan page and NOT a null pointer. Orphaned pages will be found by
Atomicity protected by serial log — a transaction is committed when the commit record hits the oxide.
The serial log is a “do” log for post-commit data migration, a “redo” log for post-crash recovery of data, and an “undo” log for post-crash resource recovery.

Jim’s Secret Agenda: not so secret anymore!
Replace varchar with string (varchar is an ABOMINATION left over from punch cards)
Replace tiny, small, medium, big ints with “number” (set limits if you want…)
Adopt a security model that’s useful for app servers (app server logs on to db server and THEN sets up security, but by then it’s too late. 3rd party client security control should have been put there 15 years ago)
Introduce usable row-level security (filter sets, so querying does not accidentally give out the wrong info to the wrong people)
Teach the database world the merits of free-text search, which everyone else already knows (do you type a SELECT statement into Google?)

“The difference between theory and practice: in theory, there isn’t any difference.”

Weta Digital, the effects company behind Peter Jackson's films
Heavenly Creatures (1994) nominated for
The Frighteners (1996)
In 1997, Peter Jackson started working on King Kong, but Universal canned it because there were a lot of monster and disaster movies.
Contact (1997) with Robert Zemeckis — the zero-gravity ride was done by Weta Digital
Then they did the Lord of the Rings trilogy (2001-3)
Van Helsing (2004) (just a few bits and pieces)
I, Robot (2004) nominated for best digital effects — technology for armies in Lord of the Rings was used here.
King Kong (2005) and trying to get what Peter Jackson wanted.

(make a chart!)
Size of a movie is based on a shot (camera does not cut away). Visual effects movie typically had 500-1000 shots.

Year  Movie                    Shots  Processors  Notes
1994  Heavenly Creatures          30           1
1996  The Frighteners            450           2?
1997  Contact                     48          60
2001  Fellowship of the Ring     450         384  More processors needed for the massive armies in the prelude, and all the CG creatures
2002  The Two Towers             760        1400  More armies, the Ents, more fantastic creatures including Gollum
2003  Return of the King        1400        3200  Half the movie (90 minutes) was effects!

For Kong in 2005, they made Skull Island, 1930s New York was digitized, and (since Peter Jackson gets seasick) all the water was digitally added.

300 MB per second of film! How is it archived after scanning and digitizing the film? 1 petabyte of data for LOTR and King Kong — 5,000-6,000 tapes.

Artists get to work on the online copy. 120 terabytes of storage isn't enough to store all the data, so it's copied from the archive to a server. 100G ethernet even to desktops. 700 Linux workstations, with Windows and Mac boxes around too. 10G ethernet to connect rooms together [sic]. High-density blade servers. After visual effects are rendered, they have to be put back onto film, using red, green and blue lasers and a spinning mirror to burn the film.

Wow! Pretty neat.

He showed us shots of the way they built New York, including the almost (or just, I forget) finished Empire State Building. Also, he had many shots of the studio and outdoors with green screens, and it was just fabulous. And he showed how they animated Kong and how it was different from Gollum.

In 1999 Weta used MySQL 3.22 as the backend for an online recruiting system. They migrated from 3.23 to 4.0 in 2005, and to 5.0 in 2006. 5 production machines
10 replicas (replication)
100 dbs
thousands of tables
millions of rows

MySQL helps with
production management (who’s doing what)
HR System
User database and access control for all the different systems
System monitoring (nagios, internal tools, using MySQL as its backend)
Theater, conference rooms and event booking systems
Polls for employees
Online stores (for employees to buy movie swag)
Internal auction site

Why do they use it?
Simple
Reliable (hardware crashes, but db didn’t. No lost data to date)
Scalable

MySQL at Weta
The Cluster — persistent db connections from webservers are 2/3 of db connections into cluster. 50-100 cxns per second peak, up to 50MB/sec coming out. But they’re not running on high-end hardware, just using the old rendering hardware.
The Monster — one monster db. 40 cols, sparsely used, ENUMs that need to be updated all the time, 20 useless indexes. 750,000 rows, 2/3 are meaningless. No normalization.
The Work Horse — one db with dedicated hardware — the disk monitoring system. 30 file servers; every few hours they need updated disk-space stats (because so much disk can be used by folks). Computing that stuff takes a lot of CPU. Up to 3,000 queries/sec as it compares new data with stored data, updating if necessary.

ShotInfo
Tracks thousands of shots over multiple projects
Tracks all cuts and edit changes
Tracks all the plates and film rolls (so you can find a bit of film you want to recreate/duplicate)
Tracks assignments
Data originates from FileMaker, so normalization isn’t great, field names aren’t consistently named.
One way mirror

ShotSub
Key system
Shot review system
Tracks work in progress
Visual History
How they know where they are in a shot at a given time
35,000 submissions per month for King Kong!

Disk Space Management System
Load balances data
tracks data usage
looks like a normal filesystem — also must be cross-platform
Global Name Space Distributed File System
Transaction based (like a filesystem!)
Millions of allocations, thousands created per day.

Weta’s Future with MySQL:
refactor databases and code
More scalability, more reliability, and less simplicity.
Multi-Master clustering
Federated Database servers
64-bit platforms
Faster hardware
Weta Digital, order sales the company on Peter Jackson films
Heavenly Creatures (1994) nominated for
The Frighteners (1996)
in 1997, sovaldi sale Peter Jackston started working on King Kong, but Universal canned it because there were a lot of monster and disaster movies.
Contact (1997) with Robert Zemeckis — the zero-gravity ride was done by Weta Digital
Then they did the Lord of the Rings trilogy (2001-3)
Van Helsing (2004) (just a few bits and pieces)
I, Robot (2004) nominated for best digital effects — technology for armies in Lord of the Rings was used here.
King Kong (2005) and trying to get what Peter Jackson wanted.

(make a chart!)
Size of a movie is based on a shot (camera does not cut away). Visual effects movie typically had 500-1000 shots.

Year Movie Shots Processors
1994 Heavenly Creatures 30 1
1996 The Frighteners 450 2?
1997 Contact 48 60
2001 Fellowship of the Rings 450 384 more processors needed for the massive armies in the prelude, and all the CG creatures
2002 The Two Towers
760 1400 More armies, the Ents, more fantastic creatures including Gollem
2003 Return of the King 1400 3200 1/2 the movie, 90 minutes, was effects!

Kong 2005, made skull island, 1930’s New York was digitized, Peter Jackson gets seasick so all the water was digitally added.

300MB per second of film! how is it archived, after scanning the film & digitizing it? 1 petabyte of data for LOTR and King Kong — 5-6,000 tapes.

Artists get to work on the online copy. 120 Terabytes of storage isn’t enough to store all the data, so it’s copied from the archive to a server. 100G ethernet even to desktops. 700 linux workstations, win and mac boxes around too. 10G ethernet to connect rooms together [sic]. High-density blade servers. After visual effects are rendered, they have to be put back to film, using red, green and blue lasers and a spinning mirror to burn film.

Wow! Pretty neat.

He showed us the shots of the way they built New York, including the almost (or just, I forget) finished Empire State Building. Also, he had many shots of the studio and outdoors with green screens, and it was just fabulous. And how they animated Kong and how it was different from Gollem.

In 1999 Weta used MySQL 3.22 as the backend for an online recruiting system. They migrated from 3.23 to 4.0 in 2005, and to 5.0 in 2006. Today: 5 production machines
10 replicas (replication)
100 dbs
thousands of tables
millions of rows

MySQL helps with
production management (who’s doing what)
HR System
User database and access control for all the different systems
System monitoring (Nagios and internal tools, using MySQL as the backend)
Theater, conference rooms and event booking systems
Polls for employees
Online stores (for employees to buy movie swag)
Internal auction site

Why do they use it?
Simple
Reliable (hardware has crashed, but the db didn’t; no lost data to date)
Scalable

MySQL at Weta
The Cluster — persistent db connections from webservers make up 2/3 of the db connections into the cluster. 50-100 connections per second at peak, with up to 50MB/sec coming out. They’re not running on high-end hardware, just reusing the old rendering hardware.
The Monster — one monster db: 40 columns, sparsely used, ENUMs that need to be updated all the time, 20 useless indexes. 750,000 rows, 2/3 of them meaningless. No normalization.
The Work Horse — one db with dedicated hardware: the disk monitoring system. 30 file servers need updated disk space stats every few hours (because folks can use up so much disk). Computing that takes a lot of CPU: up to 3,000 queries/sec as it compares new data with stored data and updates it if necessary.

ShotInfo
Tracks thousands of shots over multiple projects
Tracks all cuts and edit changes
Tracks all the plates and film rolls (so you can find a bit of film you want to recreate/duplicate)
Tracks assignments
Data originates in FileMaker, so normalization isn’t great and field names aren’t consistent.
One-way mirror

ShotSub
Key system
Shot review system
Tracks work in progress
Visual History
How they know where they are in a shot at a given time
35,000 submissions per month for King Kong!

Disk Space Management System
Load balances data
Tracks data usage
Looks like a normal filesystem — also must be cross-platform
Global Name Space Distributed File System
Transaction based (like a filesystem!)
Millions of allocations, thousands created per day.

Weta’s Future with MySQL:
refactor databases and code
More scalability, more reliability, and less simplicity.
Multi-Master clustering
Federated Database servers
64-bit platforms
Faster hardware
Many people have some kind of reporting or auditing on their database. The problem is that the data grows very large, and a lot of the time there is data that can be purged. Sure, theoretically one never needs to purge data, but sometimes a “delete” flag just won’t work — when you search on the delete flag, a full table scan may be the most efficient way to go.

Of course, that’s not acceptable. And in many cases, say when you have users who no longer use the site but did in the past (and perhaps have billing data associated with them), you never want to get rid of them.

So what to do? Make a special reporting database that gathers information from the production database(s). Use MyISAM tables, because a reporting server can afford to be behind the master, and MyISAM is better for reporting — better metadata. For something like a “Users” table, make 2 more tables:

1) DeletedUsers
2) AllUsers

DeletedUsers is where you put information about the users you delete: something like INSERT INTO DeletedUsers SELECT * FROM Users WHERE [parameters of deletion], followed by DELETE FROM Users WHERE [parameters of deletion], on the master. On the reporting slave, make a MERGE table called “AllUsers” over Users and DeletedUsers, and run your reports from that whenever you need to gather historical data.
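A minimal sketch of what that might look like, with a made-up schema (the column names and the WHERE clause are placeholders, not a real implementation); both underlying tables must be MyISAM with identical definitions for the MERGE table to work:

CREATE TABLE Users (
    user_id    INT NOT NULL,
    name       VARCHAR(100),
    last_login DATE,
    PRIMARY KEY (user_id)
) ENGINE=MyISAM;

CREATE TABLE DeletedUsers LIKE Users;

-- Archive, then delete, on the master:
INSERT INTO DeletedUsers SELECT * FROM Users WHERE last_login < '2005-01-01';
DELETE FROM Users WHERE last_login < '2005-01-01';

-- On the reporting slave, a MERGE table that unions current and deleted users:
CREATE TABLE AllUsers (
    user_id    INT NOT NULL,
    name       VARCHAR(100),
    last_login DATE,
    KEY (user_id)
) ENGINE=MERGE UNION=(Users, DeletedUsers);

Reports that need the full history can then just SELECT from AllUsers.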

Thoughts?
Our site went from weekly crashes during our two busiest nights to not even peeping this week (during the two busiest nights), and the only thing we changed was that we did some table maintenance. We hadn’t done table maintenance for at least as long as I’ve been around, which is 6 months. We are a site with high volumes of both reads and writes. This article will talk about the care and feeding of tables; feel free to use this as justification for a maintenance window, or even permission to run table maintenance statements.

MySQL uses a cost-based optimizer to best translate the written query into what actually happens. This means when you write:

SELECT foo FROM t1 INNER JOIN t2 USING (commonField);

The optimizer looks at the statistics for tables t1 and t2 and decides which is better:
1) Go through each item in t1, looking for a matching “commonField” in t2
or
2) Go through each item in t2, looking for a matching “commonField” in t1

If t1 is very large and t2 is very small, it makes sense to follow plan #2. This is a simplified example, of course.
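If you want to see which plan the optimizer actually picked for a query like this, EXPLAIN shows the join order and the indexes it expects to use (t1, t2 and commonField are just the placeholder names from the example above):

EXPLAIN SELECT foo FROM t1 INNER JOIN t2 USING (commonField);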

From the documentation:
ANALYZE TABLE analyzes and stores the key distribution for a table. During the analysis, the table is locked with a read lock. This statement works with MyISAM, BDB, and InnoDB tables.

If the key distribution is off, the optimizer will be using incorrect (out-of-date) information. Therefore, the optimizations it makes will not be…well, optimal.

ANALYZE TABLE takes a very short amount of time — less than a second for even a million rows. I tested with InnoDB and MyISAM, but I’d guess that BDB is the same. Our 14G database took less than a minute to analyze all 112 tables in 3 databases.

Documentation: http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html
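The statement itself is as simple as it gets; for example, on a few hypothetical tables (you can name several at once):

ANALYZE TABLE users;
ANALYZE TABLE users, orders, sessions;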

CHECK TABLE checks tables and views for incorrectly closed tables, incorrect or deleted links, and verifies the checksums for the rows. It can also check for full consistency.

This takes a bit — checking our tables for everything (but not checking for full consistency, as it takes longer) took 11 minutes (14G, 112 tables in 3 databases). Next month I will run a CHECK EXTENDED and see how long this takes.

Documentation: http://dev.mysql.com/doc/refman/5.0/en/check-table.html
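For example (table name hypothetical); the plain form does the default check, and EXTENDED is the full-consistency check mentioned above:

CHECK TABLE users;
CHECK TABLE users EXTENDED;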

And the daddy of them all:
OPTIMIZE TABLE can be used on MyISAM, BDB and InnoDB tables. On MyISAM tables, it repairs deleted or split rows, updates index statistics, and sorts the index pages. For InnoDB and BDB, OPTIMIZE TABLE maps to ALTER TABLE, which rebuilds the table and its indexes, thereby getting rid of fragmentation, corruption and incorrect statistics.

Documentation:
http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html

This took 36 minutes on our (14G, 112 tables in 3 databases) server.
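On a single table (name hypothetical again), it’s just:

OPTIMIZE TABLE users;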

From the documentation:
In most setups, you need not run OPTIMIZE TABLE at all. Even if you do a lot of updates to variable-length rows, it is not likely that you need to do this more than once a week or month and only on certain tables.

This morning, we backed up our data. Then we ran ANALYZE TABLE on all of our tables first. Sure, OPTIMIZE TABLE performs the same function, but OPTIMIZE TABLE takes a long time, and we wanted the benefit of ANALYZE TABLE right away. Plus, if anything failed, at least the table’s index statistics are up-to-date. Then we ran CHECK TABLE, and then OPTIMIZE TABLE on each table. We did this while running live, and as I said, the entire process took less than an hour.

Actually, it took 40 minutes, because once the CHECK TABLE script I had running was halfway through, I started the OPTIMIZE TABLE script. I specifically set the tables to run through in alphabetical order, so there was no chance of the scripts trying to run on the same table. I will not do that again; I will just run them serially for safety’s sake.
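For the record, one way to generate the maintenance statements for every table (a sketch, not the exact script we used) is to build them from INFORMATION_SCHEMA in MySQL 5.0 and feed the output back into the mysql client, one pass at a time:

SELECT CONCAT('ANALYZE TABLE ', table_schema, '.', table_name, ';')
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('information_schema', 'mysql');

-- Repeat with 'CHECK TABLE ' and 'OPTIMIZE TABLE ' for the other two passes,
-- and run the three passes serially, not in parallel.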
So, the “MySQL Gotchas” page was mentioned in one of the talks at the conference last week. The page itself is at:

http://sql-info.de/mysql/gotchas.html

Now, to go through it all…..
This is probably a dumb question, but I’ll put it forth anyway. Is there a routine or easy way to limit the number of items in a group? What I want to do is limit the number of items in a group to no more than a certain number. For instance, the last 10 times someone logged in.

I’m thinking of a routine that takes in field1, field2, a # limit, and then an optional keyword of {FIRST,LAST} and maybe an optional WHERE clause. So in an example, the routine would take in:

uid
lastLoginTime
10
FIRST
uid=12345

and the routine would find the number of times uid 12345 logged in. If it’s less than or equal to 10, leave it alone. If it’s greater than 10, delete rows until it’s down to 10, deleting the oldest records first.

This is not something that could be done with a trigger (ie, on insert of a new login, check to see how many logins there are, and if there are 10, delete the first (oldest) one), because in our case it’s done when we take away privileges from someone. We pare down their friends list, or reduce the # of images they’re allowed, etc. We usually do a loop in code, but I’d rather have a stored routine do it. (Bonus points if I can use it in MySQL 4.1.12.)
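To make the idea concrete, here’s a rough sketch of that kind of routine in MySQL 5.0 stored-procedure syntax (so it wouldn’t help with the 4.1.12 wish). The table and column names (logins, uid, lastLoginTime) are placeholders, and the FIRST/LAST and WHERE-clause options are hard-coded rather than passed in:

DELIMITER //
CREATE PROCEDURE prune_logins(IN p_uid INT, IN p_limit INT)
BEGIN
    DECLARE extra INT DEFAULT 0;

    -- How many rows over the limit does this uid have?
    SELECT COUNT(*) - p_limit INTO extra FROM logins WHERE uid = p_uid;

    IF extra > 0 THEN
        -- LIMIT can't take a variable here in 5.0, so build the DELETE
        -- dynamically; ORDER BY lastLoginTime ASC removes the oldest first.
        SET @sql = CONCAT('DELETE FROM logins WHERE uid = ', p_uid,
                          ' ORDER BY lastLoginTime ASC LIMIT ', extra);
        PREPARE stmt FROM @sql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END IF;
END //
DELIMITER ;

Calling it would look like CALL prune_logins(12345, 10); to keep only the 10 most recent logins for uid 12345.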
One image I shot on the last day, which is quite perfect:

(Click on the link for a larger image)
SHOW ENGINE INNODB STATUS shows the current InnoDB status; this is where I got my answer:

http://dev.mysql.com/doc/refman/5.0/en/innodb-monitor.html

To cause the standard InnoDB Monitor to write to the standard output of mysqld, use the following SQL statement:

CREATE TABLE innodb_monitor (a INT) ENGINE=INNODB;

The monitor can be stopped by issuing the following statement:

DROP TABLE innodb_monitor;

On the forge:
Simple E-mail address validator

This stored procedure is a simple e-mail address validator — it makes sure the e-mail address is in the format word@word.word, and makes sure there are none of these special characters:
( ) <> @ , ; : . [ ] */

I do allow the double-quote character (”), because technically you can have “word”@word.com.

Folks can easily add to this snippet to make it fully compliant with RFC822 if they want. (I got bored and didn’t really feel like adding all that other stuff in. :) )
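For flavor, here is a minimal sketch of that kind of check (my own illustration, not the forge snippet): a function that accepts word@word.word, allows the double quote in the local part, and rejects anything containing the special characters listed above.

DELIMITER //
CREATE FUNCTION is_simple_email(addr VARCHAR(255)) RETURNS TINYINT
DETERMINISTIC
BEGIN
    -- word@word.word, where each word is letters, digits, _, - or "
    -- (anything else, including ( ) < > @ , ; : [ ], fails the match)
    IF addr REGEXP '^[A-Za-z0-9_"-]+@[A-Za-z0-9_-]+[.][A-Za-z0-9_-]+$' THEN
        RETURN 1;
    END IF;
    RETURN 0;
END //
DELIMITER ;

SELECT is_simple_email('someone@example.com');      -- returns 1
SELECT is_simple_email('bad;address@example.com');  -- returns 0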

Comments are closed.