Category Archives: OurSQL Database Podcast

Log Buffer #72 — a Carnival of the Vanities for DBAs

Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
So, sickness the article at:

http://mysql-dba-journey.blogspot.com/2007/11/mysql-and-vmware.html says:

Don’t get seduced to the dark side unless you understand all the issues.

And that’s wonderful and all, dosage but….what are all the issues? What are some of the issues? Is it related more to VMware, generic or more to MySQL, or more to MySQL on VMware? Is it something like “VMware isn’t stable” or more like “load testing on vmware isn’t always going to work because you won’t have full resources”?

Many people talk about using virtualization for development and testing….but if you develop and test on a virtual machine and then put it on a physical machine for production, isn’t that basically having differing environments for dev/testing and production, which is usually seen as bad? If a line of code crashes a virtual machine but is fine on production, is it worth tracking the bug down? How many hours will you spend doing that?

Also, how is using a virtual machine better/worse/different from using something like mysqld_multi on a machine with many IPs, or other strategies folks use in dev/test so they don’t have to buy the exact same hardware as in production, but still have the same separation of databases, etc?
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
So, sickness the article at:

http://mysql-dba-journey.blogspot.com/2007/11/mysql-and-vmware.html says:

Don’t get seduced to the dark side unless you understand all the issues.

And that’s wonderful and all, dosage but….what are all the issues? What are some of the issues? Is it related more to VMware, generic or more to MySQL, or more to MySQL on VMware? Is it something like “VMware isn’t stable” or more like “load testing on vmware isn’t always going to work because you won’t have full resources”?

Many people talk about using virtualization for development and testing….but if you develop and test on a virtual machine and then put it on a physical machine for production, isn’t that basically having differing environments for dev/testing and production, which is usually seen as bad? If a line of code crashes a virtual machine but is fine on production, is it worth tracking the bug down? How many hours will you spend doing that?

Also, how is using a virtual machine better/worse/different from using something like mysqld_multi on a machine with many IPs, or other strategies folks use in dev/test so they don’t have to buy the exact same hardware as in production, but still have the same separation of databases, etc?
So, sanitary O’Reilly’s ONLamp.com has published the “Top 10 MySQL Best Practices” at http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html. Sadly, I find most “best practice” list do not thoroughly explain the “why” enough so that people can make their own decisions.

For instance, #3 is “Protect the MySQL installation directory from access by other users.” I was intrigued at what they would consider the “installation” directory. By reading the tip, they actually mean the data directory. They say nothing of the log directory, nor that innodb data files may be in different places than the standard myisam data directories.

They perpetuate a myth in #4, “Don’t store binary data in MySQL.” What they really mean is “don’t store large data in MySQL”, which they go into in the tip. While it’s true that there is very little benefit to having binary data in a database, they don’t go into what those benefits are. This means that people can’t make informed decisions, just “the best practice is this so I’m doing it.”

The benefit of putting binary data in MySQL is to be able to associate metadata and other data. For instance, “user 200 owns file 483”. If user 200 is gone from the system, how can you make sure file 483 is as well? There’s no referential integrity unless it’s in the database. While it’s true that in most cases people would rather sacrifice the referential integrity for things like faster database backups and easier partitioning of large data objects, I believe in giving people full disclosure so they can make their own informed decision.

#5 is my biggest pet peeve. “Stick to ANSI SQL,” with the goal being to be able to migrate to a different platform without having to rewrite the code. Does anyone tell Oracle folks not to use pl/sql like collections? Nobody says “SQL is a declarative language, pl/sql is procedural therefore you should never use it”. How about SQL Server folks not to use transact-sql statements like WAITFOR? MATCH… AGAINST is not standard SQL, so I should never use it?

Now, of course, if you’re selling a product to be run on different database platforms, then sure, you want to be platform agnostic. But you’d know that from the start. And if you have to migrate platforms you’re going to have to do lots of work anyway, because there are third-party additions to all the software any way.

And why would *anyone* choose a specific database, and then *not* use those features? I think that it’s a good tip to stick to ANSI SQL if you *know* you want to, or if you have no idea about the DBMS you’re using.

If you want to see how this cripples MySQL, check out Visibone’s SQL chart at: http://www.visibone.com/sql/chart_1200.jpg — you can buy it here: http://sheeri.com/archives/104. I may post later on about my own personal MySQL Best Practices….
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
So, sickness the article at:

http://mysql-dba-journey.blogspot.com/2007/11/mysql-and-vmware.html says:

Don’t get seduced to the dark side unless you understand all the issues.

And that’s wonderful and all, dosage but….what are all the issues? What are some of the issues? Is it related more to VMware, generic or more to MySQL, or more to MySQL on VMware? Is it something like “VMware isn’t stable” or more like “load testing on vmware isn’t always going to work because you won’t have full resources”?

Many people talk about using virtualization for development and testing….but if you develop and test on a virtual machine and then put it on a physical machine for production, isn’t that basically having differing environments for dev/testing and production, which is usually seen as bad? If a line of code crashes a virtual machine but is fine on production, is it worth tracking the bug down? How many hours will you spend doing that?

Also, how is using a virtual machine better/worse/different from using something like mysqld_multi on a machine with many IPs, or other strategies folks use in dev/test so they don’t have to buy the exact same hardware as in production, but still have the same separation of databases, etc?
So, sanitary O’Reilly’s ONLamp.com has published the “Top 10 MySQL Best Practices” at http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html. Sadly, I find most “best practice” list do not thoroughly explain the “why” enough so that people can make their own decisions.

For instance, #3 is “Protect the MySQL installation directory from access by other users.” I was intrigued at what they would consider the “installation” directory. By reading the tip, they actually mean the data directory. They say nothing of the log directory, nor that innodb data files may be in different places than the standard myisam data directories.

They perpetuate a myth in #4, “Don’t store binary data in MySQL.” What they really mean is “don’t store large data in MySQL”, which they go into in the tip. While it’s true that there is very little benefit to having binary data in a database, they don’t go into what those benefits are. This means that people can’t make informed decisions, just “the best practice is this so I’m doing it.”

The benefit of putting binary data in MySQL is to be able to associate metadata and other data. For instance, “user 200 owns file 483”. If user 200 is gone from the system, how can you make sure file 483 is as well? There’s no referential integrity unless it’s in the database. While it’s true that in most cases people would rather sacrifice the referential integrity for things like faster database backups and easier partitioning of large data objects, I believe in giving people full disclosure so they can make their own informed decision.

#5 is my biggest pet peeve. “Stick to ANSI SQL,” with the goal being to be able to migrate to a different platform without having to rewrite the code. Does anyone tell Oracle folks not to use pl/sql like collections? Nobody says “SQL is a declarative language, pl/sql is procedural therefore you should never use it”. How about SQL Server folks not to use transact-sql statements like WAITFOR? MATCH… AGAINST is not standard SQL, so I should never use it?

Now, of course, if you’re selling a product to be run on different database platforms, then sure, you want to be platform agnostic. But you’d know that from the start. And if you have to migrate platforms you’re going to have to do lots of work anyway, because there are third-party additions to all the software any way.

And why would *anyone* choose a specific database, and then *not* use those features? I think that it’s a good tip to stick to ANSI SQL if you *know* you want to, or if you have no idea about the DBMS you’re using.

If you want to see how this cripples MySQL, check out Visibone’s SQL chart at: http://www.visibone.com/sql/chart_1200.jpg — you can buy it here: http://sheeri.com/archives/104. I may post later on about my own personal MySQL Best Practices….
While researching an article I came across a piece at http://www.simple-talk.com/sql/t-sql-programming/cursors-and-embedded-sql/. Basically the author says “embedded SQL” is bad — meaning developers should never put SQL in their code. Nor should they use ORM tools to generate SQL for them.

Instead, this they should access everything they need through stored procedures. I have mixed feelings about this. On one hand, this you have to give table-access permissions to users and then deal with the resulting security risks sounds very control-freakish to me. On the other hand, I agree that embedded code can be bad because if you change the database model in any way, you have to rewrite the procedural code that relies on the existence of the previous model.

And of course, stored procedures also help make your code more modular. But this article basically advocates that database administrators really need to do a lot of heavy coding into the database.

(The title of this post is just something that came to me when I read the article, because the author’s opinions were sparked by a cursor gone bad. (cursors gone wild?) )
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
So, sickness the article at:

http://mysql-dba-journey.blogspot.com/2007/11/mysql-and-vmware.html says:

Don’t get seduced to the dark side unless you understand all the issues.

And that’s wonderful and all, dosage but….what are all the issues? What are some of the issues? Is it related more to VMware, generic or more to MySQL, or more to MySQL on VMware? Is it something like “VMware isn’t stable” or more like “load testing on vmware isn’t always going to work because you won’t have full resources”?

Many people talk about using virtualization for development and testing….but if you develop and test on a virtual machine and then put it on a physical machine for production, isn’t that basically having differing environments for dev/testing and production, which is usually seen as bad? If a line of code crashes a virtual machine but is fine on production, is it worth tracking the bug down? How many hours will you spend doing that?

Also, how is using a virtual machine better/worse/different from using something like mysqld_multi on a machine with many IPs, or other strategies folks use in dev/test so they don’t have to buy the exact same hardware as in production, but still have the same separation of databases, etc?
So, sanitary O’Reilly’s ONLamp.com has published the “Top 10 MySQL Best Practices” at http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html. Sadly, I find most “best practice” list do not thoroughly explain the “why” enough so that people can make their own decisions.

For instance, #3 is “Protect the MySQL installation directory from access by other users.” I was intrigued at what they would consider the “installation” directory. By reading the tip, they actually mean the data directory. They say nothing of the log directory, nor that innodb data files may be in different places than the standard myisam data directories.

They perpetuate a myth in #4, “Don’t store binary data in MySQL.” What they really mean is “don’t store large data in MySQL”, which they go into in the tip. While it’s true that there is very little benefit to having binary data in a database, they don’t go into what those benefits are. This means that people can’t make informed decisions, just “the best practice is this so I’m doing it.”

The benefit of putting binary data in MySQL is to be able to associate metadata and other data. For instance, “user 200 owns file 483”. If user 200 is gone from the system, how can you make sure file 483 is as well? There’s no referential integrity unless it’s in the database. While it’s true that in most cases people would rather sacrifice the referential integrity for things like faster database backups and easier partitioning of large data objects, I believe in giving people full disclosure so they can make their own informed decision.

#5 is my biggest pet peeve. “Stick to ANSI SQL,” with the goal being to be able to migrate to a different platform without having to rewrite the code. Does anyone tell Oracle folks not to use pl/sql like collections? Nobody says “SQL is a declarative language, pl/sql is procedural therefore you should never use it”. How about SQL Server folks not to use transact-sql statements like WAITFOR? MATCH… AGAINST is not standard SQL, so I should never use it?

Now, of course, if you’re selling a product to be run on different database platforms, then sure, you want to be platform agnostic. But you’d know that from the start. And if you have to migrate platforms you’re going to have to do lots of work anyway, because there are third-party additions to all the software any way.

And why would *anyone* choose a specific database, and then *not* use those features? I think that it’s a good tip to stick to ANSI SQL if you *know* you want to, or if you have no idea about the DBMS you’re using.

If you want to see how this cripples MySQL, check out Visibone’s SQL chart at: http://www.visibone.com/sql/chart_1200.jpg — you can buy it here: http://sheeri.com/archives/104. I may post later on about my own personal MySQL Best Practices….
While researching an article I came across a piece at http://www.simple-talk.com/sql/t-sql-programming/cursors-and-embedded-sql/. Basically the author says “embedded SQL” is bad — meaning developers should never put SQL in their code. Nor should they use ORM tools to generate SQL for them.

Instead, this they should access everything they need through stored procedures. I have mixed feelings about this. On one hand, this you have to give table-access permissions to users and then deal with the resulting security risks sounds very control-freakish to me. On the other hand, I agree that embedded code can be bad because if you change the database model in any way, you have to rewrite the procedural code that relies on the existence of the previous model.

And of course, stored procedures also help make your code more modular. But this article basically advocates that database administrators really need to do a lot of heavy coding into the database.

(The title of this post is just something that came to me when I read the article, because the author’s opinions were sparked by a cursor gone bad. (cursors gone wild?) )
I started this as a response to Keith Murphy’s post at http://www.paragon-cs.com/wordpress/?p=54, buy cialis but it got long, price so it deserves its own post. The basic context is figuring out how not to cause duplicate information if a large INSERT statement fails before finishing.

Firstly, sale the surefire way to make sure there are no duplicates if you have a unique (or primary) key is to use INSERT IGNORE INTO.

Secondly, I just experimented with adding an index to an InnoDB table that had 1 million rows, and here’s what I got (please note, this is one experience only, the plural of “anecdote” is *not* “data”; also I did this in this particular order, so there may have been caching taking place):

Way #1:
– ALTER the table to add the new index. This was the slowest method, taking over 13 minutes.

Way #2:
– CREATE a new table with the same schema as the old except for adding the new index
– INSERT INTO newtable SELECT * FROM oldtable;
– ALTER TABLE oldtable RENAME somethingdifferent;
– ALTER TABLE newtable RENAME oldtable;

The ALTER TABLEs happen instantly. This was faster by a few seconds, which is statistically negligible given the 13+ minutes total time.

Way #3:
– mysqldump the table schema only (–no-data) into a file (tableschema.sql).

– mysqldump the table data only (-t) into another file (tabledata.sql).
– optionally pipe into awk to replace “^INSERT INTO” with “INSERT IGNORE INTO”

– edit the table schema file, adding the new index into the table definition
– optionally change the name of the table to something like newtable, making sure to change the DROP TABLE *and* CREATE TABLE statements.

– mysql < tableschema.sql (this will drop the old table unless you changed the name) - mysql < tabledata.sql () - If you changed the table name in the DROP and CREATE statements, run - ALTER TABLE oldtable RENAME somethingdifferent; and ALTER TABLE newtable RENAME oldtable; - Delete the "somethingdifferent" table This way took just over 10 minutes, 3 minutes faster than the other 2 ways, for a time savings of 25%. CAVEAT: MySQL helpfully moves references on a table to the new table name when you ALTER TABLE...RENAME. You will have to adjust your foreign keys, stored procedures, functions and triggers if you use anything other than Way #1. CAVEAT #2: Make sure that the character set of the MySQL server is supported by the MySQL client and the operating system where you're dumping the file to, otherwise special characters can end up falling victim to mojibake.
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at
http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
So, sickness the article at:

http://mysql-dba-journey.blogspot.com/2007/11/mysql-and-vmware.html says:

Don’t get seduced to the dark side unless you understand all the issues.

And that’s wonderful and all, dosage but….what are all the issues? What are some of the issues? Is it related more to VMware, generic or more to MySQL, or more to MySQL on VMware? Is it something like “VMware isn’t stable” or more like “load testing on vmware isn’t always going to work because you won’t have full resources”?

Many people talk about using virtualization for development and testing….but if you develop and test on a virtual machine and then put it on a physical machine for production, isn’t that basically having differing environments for dev/testing and production, which is usually seen as bad? If a line of code crashes a virtual machine but is fine on production, is it worth tracking the bug down? How many hours will you spend doing that?

Also, how is using a virtual machine better/worse/different from using something like mysqld_multi on a machine with many IPs, or other strategies folks use in dev/test so they don’t have to buy the exact same hardware as in production, but still have the same separation of databases, etc?
So, sanitary O’Reilly’s ONLamp.com has published the “Top 10 MySQL Best Practices” at http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html. Sadly, I find most “best practice” list do not thoroughly explain the “why” enough so that people can make their own decisions.

For instance, #3 is “Protect the MySQL installation directory from access by other users.” I was intrigued at what they would consider the “installation” directory. By reading the tip, they actually mean the data directory. They say nothing of the log directory, nor that innodb data files may be in different places than the standard myisam data directories.

They perpetuate a myth in #4, “Don’t store binary data in MySQL.” What they really mean is “don’t store large data in MySQL”, which they go into in the tip. While it’s true that there is very little benefit to having binary data in a database, they don’t go into what those benefits are. This means that people can’t make informed decisions, just “the best practice is this so I’m doing it.”

The benefit of putting binary data in MySQL is to be able to associate metadata and other data. For instance, “user 200 owns file 483”. If user 200 is gone from the system, how can you make sure file 483 is as well? There’s no referential integrity unless it’s in the database. While it’s true that in most cases people would rather sacrifice the referential integrity for things like faster database backups and easier partitioning of large data objects, I believe in giving people full disclosure so they can make their own informed decision.

#5 is my biggest pet peeve. “Stick to ANSI SQL,” with the goal being to be able to migrate to a different platform without having to rewrite the code. Does anyone tell Oracle folks not to use pl/sql like collections? Nobody says “SQL is a declarative language, pl/sql is procedural therefore you should never use it”. How about SQL Server folks not to use transact-sql statements like WAITFOR? MATCH… AGAINST is not standard SQL, so I should never use it?

Now, of course, if you’re selling a product to be run on different database platforms, then sure, you want to be platform agnostic. But you’d know that from the start. And if you have to migrate platforms you’re going to have to do lots of work anyway, because there are third-party additions to all the software any way.

And why would *anyone* choose a specific database, and then *not* use those features? I think that it’s a good tip to stick to ANSI SQL if you *know* you want to, or if you have no idea about the DBMS you’re using.

If you want to see how this cripples MySQL, check out Visibone’s SQL chart at: http://www.visibone.com/sql/chart_1200.jpg — you can buy it here: http://sheeri.com/archives/104. I may post later on about my own personal MySQL Best Practices….
While researching an article I came across a piece at http://www.simple-talk.com/sql/t-sql-programming/cursors-and-embedded-sql/. Basically the author says “embedded SQL” is bad — meaning developers should never put SQL in their code. Nor should they use ORM tools to generate SQL for them.

Instead, this they should access everything they need through stored procedures. I have mixed feelings about this. On one hand, this you have to give table-access permissions to users and then deal with the resulting security risks sounds very control-freakish to me. On the other hand, I agree that embedded code can be bad because if you change the database model in any way, you have to rewrite the procedural code that relies on the existence of the previous model.

And of course, stored procedures also help make your code more modular. But this article basically advocates that database administrators really need to do a lot of heavy coding into the database.

(The title of this post is just something that came to me when I read the article, because the author’s opinions were sparked by a cursor gone bad. (cursors gone wild?) )
I started this as a response to Keith Murphy’s post at http://www.paragon-cs.com/wordpress/?p=54, buy cialis but it got long, price so it deserves its own post. The basic context is figuring out how not to cause duplicate information if a large INSERT statement fails before finishing.

Firstly, sale the surefire way to make sure there are no duplicates if you have a unique (or primary) key is to use INSERT IGNORE INTO.

Secondly, I just experimented with adding an index to an InnoDB table that had 1 million rows, and here’s what I got (please note, this is one experience only, the plural of “anecdote” is *not* “data”; also I did this in this particular order, so there may have been caching taking place):

Way #1:
– ALTER the table to add the new index. This was the slowest method, taking over 13 minutes.

Way #2:
– CREATE a new table with the same schema as the old except for adding the new index
– INSERT INTO newtable SELECT * FROM oldtable;
– ALTER TABLE oldtable RENAME somethingdifferent;
– ALTER TABLE newtable RENAME oldtable;

The ALTER TABLEs happen instantly. This was faster by a few seconds, which is statistically negligible given the 13+ minutes total time.

Way #3:
– mysqldump the table schema only (–no-data) into a file (tableschema.sql).

– mysqldump the table data only (-t) into another file (tabledata.sql).
– optionally pipe into awk to replace “^INSERT INTO” with “INSERT IGNORE INTO”

– edit the table schema file, adding the new index into the table definition
– optionally change the name of the table to something like newtable, making sure to change the DROP TABLE *and* CREATE TABLE statements.

– mysql < tableschema.sql (this will drop the old table unless you changed the name) - mysql < tabledata.sql () - If you changed the table name in the DROP and CREATE statements, run - ALTER TABLE oldtable RENAME somethingdifferent; and ALTER TABLE newtable RENAME oldtable; - Delete the "somethingdifferent" table This way took just over 10 minutes, 3 minutes faster than the other 2 ways, for a time savings of 25%. CAVEAT: MySQL helpfully moves references on a table to the new table name when you ALTER TABLE...RENAME. You will have to adjust your foreign keys, stored procedures, functions and triggers if you use anything other than Way #1. CAVEAT #2: Make sure that the character set of the MySQL server is supported by the MySQL client and the operating system where you're dumping the file to, otherwise special characters can end up falling victim to mojibake.
Welcome to the 72nd edition of
Log Buffer, oncology the weekly review of database blogs.

Oracle OpenWorld (OOW) is over, pilule and Lucas Jellema of the AMIS Technology blog notes the OOW Content Catalog has been updated with most of the presentations available for download.

On his way home from OOW, Chris Muir of the appropriately titled One Size Doesn’t Fit All blog notes how OOW and the Australian Oracle User Group Conference and OOW compare with regards to 99% fewer attendees in AUSOUG Perth conference – from 45k down to 350.

Mark Rittman of Rittman Mead Consulting summarizes OOW’s impact on business intelligence and data warehousing in Reflections on Oracle’s BI Strategy. On his way home, Mark found time for A First Look at Oracle OLAP 11g, noting the pros, cons, gotchas and suggestions for improvement for many useful new features.

Microsoft SQL Server also has a new release in the works. Ted Malone in Agile Methods for the DB Dev is excited about SQL Server 2008 “Katmai” CTP 5 New Features and descries almost 20 of them.

Ian Barwick of PostgreSQL Notes talks about Converting tsearch2 to 8.3 now that the tsearch2 full text search engine has been integrated as a core PostgreSQL feature.

Patrick Barel of the Bar Solutions Weblog explains a new feature of Oracle 11g called Virtual Columns. While virtual data may be a new topic, using databases on virtual machines is an ongoing issue. Marco Russo of SQL BI gives his opinion on when to use virtual machines in SQL Server Virtualization.

Database professionals can be real characters, and set in their ways. Bad puns make good transitions, and Corrado Pandiani sheds light on MySQL’s rules for Charsets and Collations on Multicolumn Fulltext Indexes. Adam Douglas of Binary Expressions fixed some trouble with MySQL and French Characters not rendering properly.

Greg Sabino Mullane shows reasons for his Problems with pl/perl and UTF-8. In Tending the Garden, Selena Deckelmann goes through the very easy process of Automatic Character Set Conversion in PostgreSQL. Selena has also been busy organizing the development of ptop, an interactive, command-line tool for monitoring the current status of a PostgreSQL database. If you read this in time and are in the Portland, Oregon area you can join the ptop hackathon at noon (local time) tomorrow, Saturday November 24th, or you can read the ptop meeting summary from pdxpug.

While some of us are database tools, some of us prefer to contribute database tools. Baron Schwartz honors MySQL’s trademark by announcing that MySQL Toolkit is now Ma’atkit. Ma’at, pronounced “mott”, is the ancient Egyption patron saint of truth, harmony and order. In addition, Baron proclaims “Ma’atkit Version 1297 Released!”

Hubert Lubaczewski notes the changes to the analyze.pgsql.logs.pl script of pgsql-tools in update 3 and update 4.

Hubert also notes how to find overlapping time ranges and how to find the number of ranges a time belongs to in time ranges in postgresql – part 2. Though written for PostgreSQL, both posts can easily be applied to another DBMS. In the same vein, Yves Trudeau shares the DBMS-independent graphical images of Unix memory usage in Generating graphs from vmstat output.

Jeromy McMahon posts sample SQL code for viewing Oracle extent segments for tablespaces, temporary spaces and sort segment space. The Cheap DBA gets Oracle specific with a Slick Shell Script for Reporting on Oracle Workload. Krister Axel of codeboxer.com has A really clean dynamic insert proc for PL/SQL ETL packages, including validation checking and exception handling. zillablog‘s Robert Treat treats us to a function for tracking plperl shared variables.

Jen M is Keeping IT simple by coding capacity measurements to show How Not to Outgrow Your DB Infra: A Simple Step. She follows up with more code to monitor a specific cache to resolve unexplainable slowness/resource leak in SQL Server.

This post began with a conference, and so it shall conclude. The Call For Proposals for PgCon 2008 is underway, and David Fetter lets us know that PgCon 2008 will be held May 22-23 at the University of Ottawa. This is different from Joshua Drake‘s call for volunteers for Command Prompt’s Postgresql Conference East 08, on March 28-29 at the University of Maryland. Neil Conway informs us of a Jim Gray Tribute, consisting of a general session and 9 half-hour technical sessions reviewing some of the 1998 Turing Award winner’s work.

In case this edition did not give you enough to read, Beth Breidenbach of Confessions of a Database Geek created an aggregate blog feed for posts relating to information quality.

Top 10 MySQL Best Practices

Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
So, sickness the article at:

http://mysql-dba-journey.blogspot.com/2007/11/mysql-and-vmware.html says:

Don’t get seduced to the dark side unless you understand all the issues.

And that’s wonderful and all, dosage but….what are all the issues? What are some of the issues? Is it related more to VMware, generic or more to MySQL, or more to MySQL on VMware? Is it something like “VMware isn’t stable” or more like “load testing on vmware isn’t always going to work because you won’t have full resources”?

Many people talk about using virtualization for development and testing….but if you develop and test on a virtual machine and then put it on a physical machine for production, isn’t that basically having differing environments for dev/testing and production, which is usually seen as bad? If a line of code crashes a virtual machine but is fine on production, is it worth tracking the bug down? How many hours will you spend doing that?

Also, how is using a virtual machine better/worse/different from using something like mysqld_multi on a machine with many IPs, or other strategies folks use in dev/test so they don’t have to buy the exact same hardware as in production, but still have the same separation of databases, etc?
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
So, sickness the article at:

http://mysql-dba-journey.blogspot.com/2007/11/mysql-and-vmware.html says:

Don’t get seduced to the dark side unless you understand all the issues.

And that’s wonderful and all, dosage but….what are all the issues? What are some of the issues? Is it related more to VMware, generic or more to MySQL, or more to MySQL on VMware? Is it something like “VMware isn’t stable” or more like “load testing on vmware isn’t always going to work because you won’t have full resources”?

Many people talk about using virtualization for development and testing….but if you develop and test on a virtual machine and then put it on a physical machine for production, isn’t that basically having differing environments for dev/testing and production, which is usually seen as bad? If a line of code crashes a virtual machine but is fine on production, is it worth tracking the bug down? How many hours will you spend doing that?

Also, how is using a virtual machine better/worse/different from using something like mysqld_multi on a machine with many IPs, or other strategies folks use in dev/test so they don’t have to buy the exact same hardware as in production, but still have the same separation of databases, etc?
So, sanitary O’Reilly’s ONLamp.com has published the “Top 10 MySQL Best Practices” at http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html. Sadly, I find most “best practice” list do not thoroughly explain the “why” enough so that people can make their own decisions.

For instance, #3 is “Protect the MySQL installation directory from access by other users.” I was intrigued at what they would consider the “installation” directory. By reading the tip, they actually mean the data directory. They say nothing of the log directory, nor that innodb data files may be in different places than the standard myisam data directories.

They perpetuate a myth in #4, “Don’t store binary data in MySQL.” What they really mean is “don’t store large data in MySQL”, which they go into in the tip. While it’s true that there is very little benefit to having binary data in a database, they don’t go into what those benefits are. This means that people can’t make informed decisions, just “the best practice is this so I’m doing it.”

The benefit of putting binary data in MySQL is to be able to associate metadata and other data. For instance, “user 200 owns file 483”. If user 200 is gone from the system, how can you make sure file 483 is as well? There’s no referential integrity unless it’s in the database. While it’s true that in most cases people would rather sacrifice the referential integrity for things like faster database backups and easier partitioning of large data objects, I believe in giving people full disclosure so they can make their own informed decision.

#5 is my biggest pet peeve. “Stick to ANSI SQL,” with the goal being to be able to migrate to a different platform without having to rewrite the code. Does anyone tell Oracle folks not to use pl/sql like collections? Nobody says “SQL is a declarative language, pl/sql is procedural therefore you should never use it”. How about SQL Server folks not to use transact-sql statements like WAITFOR? MATCH… AGAINST is not standard SQL, so I should never use it?

Now, of course, if you’re selling a product to be run on different database platforms, then sure, you want to be platform agnostic. But you’d know that from the start. And if you have to migrate platforms you’re going to have to do lots of work anyway, because there are third-party additions to all the software any way.

And why would *anyone* choose a specific database, and then *not* use those features? I think that it’s a good tip to stick to ANSI SQL if you *know* you want to, or if you have no idea about the DBMS you’re using.

If you want to see how this cripples MySQL, check out Visibone’s SQL chart at: http://www.visibone.com/sql/chart_1200.jpg — you can buy it here: http://sheeri.com/archives/104. I may post later on about my own personal MySQL Best Practices….

Virtualization and MySQL

Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?
Peter makes an interesting post about the MySQL company’s trademarks at http://www.mysqlperformanceblog.com/2007/10/26/mysql-support-or-support-for-mysql-mysql-trademark-policies/

The point is that Peter is not selling “MySQL Support” — he is selling “Support *for* MySQL”. “MySQL Support” is the name of a product that MySQL offers. Even if some other consulting company used the name before the MySQL company ever did, more about MySQL still has the rights to the name.

I chose to name my podcast “OurSQL: The MySQL Database Podcast for the Community, neurologist By the Community.” I chose every word carefully. For instance, abortion I call it “the MySQL Database Podcast” so that anyone looking for a podcast about “database” will find it.

I could have just called it “MySQL Podcast”. But if the company MySQL (AB or Inc) ever makes a podcast, they would do the same thing to me that they do to you. I have to distinguish it’s “a podcast about MySQL”, not “MySQL’s podcast”. In fact if you look for “oursql” references, there is actually software released in September 2001 called “oursql”, but it was only released once and I have only ever found a handful of e-mails about it.

Similarly with “Technocation, Inc”. I googled around for it and found that a Baltimore, MD USA paper has a column called “technocation”, and it’s similar to why I picked the name — technology + education. But there’s no way anyone would confuse the two.

Same thing as MySQL would do if they made their own toolkit. “MySQL Toolkit” is in fact a really bad name because it’s so generic. Right now there’s no confusion, because MySQL doesn’t have a toolkit. Same with the “MySQL Magazine”. If MySQL ever puts out a magazine, they’ll send a letter right away. I was actually worried that “The MySQL Guy Podcast” at http://www.themysqlguy.com/ would get a letter from them. After all, there are plenty of “MySQL guys” out there, and he doesn’t work for the company……(hence why I’m the “She-BA”, not “MySQL Gal”).

In fact, Microsoft seems to do this on purpose. They named their database engine “SQL Server”. I’ve been frustrated when I get Microsoft pages when I’m just looking for “something relating to SQL”. I’d much rather get something related to the SQL standard. Same with their “Windows Mobile” platform. Check out their list of servers on the right-hand side of the page at http://www.microsoft.com/servers/default.mspx — if you’re looking for “security” on a Windows server, chances are most of your search result will be for the “Security Server” that Microsoft offers. Ditto with “Content Management Server” and “Data Protection Manager” and “Speech Server” and “Virtual Server” and “Small Business Server”…etc.

If you have questions about Intellectual Property (IP) or Patents in the United States, I highly recommend retaining services from the law firm of Bakos and Kritzer — http://www.bakoskritzer.com/. It’s not just a law firm where my brother is a partner, it’s also a damn good one.

(Speaking of name, I will likely be changing my name in the near future to “Sheeri K. Cabral”, so if you see it around, don’t get confused. You can always find me at www.sheeri.com )
So, sickness the article at:

http://mysql-dba-journey.blogspot.com/2007/11/mysql-and-vmware.html says:

Don’t get seduced to the dark side unless you understand all the issues.

And that’s wonderful and all, dosage but….what are all the issues? What are some of the issues? Is it related more to VMware, generic or more to MySQL, or more to MySQL on VMware? Is it something like “VMware isn’t stable” or more like “load testing on vmware isn’t always going to work because you won’t have full resources”?

Many people talk about using virtualization for development and testing….but if you develop and test on a virtual machine and then put it on a physical machine for production, isn’t that basically having differing environments for dev/testing and production, which is usually seen as bad? If a line of code crashes a virtual machine but is fine on production, is it worth tracking the bug down? How many hours will you spend doing that?

Also, how is using a virtual machine better/worse/different from using something like mysqld_multi on a machine with many IPs, or other strategies folks use in dev/test so they don’t have to buy the exact same hardware as in production, but still have the same separation of databases, etc?

Statistics Gathering Script

Life has been super busy, web dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
What is it about the folks on Planet MySQL having twin brothers?

Roland Bouman
Sheeri Kritzer
Jay Pipes
Zach Urlocker

That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, purchase note that here too……
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
What is it about the folks on Planet MySQL having twin brothers?

Roland Bouman
Sheeri Kritzer
Jay Pipes
Zach Urlocker

That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, purchase note that here too……
At the July MySQL User Group Meeting, cystitis Jim Starkey wondered aloud, “What happens when I COMMIT on a memory table?” I wrote the question down, to research it later.

The obvious answer is “COMMIT on a non-transactional table does nothing.”

Tonight I was thinking about this, and I realized I do not actually COMMIT “on a table.”

The manual page at: http://dev.mysql.com/doc/refman/4.1/en/commit.html (and the 5.0 and 5.1 equivalents) state:

By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk.

If you are using a transaction-safe storage engine (such as InnoDB, BDB, or NDB Cluster), you can disable autocommit mode with the following statement:

SET AUTOCOMMIT=0;

But what does If you are using a transaction-safe storage engine really mean? I ask because I do not specify a table type with that command. So I tried it on a fresh MySQL install (5.0.19 GA, on a RedHat Linux Enterprise machine):
Continue reading

Option and Variable Matrix

Life has been super busy, web dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
What is it about the folks on Planet MySQL having twin brothers?

Roland Bouman
Sheeri Kritzer
Jay Pipes
Zach Urlocker

That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, purchase note that here too……
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
What is it about the folks on Planet MySQL having twin brothers?

Roland Bouman
Sheeri Kritzer
Jay Pipes
Zach Urlocker

That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, purchase note that here too……
At the July MySQL User Group Meeting, cystitis Jim Starkey wondered aloud, “What happens when I COMMIT on a memory table?” I wrote the question down, to research it later.

The obvious answer is “COMMIT on a non-transactional table does nothing.”

Tonight I was thinking about this, and I realized I do not actually COMMIT “on a table.”

The manual page at: http://dev.mysql.com/doc/refman/4.1/en/commit.html (and the 5.0 and 5.1 equivalents) state:

By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk.

If you are using a transaction-safe storage engine (such as InnoDB, BDB, or NDB Cluster), you can disable autocommit mode with the following statement:

SET AUTOCOMMIT=0;

But what does If you are using a transaction-safe storage engine really mean? I ask because I do not specify a table type with that command. So I tried it on a fresh MySQL install (5.0.19 GA, on a RedHat Linux Enterprise machine):
Continue reading

Good Ideas Implemented Poorly

http://tinyurl.com/gvwml

And the winner is….MySQL.
http://tinyurl.com/gvwml

And the winner is….MySQL.
Matt Asay wrote an article about open source leakage. It’s quite good, malady and got me thinking.

First I thought, “Open source companies do not ‘lose’ revenue to non-paying customers, they just do not gain revenue from them.” But that’s based on the model of open-source software I have in my head that open source software usually starts out as a free, collaborative effort, and if enough folks get enough steam and come up with a business model (aka “a way to get paid”), then they form a company around the open source software.

Simplifying that model: open source software is free until it’s not.

Saying there is leakage does not do justice to the fact that the river flowed freely until the company came along and dammed up the river. Sure, maybe there’s a big leak, but there’s a lot more not leaking than there is leaking.

But open source != free. And it’s not required, either.

Take a for-pay e-book. You buy a license for a personal copy of a book, and read it. You’re not supposed to make copies of the e-book, or redistribute it, etc under the terms of your license.

However, you can “delve into the source code” of an e-book. You cannot change it and redistribute it claiming you authored it. You can, however, change the words in the book to make it more meaningful for yourself. There’s nothing to stop you from annotating the work. The source is open — all the words are there for you to play with.

Now, open source is like that e-book. There’s nothing that says open source HAS to be free. By convention, it has been. Patents are good for keeping secrets and making money. The open source movement shuns patents. But they’re not shunning the making money. They’re shunning the secretive nature of it.

I once had a housemate who was vegan, whose brother owned a restaurant 3,000 miles away. She made the best vegan pancakes, and refused to give out the recipe because it was her brother’s secret recipe. Now, vegan pancakes are not that complicated. There are about 5 ingredients that could go into them. Why the need for the secret? Because her brother would lose business? Restaurants produce cookbooks all the time; I doubt business would die if the recipe got out.

And that’s what open source is all about — “I have this great recipe for vegan pancakes, and I want to share it with you.”

Let me be clear: I think that open source companies deserve to be paid for their work. Much of the time the products are excellent. That does not mean it’s bug-free. (I live in the United States, and I think it’s one of the best countries to live in, but that does not mean we do everything right….far from it!) Most of this is a semantic rant.

I find it amusing that it used to be difficult to convince big companies that open source was good, because upper management equated free with bad. Now that we’ve convinced some of them, we’re upset that it’s difficult to convince big companies that they should pay for something we give them for free.

I think MySQL actually has a sane licensing policy, and I think they’re going in the right direction with MySQL Network. Having free software and for-pay technical service and support seems like a good mix….for MySQL. I can certainly see that being abused by a company that has a bad product, intentionally, to get more $$ out of customers because they are forced to get support — much like Remedy requires lots of customization before it actually can work. MySQL is much better than that.

I think MySQL in particular would do well to offer “Optimization Consulting” for a fee. I know they offer that already, but particularly call it that, as I am always hearing about companies looking for a MySQL consultant for a few weeks to help them optimize their servers.
http://tinyurl.com/gvwml

And the winner is….MySQL.
Matt Asay wrote an article about open source leakage. It’s quite good, malady and got me thinking.

First I thought, “Open source companies do not ‘lose’ revenue to non-paying customers, they just do not gain revenue from them.” But that’s based on the model of open-source software I have in my head that open source software usually starts out as a free, collaborative effort, and if enough folks get enough steam and come up with a business model (aka “a way to get paid”), then they form a company around the open source software.

Simplifying that model: open source software is free until it’s not.

Saying there is leakage does not do justice to the fact that the river flowed freely until the company came along and dammed up the river. Sure, maybe there’s a big leak, but there’s a lot more not leaking than there is leaking.

But open source != free. And it’s not required, either.

Take a for-pay e-book. You buy a license for a personal copy of a book, and read it. You’re not supposed to make copies of the e-book, or redistribute it, etc under the terms of your license.

However, you can “delve into the source code” of an e-book. You cannot change it and redistribute it claiming you authored it. You can, however, change the words in the book to make it more meaningful for yourself. There’s nothing to stop you from annotating the work. The source is open — all the words are there for you to play with.

Now, open source is like that e-book. There’s nothing that says open source HAS to be free. By convention, it has been. Patents are good for keeping secrets and making money. The open source movement shuns patents. But they’re not shunning the making money. They’re shunning the secretive nature of it.

I once had a housemate who was vegan, whose brother owned a restaurant 3,000 miles away. She made the best vegan pancakes, and refused to give out the recipe because it was her brother’s secret recipe. Now, vegan pancakes are not that complicated. There are about 5 ingredients that could go into them. Why the need for the secret? Because her brother would lose business? Restaurants produce cookbooks all the time; I doubt business would die if the recipe got out.

And that’s what open source is all about — “I have this great recipe for vegan pancakes, and I want to share it with you.”

Let me be clear: I think that open source companies deserve to be paid for their work. Much of the time the products are excellent. That does not mean it’s bug-free. (I live in the United States, and I think it’s one of the best countries to live in, but that does not mean we do everything right….far from it!) Most of this is a semantic rant.

I find it amusing that it used to be difficult to convince big companies that open source was good, because upper management equated free with bad. Now that we’ve convinced some of them, we’re upset that it’s difficult to convince big companies that they should pay for something we give them for free.

I think MySQL actually has a sane licensing policy, and I think they’re going in the right direction with MySQL Network. Having free software and for-pay technical service and support seems like a good mix….for MySQL. I can certainly see that being abused by a company that has a bad product, intentionally, to get more $$ out of customers because they are forced to get support — much like Remedy requires lots of customization before it actually can work. MySQL is much better than that.

I think MySQL in particular would do well to offer “Optimization Consulting” for a fee. I know they offer that already, but particularly call it that, as I am always hearing about companies looking for a MySQL consultant for a few weeks to help them optimize their servers.
Note the title is “How I have a successful MySQL User Group.” There’s more than one way to do it, tuberculosis I’m sure. There are 3 basic principles:

1) Try to do as little work as possible.

2) Make your colleagues do as little work as possible.

3) Always have a topic/presentation

These three principles will get you far, site and should be weighted equally. Do not use principle # 1 as an excuse to not follow principle #3. As well, “doing work” includes “paying money”. With that being said:

  1. Make your user group easy to get to. This has different meanings for different areas. It may mean near a major highway interchange, it may mean near a mass transit station. Whatever it means for you, make it easy.
  2. When the Boston MySQL User Group first started, we had free space in an office building right in the city of Boston.

    Pros Cons
    Free Not enough parking
    Close to the subway and train station No free parking
    The street was clearly labeled We had to have a person stand by the door to let people in
    The building was clearly labeled
    There was a great pub next door

    Then we moved to a space at MIT:

    Pros Cons
    Free Passers-by eat the pizza and drink the soda
    Close to the subway and train station We had to have good maps to find the location
    Plenty of free parking

    Now, you may not be so lucky to find a place that will give you space for free. Consider local universities. Also, local libraries (university or otherwise) usually have some meeting space where they might be able to host you. Our first space, an office space, was gotten through a contact at MySQL. Contact some companies who use MySQL and ask if they’ll lend you space — most tech companies have folks there at night anyway, and it’s free advertising for the company! There is a book company near you, or a Pearson VUE center. Remember: it does not hurt to ask. The worst people can say is “no”, and if it’s contributing toward education, a book company (Pearson Education, Apress, O’Reilly, etc) might sponsor it.

    Getting a free space is key, particularly if it has A/V equipment for you to use (if you want to have lectures).

    Granted, if you want to just have a freeform discussion, any pub or coffee shop or location will do.

    You may think about having it in your home. Consider issues of personal safety, domestic distractions (kids, pets, etc), parking and transit, and the fact that most people feel comfortable going to their first meeting on “neutral” ground. However, if the culture surrounding where you live encourages it, go for it!

    If you must pay for a location, get a company to sponsor it. Get their monetary and time commitment in writing if you can (ie, they will sponsor for a year).

  3. Make the meetings easy to remember. Have them on the same day of the month (ie, 2nd Monday). Do not make them conflict with something similar (ie, PHP group, linux group). Try to have it at the same place each time.
  4. Figure out what your group wants and give it to them.
  5. Do they want lectures on specific topics? Maybe go to a place with wireless access and a “troubleshooting” meeting every so often. Maybe they want a blog with news, or war stories. Maybe a lending library would work. Anyone that comes to your meeting is looking for something. Ask each new member what they’re looking for. If they say “to learn more” then go with lectures — make sure to accomodate all skill sets, and have advanced lectures as well as beginning lectures. If they want to get to know each other better, go to a pub and talk about MySQL while you down a pint. Or run a charity fundraiser and donate the proceeds to a not-for-profit like the Electronic Freedom Foundation. There’s plenty the group can do, including having contests, or even starting a business together if you’re plucky.

  6. Have incentives.
  7. But remember rule #2, so keep costs as low as possible. Ask companies to sponsor dinner or light refreshments. Don’t be shy; ask book companies for books, T-shirts and buttons to give away as prizes. Again, the worst they can say is “no”, and I’ve found book companies and MySQL AB to be VERY accomodating, as well as local businesses. Ask company techies to present. Ask MySQL to send someone out to present.

  8. Be available, and follow up when you say you will. Often times folks will have a question. Being able to respond in a timely manner is important. But if you speak out and make regular annoucements, and solicit feedback, folks will know you are accomodating.
  9. Do not do work others can do. When someone asks if you can forward a job listing, have them post it to your group members. Or, better yet, have them come to your meeting and make an announcement in the first 5 minutes. This includes advertising! Find your local MySQL Sales Rep and ask them to help promote your group. Promote it on any site you can think of — in a blog, on Craigslist, heck, make flyers and post it around town if you have to. Get the word out there!
  10. Have the right attitude. It’s not your meeting, it’s your group’s meeting. If they want different topics, ask someone to research and present. It’s OK if the research isn’t perfect or complete; many times your group will fill in with their experiences.
  11. It’s OK to be wrong. No, really. If you don’t know the answer to a question, look it up later on, and follow up. Or ask the group if anyone knows. Also, presentations take time to make, and if you feel compelled to get every single detail right, they’ll take a lot longer. Keep an offline copy of the manual so you can look up anything you need to.
  12. Ask others for help. Even if you think you can do it better or faster. Your group should be able to survive without you, so if you can’t make a meeting all is not lost. This includes asking others to do a presentation of any length — it can be a 10 minute “this is how we use MySQL and this is what our setup looks like” to an hour or more. It does not have to be intimidating. Ask folks to present workshops they’re presenting at conferences, or present notes from conferences they go to. People need to feel like part of the group if the group is to be successful.
  13. If you can’t get a presentation, download one. MySQL is excellent at providing past webinars. Download it to your computer, and have a free lecture from an expert, that you can pause and rewind if you need to. MySQL Webinars. The Boston MySQL User Group videorecords the presentations and puts them online — see the links under “Presentations” at http://www.sheeri.net

Note: The Boston MySQL User Group is successful, but we don’t quite do everything above. I’d love to have more socialization, and I think once summer comes I’ll think of something geeky and fun to do outdoors, with little cost. Or maybe just a bowling trip.
http://tinyurl.com/gvwml

And the winner is….MySQL.
Matt Asay wrote an article about open source leakage. It’s quite good, malady and got me thinking.

First I thought, “Open source companies do not ‘lose’ revenue to non-paying customers, they just do not gain revenue from them.” But that’s based on the model of open-source software I have in my head that open source software usually starts out as a free, collaborative effort, and if enough folks get enough steam and come up with a business model (aka “a way to get paid”), then they form a company around the open source software.

Simplifying that model: open source software is free until it’s not.

Saying there is leakage does not do justice to the fact that the river flowed freely until the company came along and dammed up the river. Sure, maybe there’s a big leak, but there’s a lot more not leaking than there is leaking.

But open source != free. And it’s not required, either.

Take a for-pay e-book. You buy a license for a personal copy of a book, and read it. You’re not supposed to make copies of the e-book, or redistribute it, etc under the terms of your license.

However, you can “delve into the source code” of an e-book. You cannot change it and redistribute it claiming you authored it. You can, however, change the words in the book to make it more meaningful for yourself. There’s nothing to stop you from annotating the work. The source is open — all the words are there for you to play with.

Now, open source is like that e-book. There’s nothing that says open source HAS to be free. By convention, it has been. Patents are good for keeping secrets and making money. The open source movement shuns patents. But they’re not shunning the making money. They’re shunning the secretive nature of it.

I once had a housemate who was vegan, whose brother owned a restaurant 3,000 miles away. She made the best vegan pancakes, and refused to give out the recipe because it was her brother’s secret recipe. Now, vegan pancakes are not that complicated. There are about 5 ingredients that could go into them. Why the need for the secret? Because her brother would lose business? Restaurants produce cookbooks all the time; I doubt business would die if the recipe got out.

And that’s what open source is all about — “I have this great recipe for vegan pancakes, and I want to share it with you.”

Let me be clear: I think that open source companies deserve to be paid for their work. Much of the time the products are excellent. That does not mean it’s bug-free. (I live in the United States, and I think it’s one of the best countries to live in, but that does not mean we do everything right….far from it!) Most of this is a semantic rant.

I find it amusing that it used to be difficult to convince big companies that open source was good, because upper management equated free with bad. Now that we’ve convinced some of them, we’re upset that it’s difficult to convince big companies that they should pay for something we give them for free.

I think MySQL actually has a sane licensing policy, and I think they’re going in the right direction with MySQL Network. Having free software and for-pay technical service and support seems like a good mix….for MySQL. I can certainly see that being abused by a company that has a bad product, intentionally, to get more $$ out of customers because they are forced to get support — much like Remedy requires lots of customization before it actually can work. MySQL is much better than that.

I think MySQL in particular would do well to offer “Optimization Consulting” for a fee. I know they offer that already, but particularly call it that, as I am always hearing about companies looking for a MySQL consultant for a few weeks to help them optimize their servers.
Note the title is “How I have a successful MySQL User Group.” There’s more than one way to do it, tuberculosis I’m sure. There are 3 basic principles:

1) Try to do as little work as possible.

2) Make your colleagues do as little work as possible.

3) Always have a topic/presentation

These three principles will get you far, site and should be weighted equally. Do not use principle # 1 as an excuse to not follow principle #3. As well, “doing work” includes “paying money”. With that being said:

  1. Make your user group easy to get to. This has different meanings for different areas. It may mean near a major highway interchange, it may mean near a mass transit station. Whatever it means for you, make it easy.
  2. When the Boston MySQL User Group first started, we had free space in an office building right in the city of Boston.

    Pros Cons
    Free Not enough parking
    Close to the subway and train station No free parking
    The street was clearly labeled We had to have a person stand by the door to let people in
    The building was clearly labeled
    There was a great pub next door

    Then we moved to a space at MIT:

    Pros Cons
    Free Passers-by eat the pizza and drink the soda
    Close to the subway and train station We had to have good maps to find the location
    Plenty of free parking

    Now, you may not be so lucky to find a place that will give you space for free. Consider local universities. Also, local libraries (university or otherwise) usually have some meeting space where they might be able to host you. Our first space, an office space, was gotten through a contact at MySQL. Contact some companies who use MySQL and ask if they’ll lend you space — most tech companies have folks there at night anyway, and it’s free advertising for the company! There is a book company near you, or a Pearson VUE center. Remember: it does not hurt to ask. The worst people can say is “no”, and if it’s contributing toward education, a book company (Pearson Education, Apress, O’Reilly, etc) might sponsor it.

    Getting a free space is key, particularly if it has A/V equipment for you to use (if you want to have lectures).

    Granted, if you want to just have a freeform discussion, any pub or coffee shop or location will do.

    You may think about having it in your home. Consider issues of personal safety, domestic distractions (kids, pets, etc), parking and transit, and the fact that most people feel comfortable going to their first meeting on “neutral” ground. However, if the culture surrounding where you live encourages it, go for it!

    If you must pay for a location, get a company to sponsor it. Get their monetary and time commitment in writing if you can (ie, they will sponsor for a year).

  3. Make the meetings easy to remember. Have them on the same day of the month (ie, 2nd Monday). Do not make them conflict with something similar (ie, PHP group, linux group). Try to have it at the same place each time.
  4. Figure out what your group wants and give it to them.
  5. Do they want lectures on specific topics? Maybe go to a place with wireless access and a “troubleshooting” meeting every so often. Maybe they want a blog with news, or war stories. Maybe a lending library would work. Anyone that comes to your meeting is looking for something. Ask each new member what they’re looking for. If they say “to learn more” then go with lectures — make sure to accomodate all skill sets, and have advanced lectures as well as beginning lectures. If they want to get to know each other better, go to a pub and talk about MySQL while you down a pint. Or run a charity fundraiser and donate the proceeds to a not-for-profit like the Electronic Freedom Foundation. There’s plenty the group can do, including having contests, or even starting a business together if you’re plucky.

  6. Have incentives.
  7. But remember rule #2, so keep costs as low as possible. Ask companies to sponsor dinner or light refreshments. Don’t be shy; ask book companies for books, T-shirts and buttons to give away as prizes. Again, the worst they can say is “no”, and I’ve found book companies and MySQL AB to be VERY accomodating, as well as local businesses. Ask company techies to present. Ask MySQL to send someone out to present.

  8. Be available, and follow up when you say you will. Often times folks will have a question. Being able to respond in a timely manner is important. But if you speak out and make regular annoucements, and solicit feedback, folks will know you are accomodating.
  9. Do not do work others can do. When someone asks if you can forward a job listing, have them post it to your group members. Or, better yet, have them come to your meeting and make an announcement in the first 5 minutes. This includes advertising! Find your local MySQL Sales Rep and ask them to help promote your group. Promote it on any site you can think of — in a blog, on Craigslist, heck, make flyers and post it around town if you have to. Get the word out there!
  10. Have the right attitude. It’s not your meeting, it’s your group’s meeting. If they want different topics, ask someone to research and present. It’s OK if the research isn’t perfect or complete; many times your group will fill in with their experiences.
  11. It’s OK to be wrong. No, really. If you don’t know the answer to a question, look it up later on, and follow up. Or ask the group if anyone knows. Also, presentations take time to make, and if you feel compelled to get every single detail right, they’ll take a lot longer. Keep an offline copy of the manual so you can look up anything you need to.
  12. Ask others for help. Even if you think you can do it better or faster. Your group should be able to survive without you, so if you can’t make a meeting all is not lost. This includes asking others to do a presentation of any length — it can be a 10 minute “this is how we use MySQL and this is what our setup looks like” to an hour or more. It does not have to be intimidating. Ask folks to present workshops they’re presenting at conferences, or present notes from conferences they go to. People need to feel like part of the group if the group is to be successful.
  13. If you can’t get a presentation, download one. MySQL is excellent at providing past webinars. Download it to your computer, and have a free lecture from an expert, that you can pause and rewind if you need to. MySQL Webinars. The Boston MySQL User Group videorecords the presentations and puts them online — see the links under “Presentations” at http://www.sheeri.net

Note: The Boston MySQL User Group is successful, but we don’t quite do everything above. I’d love to have more socialization, and I think once summer comes I’ll think of something geeky and fun to do outdoors, with little cost. Or maybe just a bowling trip.
This falls under “I knew I could do this but I didn’t realize I could apply it this way!”

You can do

SELECT 1 from table1;

Which will return n rows, treatment each row having 1 field whose value is 1. n is the number of rows in table1.

SELECT "string" from table1 works similarly.

However, refractionist I never considered using

SELECT "string" as "debug statement" to debug code.

For instance,

mysql> SELECT "SELECT foo from bar where baz>0" as "debug";;
+---------------------------------+
| debug |
+---------------------------------+
| SELECT foo from bar where baz>0 |
+---------------------------------+
1 row in set (0.00 sec)

Neat trick! This is why I follow the MySQL Users general list, because every so often a gem like this comes up. Plus, I can’t resist helping folks out. And if I’m not accurate, someone else will step up.

Some more random thoughts, as I’m thinking about them and did not really want to make a separate post for them — at the MySQL Users Conference, the rank on PlanetMySQL.com came up, from “the conference postings will increase my rank” to “so-and-so cheats and makes lots of little posts.”

Now, I was just happy to make the list. Of course, now that I did post a lot at the conference, my rank went up from about #12 to about #7. As an experiment, I’m attempting to publish an average of one post a day for May and see what my ranking is at the end of May. So far, so good. And of course, as always, all of my posts will have content!

I think it would be neat to have a little line on planetmysql.com that shows where “1 post a day”, “1 post a week” and “1 post a month” fall for the top posters.
http://tinyurl.com/gvwml

And the winner is….MySQL.
Matt Asay wrote an article about open source leakage. It’s quite good, malady and got me thinking.

First I thought, “Open source companies do not ‘lose’ revenue to non-paying customers, they just do not gain revenue from them.” But that’s based on the model of open-source software I have in my head that open source software usually starts out as a free, collaborative effort, and if enough folks get enough steam and come up with a business model (aka “a way to get paid”), then they form a company around the open source software.

Simplifying that model: open source software is free until it’s not.

Saying there is leakage does not do justice to the fact that the river flowed freely until the company came along and dammed up the river. Sure, maybe there’s a big leak, but there’s a lot more not leaking than there is leaking.

But open source != free. And it’s not required, either.

Take a for-pay e-book. You buy a license for a personal copy of a book, and read it. You’re not supposed to make copies of the e-book, or redistribute it, etc under the terms of your license.

However, you can “delve into the source code” of an e-book. You cannot change it and redistribute it claiming you authored it. You can, however, change the words in the book to make it more meaningful for yourself. There’s nothing to stop you from annotating the work. The source is open — all the words are there for you to play with.

Now, open source is like that e-book. There’s nothing that says open source HAS to be free. By convention, it has been. Patents are good for keeping secrets and making money. The open source movement shuns patents. But they’re not shunning the making money. They’re shunning the secretive nature of it.

I once had a housemate who was vegan, whose brother owned a restaurant 3,000 miles away. She made the best vegan pancakes, and refused to give out the recipe because it was her brother’s secret recipe. Now, vegan pancakes are not that complicated. There are about 5 ingredients that could go into them. Why the need for the secret? Because her brother would lose business? Restaurants produce cookbooks all the time; I doubt business would die if the recipe got out.

And that’s what open source is all about — “I have this great recipe for vegan pancakes, and I want to share it with you.”

Let me be clear: I think that open source companies deserve to be paid for their work. Much of the time the products are excellent. That does not mean it’s bug-free. (I live in the United States, and I think it’s one of the best countries to live in, but that does not mean we do everything right….far from it!) Most of this is a semantic rant.

I find it amusing that it used to be difficult to convince big companies that open source was good, because upper management equated free with bad. Now that we’ve convinced some of them, we’re upset that it’s difficult to convince big companies that they should pay for something we give them for free.

I think MySQL actually has a sane licensing policy, and I think they’re going in the right direction with MySQL Network. Having free software and for-pay technical service and support seems like a good mix….for MySQL. I can certainly see that being abused by a company that has a bad product, intentionally, to get more $$ out of customers because they are forced to get support — much like Remedy requires lots of customization before it actually can work. MySQL is much better than that.

I think MySQL in particular would do well to offer “Optimization Consulting” for a fee. I know they offer that already, but particularly call it that, as I am always hearing about companies looking for a MySQL consultant for a few weeks to help them optimize their servers.
Note the title is “How I have a successful MySQL User Group.” There’s more than one way to do it, tuberculosis I’m sure. There are 3 basic principles:

1) Try to do as little work as possible.

2) Make your colleagues do as little work as possible.

3) Always have a topic/presentation

These three principles will get you far, site and should be weighted equally. Do not use principle # 1 as an excuse to not follow principle #3. As well, “doing work” includes “paying money”. With that being said:

  1. Make your user group easy to get to. This has different meanings for different areas. It may mean near a major highway interchange, it may mean near a mass transit station. Whatever it means for you, make it easy.
  2. When the Boston MySQL User Group first started, we had free space in an office building right in the city of Boston.

    Pros Cons
    Free Not enough parking
    Close to the subway and train station No free parking
    The street was clearly labeled We had to have a person stand by the door to let people in
    The building was clearly labeled
    There was a great pub next door

    Then we moved to a space at MIT:

    Pros Cons
    Free Passers-by eat the pizza and drink the soda
    Close to the subway and train station We had to have good maps to find the location
    Plenty of free parking

    Now, you may not be so lucky to find a place that will give you space for free. Consider local universities. Also, local libraries (university or otherwise) usually have some meeting space where they might be able to host you. Our first space, an office space, was gotten through a contact at MySQL. Contact some companies who use MySQL and ask if they’ll lend you space — most tech companies have folks there at night anyway, and it’s free advertising for the company! There is a book company near you, or a Pearson VUE center. Remember: it does not hurt to ask. The worst people can say is “no”, and if it’s contributing toward education, a book company (Pearson Education, Apress, O’Reilly, etc) might sponsor it.

    Getting a free space is key, particularly if it has A/V equipment for you to use (if you want to have lectures).

    Granted, if you want to just have a freeform discussion, any pub or coffee shop or location will do.

    You may think about having it in your home. Consider issues of personal safety, domestic distractions (kids, pets, etc), parking and transit, and the fact that most people feel comfortable going to their first meeting on “neutral” ground. However, if the culture surrounding where you live encourages it, go for it!

    If you must pay for a location, get a company to sponsor it. Get their monetary and time commitment in writing if you can (ie, they will sponsor for a year).

  3. Make the meetings easy to remember. Have them on the same day of the month (ie, 2nd Monday). Do not make them conflict with something similar (ie, PHP group, linux group). Try to have it at the same place each time.
  4. Figure out what your group wants and give it to them.
  5. Do they want lectures on specific topics? Maybe go to a place with wireless access and a “troubleshooting” meeting every so often. Maybe they want a blog with news, or war stories. Maybe a lending library would work. Anyone that comes to your meeting is looking for something. Ask each new member what they’re looking for. If they say “to learn more” then go with lectures — make sure to accomodate all skill sets, and have advanced lectures as well as beginning lectures. If they want to get to know each other better, go to a pub and talk about MySQL while you down a pint. Or run a charity fundraiser and donate the proceeds to a not-for-profit like the Electronic Freedom Foundation. There’s plenty the group can do, including having contests, or even starting a business together if you’re plucky.

  6. Have incentives.
  7. But remember rule #2, so keep costs as low as possible. Ask companies to sponsor dinner or light refreshments. Don’t be shy; ask book companies for books, T-shirts and buttons to give away as prizes. Again, the worst they can say is “no”, and I’ve found book companies and MySQL AB to be VERY accomodating, as well as local businesses. Ask company techies to present. Ask MySQL to send someone out to present.

  8. Be available, and follow up when you say you will. Often times folks will have a question. Being able to respond in a timely manner is important. But if you speak out and make regular annoucements, and solicit feedback, folks will know you are accomodating.
  9. Do not do work others can do. When someone asks if you can forward a job listing, have them post it to your group members. Or, better yet, have them come to your meeting and make an announcement in the first 5 minutes. This includes advertising! Find your local MySQL Sales Rep and ask them to help promote your group. Promote it on any site you can think of — in a blog, on Craigslist, heck, make flyers and post it around town if you have to. Get the word out there!
  10. Have the right attitude. It’s not your meeting, it’s your group’s meeting. If they want different topics, ask someone to research and present. It’s OK if the research isn’t perfect or complete; many times your group will fill in with their experiences.
  11. It’s OK to be wrong. No, really. If you don’t know the answer to a question, look it up later on, and follow up. Or ask the group if anyone knows. Also, presentations take time to make, and if you feel compelled to get every single detail right, they’ll take a lot longer. Keep an offline copy of the manual so you can look up anything you need to.
  12. Ask others for help. Even if you think you can do it better or faster. Your group should be able to survive without you, so if you can’t make a meeting all is not lost. This includes asking others to do a presentation of any length — it can be a 10 minute “this is how we use MySQL and this is what our setup looks like” to an hour or more. It does not have to be intimidating. Ask folks to present workshops they’re presenting at conferences, or present notes from conferences they go to. People need to feel like part of the group if the group is to be successful.
  13. If you can’t get a presentation, download one. MySQL is excellent at providing past webinars. Download it to your computer, and have a free lecture from an expert, that you can pause and rewind if you need to. MySQL Webinars. The Boston MySQL User Group videorecords the presentations and puts them online — see the links under “Presentations” at http://www.sheeri.net

Note: The Boston MySQL User Group is successful, but we don’t quite do everything above. I’d love to have more socialization, and I think once summer comes I’ll think of something geeky and fun to do outdoors, with little cost. Or maybe just a bowling trip.
This falls under “I knew I could do this but I didn’t realize I could apply it this way!”

You can do

SELECT 1 from table1;

Which will return n rows, treatment each row having 1 field whose value is 1. n is the number of rows in table1.

SELECT "string" from table1 works similarly.

However, refractionist I never considered using

SELECT "string" as "debug statement" to debug code.

For instance,

mysql> SELECT "SELECT foo from bar where baz>0" as "debug";;
+---------------------------------+
| debug |
+---------------------------------+
| SELECT foo from bar where baz>0 |
+---------------------------------+
1 row in set (0.00 sec)

Neat trick! This is why I follow the MySQL Users general list, because every so often a gem like this comes up. Plus, I can’t resist helping folks out. And if I’m not accurate, someone else will step up.

Some more random thoughts, as I’m thinking about them and did not really want to make a separate post for them — at the MySQL Users Conference, the rank on PlanetMySQL.com came up, from “the conference postings will increase my rank” to “so-and-so cheats and makes lots of little posts.”

Now, I was just happy to make the list. Of course, now that I did post a lot at the conference, my rank went up from about #12 to about #7. As an experiment, I’m attempting to publish an average of one post a day for May and see what my ranking is at the end of May. So far, so good. And of course, as always, all of my posts will have content!

I think it would be neat to have a little line on planetmysql.com that shows where “1 post a day”, “1 post a week” and “1 post a month” fall for the top posters.
One of the things I did this weekend was knit the pattern I’d made for Sakila, link the dolphin in the MySQL logo. Click on the image for a bigger picture:

The only problem is I have no idea what to do with it. I have more of the orange and blue yarn. I thought I would make it into a purse but it turned out much wider than I expected. I could make it into a big handbag but I don’t think I’d use it. Any suggestions?

The pattern itself:

Continue reading

InnoDB information

Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
I was told that teams had to have a physical instantiation of a mascot, treatment so I said, hygiene “maybe I’ll knit something.” Well, pfizer I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.
Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
I was told that teams had to have a physical instantiation of a mascot, treatment so I said, hygiene “maybe I’ll knit something.” Well, pfizer I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.
talk by Roland Mallmann

MaxDB is older than I am, site in 1977 started at University of Berlin. Owned by SAP today. Today it’s open source under GPL, purchase or commercial license from SAP or MySQL AB.

Why Max DB is so great:
Low cost of ownership
Few config parameters
no size estimates for indvidual db objects

no reorg — space management done automatically — space no longer needed is returned immediately to the db, data occupied vs. free (holes) ration is highest as possible. This is done by matching logical pages to physical on disk with the Converter, and I/O and space management.

Space management done automatically
No reorganization is needed (ie, OPTIMIZE TABLE)
Gaps are not allowed, therefore updates and deletes are in place, and sorts happen AFTER an insertion.
Space freed is immediately returned to DB
Done by Converter, matches logical pages to physical disk.
Data is stored in B* Trees (b star tree) for almost all objects (Tables, indexes, secondary indexes, BLOBs)

Concurrent asynchronous I/O
Manages free blocks
Auto balancing of disk I/O
Savepoints
Backup Integration (including incremental)
Segmentation of the data cache
A 10 minutes cycle of changes flushed to disk
Flushing data pages to disk is spread out over the 10 minutes

Online Backup and Restore
Consistent backups, no need to apply logs
Savepoint issued before db backup, savepoint includes undo information for remaining open transactions.
Can do incremental, full data, or log backup
can restore, restore from a medium, or backup from history, or backup to a point in time.

Snapshots
Can make complete database backup
Can make a snapshot for replication
Can make incremental on master and restore snapshot on replication as a backup strategy (as long as there isn’t a newer snapshot, because then incremental backup logs are reset)

Standby Database
A standby is made possible using log shipping.
Master and slave share backup media (shared disk)
Init once with complete master backup
Redo available logs

In case of emergency: start slave, back up last log piece from master in case it hasn’t been shipped. Redo all ‘open’ log backups (should be none), redo final piece, start slave, it’s now the master!

Synchronization Manager
no permanent attention required
unattended desktop/laptop installation and operation

database snapshot functionality!

Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
I was told that teams had to have a physical instantiation of a mascot, treatment so I said, hygiene “maybe I’ll knit something.” Well, pfizer I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.
talk by Roland Mallmann

MaxDB is older than I am, site in 1977 started at University of Berlin. Owned by SAP today. Today it’s open source under GPL, purchase or commercial license from SAP or MySQL AB.

Why Max DB is so great:
Low cost of ownership
Few config parameters
no size estimates for indvidual db objects

no reorg — space management done automatically — space no longer needed is returned immediately to the db, data occupied vs. free (holes) ration is highest as possible. This is done by matching logical pages to physical on disk with the Converter, and I/O and space management.

Space management done automatically
No reorganization is needed (ie, OPTIMIZE TABLE)
Gaps are not allowed, therefore updates and deletes are in place, and sorts happen AFTER an insertion.
Space freed is immediately returned to DB
Done by Converter, matches logical pages to physical disk.
Data is stored in B* Trees (b star tree) for almost all objects (Tables, indexes, secondary indexes, BLOBs)

Concurrent asynchronous I/O
Manages free blocks
Auto balancing of disk I/O
Savepoints
Backup Integration (including incremental)
Segmentation of the data cache
A 10 minutes cycle of changes flushed to disk
Flushing data pages to disk is spread out over the 10 minutes

Online Backup and Restore
Consistent backups, no need to apply logs
Savepoint issued before db backup, savepoint includes undo information for remaining open transactions.
Can do incremental, full data, or log backup
can restore, restore from a medium, or backup from history, or backup to a point in time.

Snapshots
Can make complete database backup
Can make a snapshot for replication
Can make incremental on master and restore snapshot on replication as a backup strategy (as long as there isn’t a newer snapshot, because then incremental backup logs are reset)

Standby Database
A standby is made possible using log shipping.
Master and slave share backup media (shared disk)
Init once with complete master backup
Redo available logs

In case of emergency: start slave, back up last log piece from master in case it hasn’t been shipped. Redo all ‘open’ log backups (should be none), redo final piece, start slave, it’s now the master!

Synchronization Manager
no permanent attention required
unattended desktop/laptop installation and operation

database snapshot functionality!

Some of these may be conflicting, therapy not applicable to everyone.

1) think horizontal — everything, patient not just the web servers. Micro optimizations are boring, as or other details
2) benchmarking techniques;. Not “how fast” but “how many”. test force, not speed.
3) bigger and faster vertical scaling is the enemy.
4) horizontal scaling = add another box
5) implementation, scale your system a few times, but scale your ARCHITECTUREa dozens or hundreds of time.
6) start from the beginning with architecture implementation.
7) don’t have “The server” for anything
8) stateless good, stateful bad
9) “shared nothing” good
10) don’t keep state within app server
11) caching good.
12) generate static pages periodically, works well for not millions of pages or changes.
13) cache full output in application
14) include cookies in the “cache key” so diff browsers can get diff info too
15) use cache when this, not when that
16) use regexp to insert customized content into the cahed page
17) set Expires header to control cache times, or rewrite rule to generate page if the cached file does not exist (rails does this)
18) if content is dynamic this does not work, but great for caching “dynamic” images
19) parial pages — pre-generate static page snippets, have handler just assemble pieces.
20) cache little snippets, ie sidebar
21) don’t spend more time managing the cadche than you sav
22) cache data that’s too slow to query, fetch, calc.
23) generate page from cached data
24) use same data to generate api responss
25) moves load to web servers
26) start with things you hit all the time
27) if you don’t use it, don’t cache it, check db logs
28) don’t depend on MySQL Query cache unless it actually helps
29) local file system not so good because you copy page for every server
30) use process memory, not shared
31) mysql cache table — id is the “cache key” type is the “namespace”, metadata for things like headers for cached http responses; purge_key to make it easier to delete data from cache (make it an index, too, primary index on id,type, expire index on expire field) fields
32) why 31 fails, how do you load balance, what if mysql server died, now no cache
33) but you can use mysql scaling techniques to deal, like dual-master replication
34) use memcached, like lj, slashdot, wikipedia — memory based, linux 2.6(epoll) or FreeBsD(kqueue), low overhead for lots of cxns, no master, simple!
35) how to scale the db horizontally, use MySQL, use replication to share the load, write to one master, read from many slaves, good for heavy read apps (or insert delayed, if you don’t need to write right away) — check out “High Performance MySQL”
36) relay slave replication if too much bandwidth on the master, use a replication slave to replicate to other slaves.
37) writing does not scale with replication — all servers need to do the same writes. 5.1’s row-level replication might help.
38) so partition the data, divide and conquer. separate cluster for different data sets
39) if you can’t divide, use flexible partitioning, global server keeps track for which “cluster” has what info. auto_increment columns only in the “global master”. Aggressively cache “global master” data.
40) If you use a master-master setup like 39, then you don’t have replication slaves, no latency from commit to data being available. if you are careful you can write to both masters. Make each user always use the same master, so primary keys won’t be messed up. If one master fails, use the other one.
41) don’t be afraid of the data duplication monster. use summary tables, to avoid things like COUNT(*) and GROUP BY. do it once, put result into a table — do this periodically, or do it when the data is inserted. Or data affecting a “user” and a “group” goes into both the “user” and “group” partitions (clusters). so it’s duplicating data.
42) but you can go further, and use summary dbs! copy data into special dbs optimized for special queries, ie FULLTEXT searches, anything spanning more than one or all clusters, different dbs for different latency requirements, ie RSS feeds from a replicated slave db — RSS feeds can be late).
43) save data to multiple “partitions” like the application doing manual replication — app writes to 2 places OR last_updated and deleted columns, use triggers to add to “replication_queue” table, background program to copy data based on queue table or last_updated column
44) if you’re running oracle, move read operations to MySQL with this manual replication idea. Good way to sneak MySQL into an oracle shop.
45) make everything repeatable, build summary and load scripts so they can restart or run again — also have one trusted eata place, so summaries and copies can be (re)created from there.

BREATHE! HALFWAY THERE!!

46) use innodb because it’s more robust. except for big read-only tables, high volume streaming tables (logging), lcoked tables or INSERT DELAYED, specialized engines for special needs, and more engines in the future — but for now, InnoDB
47) Multiple MySQL instances — run diff instances for diff workloads, even if they share the same server. moving to separate hardware is easier, of course. optimize the server instance for the workload. e4asy to set up with instance manager or mysqld_multi, and there are init scripts that support the instance manager.
48) asynchronous data loading when you can — if you’re updating counts or loading logs, send updates through Spread (or whatever messaging something) to a daemon loading data. Don’t update for each request (ie, counts), do it every 1000 updates, or every few minutes. This helps if db loses net connection, the frontend keeps running! or if you want to lock tables, etc.
49) preload, dump and process — let the servers pre-process, as much as possible. dump never changing data structures to js files for the client to cache (postal data maybe), or dump to memory, or use SQLite, or BerkeleyDB and rsync to each webserver, or mysql replica on webserver
50) stored procedures are dangerous because they’re not horizontal, more work than just adding a webserver– only use if it saves the db work (ie send 5 rows to app instead of 5,000 and parsing in app)
51) reconsider persistent db connections because it requires a thread = memory, all httpd processes talk to all dbs, lots of caching might mean you don’t need main db, mysql cxns are fast so why not just reopen?
52) innodb_file_per_table, so OPTIMIZE TABLE clears unused space. innodb_buffer_pool_soze set to 80% of total mem (dedicated mysql server). innodb_flush_log_at_trx_commit, innodb_log_file_size
53) have metadata in db, store images in filesystem, but then how do you replicate? or store images in myisam tables, split up so tables don’t get bigger than 4G, so if gets corrupt fewer problems. metadata table might specify what table it’s in. include last modified date in metadata, and use in URLs to optimize caching, ie with squid: /images/$timestamp/$id.jpg
54) do everything in unicode
55) UTC for everything
56) STRICT_TRANS_TABLE so MySQL is picky about bad input and does not just turn it to NULL or zero.
57) Don’t overwork the DB — dbs don’t easily scale like web servers
58) STATELESS. don’t make cookie id’s easy to guess, or sequential, etc. don’t save state on one server only, save it on every one. put the data in the db, don’t put it in the cookie, that duplicates efforts. important data into db, so it gets saved, unimportant transient data puts in memcache, SMALL data in cookie. a shopping cart would go in db, background color goes in cookie, and last viewed items go in memcache
59) to make cookies safer, use checksums and timestamps to validate cookies. Encryption usually a waste of cycles.
60) use resources wisely. balance how you use hardware — use memory to save I/O or CPU, don’t swap memory to disk EVER.
61) do the work in parallel — split work into smaller pieces and run on different boxes. send sub-requests off as soon as possible and do other stuff in the meantime.
62) light processes for light tasks — thin proxy servers for “network buffers”, goes between the user and your heavier backend application. Use httpd with mod_proxy, mod_backhand. the proxy does the ‘net work, and fewer httpd processes are needed to do the real work, this saves memory and db connections. proxies can also server static files and cache responses. Avoid starting main app as root. Load balancing, and very important if your background processes are “heavy”. Very EASY to set up a light process. ProxyPreserveHostOn in apache 2
63) job queues — use queues, AJAX can make this easy. webserver submits job to database “queue”, first avail worker picks up first job, and sends result to queue. or ue gearman, Spread, MQ/Java Messaging Service(?)
64) log http requests to a database! log all 4xx and 5xx requests, great to see which requests are slow or fast. but only log 1-2% of all requests. Time::HiRes in Perl, microseconds from gettimeofday system call.
65) get good deals on servers http://www.siliconmechanics.com, server vendor of lj and others.

IN SUMMARY: HORIZONTAL GOOD, VERTICAL BAD

for jobs: ask@develooper.com (jobs, moonlighters, perl/mysql etc)
slides will be up at http://develooper.com/talks/
Phew! That was a lot of fast typing (60 words per minute, baby!). Ask is smart, but QUICK!!!! His slides will be VERY useful when they appear. He said there were 53 tips, but I numbered each new line (and not smartly with OL and LI) and I have more than that…
Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
I was told that teams had to have a physical instantiation of a mascot, treatment so I said, hygiene “maybe I’ll knit something.” Well, pfizer I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.
talk by Roland Mallmann

MaxDB is older than I am, site in 1977 started at University of Berlin. Owned by SAP today. Today it’s open source under GPL, purchase or commercial license from SAP or MySQL AB.

Why Max DB is so great:
Low cost of ownership
Few config parameters
no size estimates for indvidual db objects

no reorg — space management done automatically — space no longer needed is returned immediately to the db, data occupied vs. free (holes) ration is highest as possible. This is done by matching logical pages to physical on disk with the Converter, and I/O and space management.

Space management done automatically
No reorganization is needed (ie, OPTIMIZE TABLE)
Gaps are not allowed, therefore updates and deletes are in place, and sorts happen AFTER an insertion.
Space freed is immediately returned to DB
Done by Converter, matches logical pages to physical disk.
Data is stored in B* Trees (b star tree) for almost all objects (Tables, indexes, secondary indexes, BLOBs)

Concurrent asynchronous I/O
Manages free blocks
Auto balancing of disk I/O
Savepoints
Backup Integration (including incremental)
Segmentation of the data cache
A 10 minutes cycle of changes flushed to disk
Flushing data pages to disk is spread out over the 10 minutes

Online Backup and Restore
Consistent backups, no need to apply logs
Savepoint issued before db backup, savepoint includes undo information for remaining open transactions.
Can do incremental, full data, or log backup
can restore, restore from a medium, or backup from history, or backup to a point in time.

Snapshots
Can make complete database backup
Can make a snapshot for replication
Can make incremental on master and restore snapshot on replication as a backup strategy (as long as there isn’t a newer snapshot, because then incremental backup logs are reset)

Standby Database
A standby is made possible using log shipping.
Master and slave share backup media (shared disk)
Init once with complete master backup
Redo available logs

In case of emergency: start slave, back up last log piece from master in case it hasn’t been shipped. Redo all ‘open’ log backups (should be none), redo final piece, start slave, it’s now the master!

Synchronization Manager
no permanent attention required
unattended desktop/laptop installation and operation

database snapshot functionality!

Some of these may be conflicting, therapy not applicable to everyone.

1) think horizontal — everything, patient not just the web servers. Micro optimizations are boring, as or other details
2) benchmarking techniques;. Not “how fast” but “how many”. test force, not speed.
3) bigger and faster vertical scaling is the enemy.
4) horizontal scaling = add another box
5) implementation, scale your system a few times, but scale your ARCHITECTUREa dozens or hundreds of time.
6) start from the beginning with architecture implementation.
7) don’t have “The server” for anything
8) stateless good, stateful bad
9) “shared nothing” good
10) don’t keep state within app server
11) caching good.
12) generate static pages periodically, works well for not millions of pages or changes.
13) cache full output in application
14) include cookies in the “cache key” so diff browsers can get diff info too
15) use cache when this, not when that
16) use regexp to insert customized content into the cahed page
17) set Expires header to control cache times, or rewrite rule to generate page if the cached file does not exist (rails does this)
18) if content is dynamic this does not work, but great for caching “dynamic” images
19) parial pages — pre-generate static page snippets, have handler just assemble pieces.
20) cache little snippets, ie sidebar
21) don’t spend more time managing the cadche than you sav
22) cache data that’s too slow to query, fetch, calc.
23) generate page from cached data
24) use same data to generate api responss
25) moves load to web servers
26) start with things you hit all the time
27) if you don’t use it, don’t cache it, check db logs
28) don’t depend on MySQL Query cache unless it actually helps
29) local file system not so good because you copy page for every server
30) use process memory, not shared
31) mysql cache table — id is the “cache key” type is the “namespace”, metadata for things like headers for cached http responses; purge_key to make it easier to delete data from cache (make it an index, too, primary index on id,type, expire index on expire field) fields
32) why 31 fails, how do you load balance, what if mysql server died, now no cache
33) but you can use mysql scaling techniques to deal, like dual-master replication
34) use memcached, like lj, slashdot, wikipedia — memory based, linux 2.6(epoll) or FreeBsD(kqueue), low overhead for lots of cxns, no master, simple!
35) how to scale the db horizontally, use MySQL, use replication to share the load, write to one master, read from many slaves, good for heavy read apps (or insert delayed, if you don’t need to write right away) — check out “High Performance MySQL”
36) relay slave replication if too much bandwidth on the master, use a replication slave to replicate to other slaves.
37) writing does not scale with replication — all servers need to do the same writes. 5.1’s row-level replication might help.
38) so partition the data, divide and conquer. separate cluster for different data sets
39) if you can’t divide, use flexible partitioning, global server keeps track for which “cluster” has what info. auto_increment columns only in the “global master”. Aggressively cache “global master” data.
40) If you use a master-master setup like 39, then you don’t have replication slaves, no latency from commit to data being available. if you are careful you can write to both masters. Make each user always use the same master, so primary keys won’t be messed up. If one master fails, use the other one.
41) don’t be afraid of the data duplication monster. use summary tables, to avoid things like COUNT(*) and GROUP BY. do it once, put result into a table — do this periodically, or do it when the data is inserted. Or data affecting a “user” and a “group” goes into both the “user” and “group” partitions (clusters). so it’s duplicating data.
42) but you can go further, and use summary dbs! copy data into special dbs optimized for special queries, ie FULLTEXT searches, anything spanning more than one or all clusters, different dbs for different latency requirements, ie RSS feeds from a replicated slave db — RSS feeds can be late).
43) save data to multiple “partitions” like the application doing manual replication — app writes to 2 places OR last_updated and deleted columns, use triggers to add to “replication_queue” table, background program to copy data based on queue table or last_updated column
44) if you’re running oracle, move read operations to MySQL with this manual replication idea. Good way to sneak MySQL into an oracle shop.
45) make everything repeatable, build summary and load scripts so they can restart or run again — also have one trusted eata place, so summaries and copies can be (re)created from there.

BREATHE! HALFWAY THERE!!

46) use innodb because it’s more robust. except for big read-only tables, high volume streaming tables (logging), lcoked tables or INSERT DELAYED, specialized engines for special needs, and more engines in the future — but for now, InnoDB
47) Multiple MySQL instances — run diff instances for diff workloads, even if they share the same server. moving to separate hardware is easier, of course. optimize the server instance for the workload. e4asy to set up with instance manager or mysqld_multi, and there are init scripts that support the instance manager.
48) asynchronous data loading when you can — if you’re updating counts or loading logs, send updates through Spread (or whatever messaging something) to a daemon loading data. Don’t update for each request (ie, counts), do it every 1000 updates, or every few minutes. This helps if db loses net connection, the frontend keeps running! or if you want to lock tables, etc.
49) preload, dump and process — let the servers pre-process, as much as possible. dump never changing data structures to js files for the client to cache (postal data maybe), or dump to memory, or use SQLite, or BerkeleyDB and rsync to each webserver, or mysql replica on webserver
50) stored procedures are dangerous because they’re not horizontal, more work than just adding a webserver– only use if it saves the db work (ie send 5 rows to app instead of 5,000 and parsing in app)
51) reconsider persistent db connections because it requires a thread = memory, all httpd processes talk to all dbs, lots of caching might mean you don’t need main db, mysql cxns are fast so why not just reopen?
52) innodb_file_per_table, so OPTIMIZE TABLE clears unused space. innodb_buffer_pool_soze set to 80% of total mem (dedicated mysql server). innodb_flush_log_at_trx_commit, innodb_log_file_size
53) have metadata in db, store images in filesystem, but then how do you replicate? or store images in myisam tables, split up so tables don’t get bigger than 4G, so if gets corrupt fewer problems. metadata table might specify what table it’s in. include last modified date in metadata, and use in URLs to optimize caching, ie with squid: /images/$timestamp/$id.jpg
54) do everything in unicode
55) UTC for everything
56) STRICT_TRANS_TABLE so MySQL is picky about bad input and does not just turn it to NULL or zero.
57) Don’t overwork the DB — dbs don’t easily scale like web servers
58) STATELESS. don’t make cookie id’s easy to guess, or sequential, etc. don’t save state on one server only, save it on every one. put the data in the db, don’t put it in the cookie, that duplicates efforts. important data into db, so it gets saved, unimportant transient data puts in memcache, SMALL data in cookie. a shopping cart would go in db, background color goes in cookie, and last viewed items go in memcache
59) to make cookies safer, use checksums and timestamps to validate cookies. Encryption usually a waste of cycles.
60) use resources wisely. balance how you use hardware — use memory to save I/O or CPU, don’t swap memory to disk EVER.
61) do the work in parallel — split work into smaller pieces and run on different boxes. send sub-requests off as soon as possible and do other stuff in the meantime.
62) light processes for light tasks — thin proxy servers for “network buffers”, goes between the user and your heavier backend application. Use httpd with mod_proxy, mod_backhand. the proxy does the ‘net work, and fewer httpd processes are needed to do the real work, this saves memory and db connections. proxies can also server static files and cache responses. Avoid starting main app as root. Load balancing, and very important if your background processes are “heavy”. Very EASY to set up a light process. ProxyPreserveHostOn in apache 2
63) job queues — use queues, AJAX can make this easy. webserver submits job to database “queue”, first avail worker picks up first job, and sends result to queue. or ue gearman, Spread, MQ/Java Messaging Service(?)
64) log http requests to a database! log all 4xx and 5xx requests, great to see which requests are slow or fast. but only log 1-2% of all requests. Time::HiRes in Perl, microseconds from gettimeofday system call.
65) get good deals on servers http://www.siliconmechanics.com, server vendor of lj and others.

IN SUMMARY: HORIZONTAL GOOD, VERTICAL BAD

for jobs: ask@develooper.com (jobs, moonlighters, perl/mysql etc)
slides will be up at http://develooper.com/talks/
Phew! That was a lot of fast typing (60 words per minute, baby!). Ask is smart, but QUICK!!!! His slides will be VERY useful when they appear. He said there were 53 tips, but I numbered each new line (and not smartly with OL and LI) and I have more than that…

This post dedicated to Edwin DeSouza.

Un-tuned SQL or stored procedures often fail to scale as table volumes increase, plague inefficiency increases exponentially with size.

Tune SQL/stored procedures and then buy new hardware.

use EXPLAIN to help optimize queries. Also use the slow query log.

EXPLAIN EXTENDED shows sql that was actually used — ie, optimizer may rewrite query, so it’s a neat tool.

you can always give optimizer hints, but they’re not recommended — keep checking them as your app grows — STRAIGHT_JOIN, FORCE INDEX, USE INDEX, and one other one.

SHOW STATUS gives you status variables. innodb_buffer_pool_read_requests and innodb_data_read will show how much data is being read from the buffer pool vs. data.

Index isn’t always used, if more than 20% or so of rows, MySQL will use a full table scan. There’s usually a range where MySQL will choose a full table scan when an index is more appropriate, or vice versa, so that’s when you’d use hints. Hey, nobody’s perfect!

think indexes — joining tables of non-trivial size Subqueries ( [NOT] EXISTS, [NOT] IN) in WHERE clause. Use index to avoid a sort, use “covering” indexes.

Establish the best set of multi-column indexes along with singular indexes.

Derived tables (subqueries in FROM cause) can’t use an index. VIEWs with UNION or GROUP BY also can’t use index — all these use TEMPTABLE view algorithm. (temp table created, and then reads from temp table).

Sorts can be improved by increasing memory (sort_buffer_size) or using an index.

Use procedures to:

  • Avoid self joins
  • Correlated updates (subqueries accessing same data)

Performance of SQL within a stored routine that dominates the performance. When SQL is tuned, optimize the routine using traditional techniques:

  • only put what’s needed in a loop
  • stop testing when you know the answer
  • order tests by most likely first

Recursion:

  • only allowed in procedures, not functions
  • depth controlled by max_sp_recursion_depth
  • iterative alternatives are almost always faster and scaleable

TRIGGERS
non-trivial (12% at least) to even simplest trigger. No trigger should EVER contain expensive SQL, because they are done for each row.

Quest free software for MySQL — http://www.quest.com/mysql/
Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
I was told that teams had to have a physical instantiation of a mascot, treatment so I said, hygiene “maybe I’ll knit something.” Well, pfizer I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.
talk by Roland Mallmann

MaxDB is older than I am, site in 1977 started at University of Berlin. Owned by SAP today. Today it’s open source under GPL, purchase or commercial license from SAP or MySQL AB.

Why Max DB is so great:
Low cost of ownership
Few config parameters
no size estimates for indvidual db objects

no reorg — space management done automatically — space no longer needed is returned immediately to the db, data occupied vs. free (holes) ration is highest as possible. This is done by matching logical pages to physical on disk with the Converter, and I/O and space management.

Space management done automatically
No reorganization is needed (ie, OPTIMIZE TABLE)
Gaps are not allowed, therefore updates and deletes are in place, and sorts happen AFTER an insertion.
Space freed is immediately returned to DB
Done by Converter, matches logical pages to physical disk.
Data is stored in B* Trees (b star tree) for almost all objects (Tables, indexes, secondary indexes, BLOBs)

Concurrent asynchronous I/O
Manages free blocks
Auto balancing of disk I/O
Savepoints
Backup Integration (including incremental)
Segmentation of the data cache
A 10 minutes cycle of changes flushed to disk
Flushing data pages to disk is spread out over the 10 minutes

Online Backup and Restore
Consistent backups, no need to apply logs
Savepoint issued before db backup, savepoint includes undo information for remaining open transactions.
Can do incremental, full data, or log backup
can restore, restore from a medium, or backup from history, or backup to a point in time.

Snapshots
Can make complete database backup
Can make a snapshot for replication
Can make incremental on master and restore snapshot on replication as a backup strategy (as long as there isn’t a newer snapshot, because then incremental backup logs are reset)

Standby Database
A standby is made possible using log shipping.
Master and slave share backup media (shared disk)
Init once with complete master backup
Redo available logs

In case of emergency: start slave, back up last log piece from master in case it hasn’t been shipped. Redo all ‘open’ log backups (should be none), redo final piece, start slave, it’s now the master!

Synchronization Manager
no permanent attention required
unattended desktop/laptop installation and operation

database snapshot functionality!

Some of these may be conflicting, therapy not applicable to everyone.

1) think horizontal — everything, patient not just the web servers. Micro optimizations are boring, as or other details
2) benchmarking techniques;. Not “how fast” but “how many”. test force, not speed.
3) bigger and faster vertical scaling is the enemy.
4) horizontal scaling = add another box
5) implementation, scale your system a few times, but scale your ARCHITECTUREa dozens or hundreds of time.
6) start from the beginning with architecture implementation.
7) don’t have “The server” for anything
8) stateless good, stateful bad
9) “shared nothing” good
10) don’t keep state within app server
11) caching good.
12) generate static pages periodically, works well for not millions of pages or changes.
13) cache full output in application
14) include cookies in the “cache key” so diff browsers can get diff info too
15) use cache when this, not when that
16) use regexp to insert customized content into the cahed page
17) set Expires header to control cache times, or rewrite rule to generate page if the cached file does not exist (rails does this)
18) if content is dynamic this does not work, but great for caching “dynamic” images
19) parial pages — pre-generate static page snippets, have handler just assemble pieces.
20) cache little snippets, ie sidebar
21) don’t spend more time managing the cadche than you sav
22) cache data that’s too slow to query, fetch, calc.
23) generate page from cached data
24) use same data to generate api responss
25) moves load to web servers
26) start with things you hit all the time
27) if you don’t use it, don’t cache it, check db logs
28) don’t depend on MySQL Query cache unless it actually helps
29) local file system not so good because you copy page for every server
30) use process memory, not shared
31) mysql cache table — id is the “cache key” type is the “namespace”, metadata for things like headers for cached http responses; purge_key to make it easier to delete data from cache (make it an index, too, primary index on id,type, expire index on expire field) fields
32) why 31 fails, how do you load balance, what if mysql server died, now no cache
33) but you can use mysql scaling techniques to deal, like dual-master replication
34) use memcached, like lj, slashdot, wikipedia — memory based, linux 2.6(epoll) or FreeBsD(kqueue), low overhead for lots of cxns, no master, simple!
35) how to scale the db horizontally, use MySQL, use replication to share the load, write to one master, read from many slaves, good for heavy read apps (or insert delayed, if you don’t need to write right away) — check out “High Performance MySQL”
36) relay slave replication if too much bandwidth on the master, use a replication slave to replicate to other slaves.
37) writing does not scale with replication — all servers need to do the same writes. 5.1’s row-level replication might help.
38) so partition the data, divide and conquer. separate cluster for different data sets
39) if you can’t divide, use flexible partitioning, global server keeps track for which “cluster” has what info. auto_increment columns only in the “global master”. Aggressively cache “global master” data.
40) If you use a master-master setup like 39, then you don’t have replication slaves, no latency from commit to data being available. if you are careful you can write to both masters. Make each user always use the same master, so primary keys won’t be messed up. If one master fails, use the other one.
41) don’t be afraid of the data duplication monster. use summary tables, to avoid things like COUNT(*) and GROUP BY. do it once, put result into a table — do this periodically, or do it when the data is inserted. Or data affecting a “user” and a “group” goes into both the “user” and “group” partitions (clusters). so it’s duplicating data.
42) but you can go further, and use summary dbs! copy data into special dbs optimized for special queries, ie FULLTEXT searches, anything spanning more than one or all clusters, different dbs for different latency requirements, ie RSS feeds from a replicated slave db — RSS feeds can be late).
43) save data to multiple “partitions” like the application doing manual replication — app writes to 2 places OR last_updated and deleted columns, use triggers to add to “replication_queue” table, background program to copy data based on queue table or last_updated column
44) if you’re running oracle, move read operations to MySQL with this manual replication idea. Good way to sneak MySQL into an oracle shop.
45) make everything repeatable, build summary and load scripts so they can restart or run again — also have one trusted eata place, so summaries and copies can be (re)created from there.

BREATHE! HALFWAY THERE!!

46) use innodb because it’s more robust. except for big read-only tables, high volume streaming tables (logging), lcoked tables or INSERT DELAYED, specialized engines for special needs, and more engines in the future — but for now, InnoDB
47) Multiple MySQL instances — run diff instances for diff workloads, even if they share the same server. moving to separate hardware is easier, of course. optimize the server instance for the workload. e4asy to set up with instance manager or mysqld_multi, and there are init scripts that support the instance manager.
48) asynchronous data loading when you can — if you’re updating counts or loading logs, send updates through Spread (or whatever messaging something) to a daemon loading data. Don’t update for each request (ie, counts), do it every 1000 updates, or every few minutes. This helps if db loses net connection, the frontend keeps running! or if you want to lock tables, etc.
49) preload, dump and process — let the servers pre-process, as much as possible. dump never changing data structures to js files for the client to cache (postal data maybe), or dump to memory, or use SQLite, or BerkeleyDB and rsync to each webserver, or mysql replica on webserver
50) stored procedures are dangerous because they’re not horizontal, more work than just adding a webserver– only use if it saves the db work (ie send 5 rows to app instead of 5,000 and parsing in app)
51) reconsider persistent db connections because it requires a thread = memory, all httpd processes talk to all dbs, lots of caching might mean you don’t need main db, mysql cxns are fast so why not just reopen?
52) innodb_file_per_table, so OPTIMIZE TABLE clears unused space. innodb_buffer_pool_soze set to 80% of total mem (dedicated mysql server). innodb_flush_log_at_trx_commit, innodb_log_file_size
53) have metadata in db, store images in filesystem, but then how do you replicate? or store images in myisam tables, split up so tables don’t get bigger than 4G, so if gets corrupt fewer problems. metadata table might specify what table it’s in. include last modified date in metadata, and use in URLs to optimize caching, ie with squid: /images/$timestamp/$id.jpg
54) do everything in unicode
55) UTC for everything
56) STRICT_TRANS_TABLE so MySQL is picky about bad input and does not just turn it to NULL or zero.
57) Don’t overwork the DB — dbs don’t easily scale like web servers
58) STATELESS. don’t make cookie id’s easy to guess, or sequential, etc. don’t save state on one server only, save it on every one. put the data in the db, don’t put it in the cookie, that duplicates efforts. important data into db, so it gets saved, unimportant transient data puts in memcache, SMALL data in cookie. a shopping cart would go in db, background color goes in cookie, and last viewed items go in memcache
59) to make cookies safer, use checksums and timestamps to validate cookies. Encryption usually a waste of cycles.
60) use resources wisely. balance how you use hardware — use memory to save I/O or CPU, don’t swap memory to disk EVER.
61) do the work in parallel — split work into smaller pieces and run on different boxes. send sub-requests off as soon as possible and do other stuff in the meantime.
62) light processes for light tasks — thin proxy servers for “network buffers”, goes between the user and your heavier backend application. Use httpd with mod_proxy, mod_backhand. the proxy does the ‘net work, and fewer httpd processes are needed to do the real work, this saves memory and db connections. proxies can also server static files and cache responses. Avoid starting main app as root. Load balancing, and very important if your background processes are “heavy”. Very EASY to set up a light process. ProxyPreserveHostOn in apache 2
63) job queues — use queues, AJAX can make this easy. webserver submits job to database “queue”, first avail worker picks up first job, and sends result to queue. or ue gearman, Spread, MQ/Java Messaging Service(?)
64) log http requests to a database! log all 4xx and 5xx requests, great to see which requests are slow or fast. but only log 1-2% of all requests. Time::HiRes in Perl, microseconds from gettimeofday system call.
65) get good deals on servers http://www.siliconmechanics.com, server vendor of lj and others.

IN SUMMARY: HORIZONTAL GOOD, VERTICAL BAD

for jobs: ask@develooper.com (jobs, moonlighters, perl/mysql etc)
slides will be up at http://develooper.com/talks/
Phew! That was a lot of fast typing (60 words per minute, baby!). Ask is smart, but QUICK!!!! His slides will be VERY useful when they appear. He said there were 53 tips, but I numbered each new line (and not smartly with OL and LI) and I have more than that…

This post dedicated to Edwin DeSouza.

Un-tuned SQL or stored procedures often fail to scale as table volumes increase, plague inefficiency increases exponentially with size.

Tune SQL/stored procedures and then buy new hardware.

use EXPLAIN to help optimize queries. Also use the slow query log.

EXPLAIN EXTENDED shows sql that was actually used — ie, optimizer may rewrite query, so it’s a neat tool.

you can always give optimizer hints, but they’re not recommended — keep checking them as your app grows — STRAIGHT_JOIN, FORCE INDEX, USE INDEX, and one other one.

SHOW STATUS gives you status variables. innodb_buffer_pool_read_requests and innodb_data_read will show how much data is being read from the buffer pool vs. data.

Index isn’t always used, if more than 20% or so of rows, MySQL will use a full table scan. There’s usually a range where MySQL will choose a full table scan when an index is more appropriate, or vice versa, so that’s when you’d use hints. Hey, nobody’s perfect!

think indexes — joining tables of non-trivial size Subqueries ( [NOT] EXISTS, [NOT] IN) in WHERE clause. Use index to avoid a sort, use “covering” indexes.

Establish the best set of multi-column indexes along with singular indexes.

Derived tables (subqueries in FROM cause) can’t use an index. VIEWs with UNION or GROUP BY also can’t use index — all these use TEMPTABLE view algorithm. (temp table created, and then reads from temp table).

Sorts can be improved by increasing memory (sort_buffer_size) or using an index.

Use procedures to:

  • Avoid self joins
  • Correlated updates (subqueries accessing same data)

Performance of SQL within a stored routine that dominates the performance. When SQL is tuned, optimize the routine using traditional techniques:

  • only put what’s needed in a loop
  • stop testing when you know the answer
  • order tests by most likely first

Recursion:

  • only allowed in procedures, not functions
  • depth controlled by max_sp_recursion_depth
  • iterative alternatives are almost always faster and scaleable

TRIGGERS
non-trivial (12% at least) to even simplest trigger. No trigger should EVER contain expensive SQL, because they are done for each row.

Quest free software for MySQL — http://www.quest.com/mysql/
So last night, viagra approved during a break in the quiz show (where Prokrasti Nation had a good showing, case as did the other teams — Recreational Evil, neuropathist Peeps, and Safe Hex) we bid on the T-shirt that had the signatures of all the speakers at the conference. All the proceeds were to go to the EFF, so it’s a good cause.

They announced it was cash only, so I looked in my wallet. $33. Well, the bidding quickly went over that, and when it reached about $100 they said it didn’t have to be cash only. Around $300 Brian Aker said that they’d give whoever won credits in a new command, SHOW CONTRIBUTORS. Well, when they said that I knew I HAD to have my name in the source code.

I mean, dude, my NAME in the SOURCE CODE!!! But then again, this is an open source application, I could just spend some time and write a patch.

I’ve been saving for my wedding next June (14 months away) so when I bid $500, I said, “hey, I don’t need flowers for my wedding.” (My entire wedding budget is $5,000, so spending 10% of that on my name in the source code was, I felt, worth it.)

The bidding stalled at $775, so I asked, “Will MySQL match what is raised?” And indeed, if the bidding reached $1,000 then MySQL would donate $800. So then Boyd Hemphill (wearing the “practice safe hex” T-shirt) walked up to the front, plunked down $20 and said, “I’m giving cash to help make up the $225 difference. Who else will help?”

And people started giving cash, and the bidding increased. I bid $900, and Ronald Bradford bid $1,000. That was the top bid, so he won the T-shirt, but the MySQL folks were nice enough to say if I donated the $900 I was willing to, I’d also get my name in the SHOW CONTRIBUTORS function. So I did!

And that is how it happened.

In other news:

40% of the people who took an exam on Tuesday passed. That means 60% failed — which is a lot, although it was mentioned that probably many people took the tutorials, got the free exam, and just tried it, not caring if they failed or not because it was free.

I passed both certification exams, so now I’m MySQL certified! And I stumped Brian Aker with a question about what rpl_recovery_rank was, and won an ipod nano!
Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
I was told that teams had to have a physical instantiation of a mascot, treatment so I said, hygiene “maybe I’ll knit something.” Well, pfizer I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.
talk by Roland Mallmann

MaxDB is older than I am, site in 1977 started at University of Berlin. Owned by SAP today. Today it’s open source under GPL, purchase or commercial license from SAP or MySQL AB.

Why Max DB is so great:
Low cost of ownership
Few config parameters
no size estimates for indvidual db objects

no reorg — space management done automatically — space no longer needed is returned immediately to the db, data occupied vs. free (holes) ration is highest as possible. This is done by matching logical pages to physical on disk with the Converter, and I/O and space management.

Space management done automatically
No reorganization is needed (ie, OPTIMIZE TABLE)
Gaps are not allowed, therefore updates and deletes are in place, and sorts happen AFTER an insertion.
Space freed is immediately returned to DB
Done by Converter, matches logical pages to physical disk.
Data is stored in B* Trees (b star tree) for almost all objects (Tables, indexes, secondary indexes, BLOBs)

Concurrent asynchronous I/O
Manages free blocks
Auto balancing of disk I/O
Savepoints
Backup Integration (including incremental)
Segmentation of the data cache
A 10 minutes cycle of changes flushed to disk
Flushing data pages to disk is spread out over the 10 minutes

Online Backup and Restore
Consistent backups, no need to apply logs
Savepoint issued before db backup, savepoint includes undo information for remaining open transactions.
Can do incremental, full data, or log backup
can restore, restore from a medium, or backup from history, or backup to a point in time.

Snapshots
Can make complete database backup
Can make a snapshot for replication
Can make incremental on master and restore snapshot on replication as a backup strategy (as long as there isn’t a newer snapshot, because then incremental backup logs are reset)

Standby Database
A standby is made possible using log shipping.
Master and slave share backup media (shared disk)
Init once with complete master backup
Redo available logs

In case of emergency: start slave, back up last log piece from master in case it hasn’t been shipped. Redo all ‘open’ log backups (should be none), redo final piece, start slave, it’s now the master!

Synchronization Manager
no permanent attention required
unattended desktop/laptop installation and operation

database snapshot functionality!

Some of these may be conflicting, therapy not applicable to everyone.

1) think horizontal — everything, patient not just the web servers. Micro optimizations are boring, as or other details
2) benchmarking techniques;. Not “how fast” but “how many”. test force, not speed.
3) bigger and faster vertical scaling is the enemy.
4) horizontal scaling = add another box
5) implementation, scale your system a few times, but scale your ARCHITECTUREa dozens or hundreds of time.
6) start from the beginning with architecture implementation.
7) don’t have “The server” for anything
8) stateless good, stateful bad
9) “shared nothing” good
10) don’t keep state within app server
11) caching good.
12) generate static pages periodically, works well for not millions of pages or changes.
13) cache full output in application
14) include cookies in the “cache key” so diff browsers can get diff info too
15) use cache when this, not when that
16) use regexp to insert customized content into the cahed page
17) set Expires header to control cache times, or rewrite rule to generate page if the cached file does not exist (rails does this)
18) if content is dynamic this does not work, but great for caching “dynamic” images
19) parial pages — pre-generate static page snippets, have handler just assemble pieces.
20) cache little snippets, ie sidebar
21) don’t spend more time managing the cadche than you sav
22) cache data that’s too slow to query, fetch, calc.
23) generate page from cached data
24) use same data to generate api responss
25) moves load to web servers
26) start with things you hit all the time
27) if you don’t use it, don’t cache it, check db logs
28) don’t depend on MySQL Query cache unless it actually helps
29) local file system not so good because you copy page for every server
30) use process memory, not shared
31) mysql cache table — id is the “cache key” type is the “namespace”, metadata for things like headers for cached http responses; purge_key to make it easier to delete data from cache (make it an index, too, primary index on id,type, expire index on expire field) fields
32) why 31 fails, how do you load balance, what if mysql server died, now no cache
33) but you can use mysql scaling techniques to deal, like dual-master replication
34) use memcached, like lj, slashdot, wikipedia — memory based, linux 2.6(epoll) or FreeBsD(kqueue), low overhead for lots of cxns, no master, simple!
35) how to scale the db horizontally, use MySQL, use replication to share the load, write to one master, read from many slaves, good for heavy read apps (or insert delayed, if you don’t need to write right away) — check out “High Performance MySQL”
36) relay slave replication if too much bandwidth on the master, use a replication slave to replicate to other slaves.
37) writing does not scale with replication — all servers need to do the same writes. 5.1’s row-level replication might help.
38) so partition the data, divide and conquer. separate cluster for different data sets
39) if you can’t divide, use flexible partitioning, global server keeps track for which “cluster” has what info. auto_increment columns only in the “global master”. Aggressively cache “global master” data.
40) If you use a master-master setup like 39, then you don’t have replication slaves, no latency from commit to data being available. if you are careful you can write to both masters. Make each user always use the same master, so primary keys won’t be messed up. If one master fails, use the other one.
41) don’t be afraid of the data duplication monster. use summary tables, to avoid things like COUNT(*) and GROUP BY. do it once, put result into a table — do this periodically, or do it when the data is inserted. Or data affecting a “user” and a “group” goes into both the “user” and “group” partitions (clusters). so it’s duplicating data.
42) but you can go further, and use summary dbs! copy data into special dbs optimized for special queries, ie FULLTEXT searches, anything spanning more than one or all clusters, different dbs for different latency requirements, ie RSS feeds from a replicated slave db — RSS feeds can be late).
43) save data to multiple “partitions” like the application doing manual replication — app writes to 2 places OR last_updated and deleted columns, use triggers to add to “replication_queue” table, background program to copy data based on queue table or last_updated column
44) if you’re running oracle, move read operations to MySQL with this manual replication idea. Good way to sneak MySQL into an oracle shop.
45) make everything repeatable, build summary and load scripts so they can restart or run again — also have one trusted eata place, so summaries and copies can be (re)created from there.

BREATHE! HALFWAY THERE!!

46) use innodb because it’s more robust. except for big read-only tables, high volume streaming tables (logging), lcoked tables or INSERT DELAYED, specialized engines for special needs, and more engines in the future — but for now, InnoDB
47) Multiple MySQL instances — run diff instances for diff workloads, even if they share the same server. moving to separate hardware is easier, of course. optimize the server instance for the workload. e4asy to set up with instance manager or mysqld_multi, and there are init scripts that support the instance manager.
48) asynchronous data loading when you can — if you’re updating counts or loading logs, send updates through Spread (or whatever messaging something) to a daemon loading data. Don’t update for each request (ie, counts), do it every 1000 updates, or every few minutes. This helps if db loses net connection, the frontend keeps running! or if you want to lock tables, etc.
49) preload, dump and process — let the servers pre-process, as much as possible. dump never changing data structures to js files for the client to cache (postal data maybe), or dump to memory, or use SQLite, or BerkeleyDB and rsync to each webserver, or mysql replica on webserver
50) stored procedures are dangerous because they’re not horizontal, more work than just adding a webserver– only use if it saves the db work (ie send 5 rows to app instead of 5,000 and parsing in app)
51) reconsider persistent db connections because it requires a thread = memory, all httpd processes talk to all dbs, lots of caching might mean you don’t need main db, mysql cxns are fast so why not just reopen?
52) innodb_file_per_table, so OPTIMIZE TABLE clears unused space. innodb_buffer_pool_soze set to 80% of total mem (dedicated mysql server). innodb_flush_log_at_trx_commit, innodb_log_file_size
53) have metadata in db, store images in filesystem, but then how do you replicate? or store images in myisam tables, split up so tables don’t get bigger than 4G, so if gets corrupt fewer problems. metadata table might specify what table it’s in. include last modified date in metadata, and use in URLs to optimize caching, ie with squid: /images/$timestamp/$id.jpg
54) do everything in unicode
55) UTC for everything
56) STRICT_TRANS_TABLE so MySQL is picky about bad input and does not just turn it to NULL or zero.
57) Don’t overwork the DB — dbs don’t easily scale like web servers
58) STATELESS. don’t make cookie id’s easy to guess, or sequential, etc. don’t save state on one server only, save it on every one. put the data in the db, don’t put it in the cookie, that duplicates efforts. important data into db, so it gets saved, unimportant transient data puts in memcache, SMALL data in cookie. a shopping cart would go in db, background color goes in cookie, and last viewed items go in memcache
59) to make cookies safer, use checksums and timestamps to validate cookies. Encryption usually a waste of cycles.
60) use resources wisely. balance how you use hardware — use memory to save I/O or CPU, don’t swap memory to disk EVER.
61) do the work in parallel — split work into smaller pieces and run on different boxes. send sub-requests off as soon as possible and do other stuff in the meantime.
62) light processes for light tasks — thin proxy servers for “network buffers”, goes between the user and your heavier backend application. Use httpd with mod_proxy, mod_backhand. the proxy does the ‘net work, and fewer httpd processes are needed to do the real work, this saves memory and db connections. proxies can also server static files and cache responses. Avoid starting main app as root. Load balancing, and very important if your background processes are “heavy”. Very EASY to set up a light process. ProxyPreserveHostOn in apache 2
63) job queues — use queues, AJAX can make this easy. webserver submits job to database “queue”, first avail worker picks up first job, and sends result to queue. or ue gearman, Spread, MQ/Java Messaging Service(?)
64) log http requests to a database! log all 4xx and 5xx requests, great to see which requests are slow or fast. but only log 1-2% of all requests. Time::HiRes in Perl, microseconds from gettimeofday system call.
65) get good deals on servers http://www.siliconmechanics.com, server vendor of lj and others.

IN SUMMARY: HORIZONTAL GOOD, VERTICAL BAD

for jobs: ask@develooper.com (jobs, moonlighters, perl/mysql etc)
slides will be up at http://develooper.com/talks/
Phew! That was a lot of fast typing (60 words per minute, baby!). Ask is smart, but QUICK!!!! His slides will be VERY useful when they appear. He said there were 53 tips, but I numbered each new line (and not smartly with OL and LI) and I have more than that…

This post dedicated to Edwin DeSouza.

Un-tuned SQL or stored procedures often fail to scale as table volumes increase, plague inefficiency increases exponentially with size.

Tune SQL/stored procedures and then buy new hardware.

use EXPLAIN to help optimize queries. Also use the slow query log.

EXPLAIN EXTENDED shows sql that was actually used — ie, optimizer may rewrite query, so it’s a neat tool.

you can always give optimizer hints, but they’re not recommended — keep checking them as your app grows — STRAIGHT_JOIN, FORCE INDEX, USE INDEX, and one other one.

SHOW STATUS gives you status variables. innodb_buffer_pool_read_requests and innodb_data_read will show how much data is being read from the buffer pool vs. data.

Index isn’t always used, if more than 20% or so of rows, MySQL will use a full table scan. There’s usually a range where MySQL will choose a full table scan when an index is more appropriate, or vice versa, so that’s when you’d use hints. Hey, nobody’s perfect!

think indexes — joining tables of non-trivial size Subqueries ( [NOT] EXISTS, [NOT] IN) in WHERE clause. Use index to avoid a sort, use “covering” indexes.

Establish the best set of multi-column indexes along with singular indexes.

Derived tables (subqueries in FROM cause) can’t use an index. VIEWs with UNION or GROUP BY also can’t use index — all these use TEMPTABLE view algorithm. (temp table created, and then reads from temp table).

Sorts can be improved by increasing memory (sort_buffer_size) or using an index.

Use procedures to:

  • Avoid self joins
  • Correlated updates (subqueries accessing same data)

Performance of SQL within a stored routine that dominates the performance. When SQL is tuned, optimize the routine using traditional techniques:

  • only put what’s needed in a loop
  • stop testing when you know the answer
  • order tests by most likely first

Recursion:

  • only allowed in procedures, not functions
  • depth controlled by max_sp_recursion_depth
  • iterative alternatives are almost always faster and scaleable

TRIGGERS
non-trivial (12% at least) to even simplest trigger. No trigger should EVER contain expensive SQL, because they are done for each row.

Quest free software for MySQL — http://www.quest.com/mysql/
So last night, viagra approved during a break in the quiz show (where Prokrasti Nation had a good showing, case as did the other teams — Recreational Evil, neuropathist Peeps, and Safe Hex) we bid on the T-shirt that had the signatures of all the speakers at the conference. All the proceeds were to go to the EFF, so it’s a good cause.

They announced it was cash only, so I looked in my wallet. $33. Well, the bidding quickly went over that, and when it reached about $100 they said it didn’t have to be cash only. Around $300 Brian Aker said that they’d give whoever won credits in a new command, SHOW CONTRIBUTORS. Well, when they said that I knew I HAD to have my name in the source code.

I mean, dude, my NAME in the SOURCE CODE!!! But then again, this is an open source application, I could just spend some time and write a patch.

I’ve been saving for my wedding next June (14 months away) so when I bid $500, I said, “hey, I don’t need flowers for my wedding.” (My entire wedding budget is $5,000, so spending 10% of that on my name in the source code was, I felt, worth it.)

The bidding stalled at $775, so I asked, “Will MySQL match what is raised?” And indeed, if the bidding reached $1,000 then MySQL would donate $800. So then Boyd Hemphill (wearing the “practice safe hex” T-shirt) walked up to the front, plunked down $20 and said, “I’m giving cash to help make up the $225 difference. Who else will help?”

And people started giving cash, and the bidding increased. I bid $900, and Ronald Bradford bid $1,000. That was the top bid, so he won the T-shirt, but the MySQL folks were nice enough to say if I donated the $900 I was willing to, I’d also get my name in the SHOW CONTRIBUTORS function. So I did!

And that is how it happened.

In other news:

40% of the people who took an exam on Tuesday passed. That means 60% failed — which is a lot, although it was mentioned that probably many people took the tutorials, got the free exam, and just tried it, not caring if they failed or not because it was free.

I passed both certification exams, so now I’m MySQL certified! And I stumped Brian Aker with a question about what rpl_recovery_rank was, and won an ipod nano!
by Mitch Kapor

Wikipedia uses MySQL as their backend. Wikipedia is known among geeks, sick but hasn’t quite hit society at large, more about but probably will soon. What lessons can we learn from Wikipedia? People who hear about the concept of wikipedia say “It can’t possibly work — an encyclopedia written by volunteers, that is completely open?”

Continue reading