Category Archives: Tools

This Just In: The LAMP Stack is Popular

Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article interchanges “database” with “data storage” — many of the sites have “digital documents” included in their count, oncologist and YouTube is in there completely with the amount of space their videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article interchanges “database” with “data storage” — many of the sites have “digital documents” included in their count, oncologist and YouTube is in there completely with the amount of space their videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?
http://www.regdeveloper.co.uk/2006/07/18/multivalued_datatypes_access/

This is an interesting read — it would be awesome if MySQL just used the “SET” or “ENUM” data types to be a placeholder for a join table, doctor that it would create automatically for you. Of course, that’s a new level of functionality — MySQL does not implicitly create permanent tables with any commands. But it would be neat.
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article interchanges “database” with “data storage” — many of the sites have “digital documents” included in their count, oncologist and YouTube is in there completely with the amount of space their videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?
http://www.regdeveloper.co.uk/2006/07/18/multivalued_datatypes_access/

This is an interesting read — it would be awesome if MySQL just used the “SET” or “ENUM” data types to be a placeholder for a join table, doctor that it would create automatically for you. Of course, that’s a new level of functionality — MySQL does not implicitly create permanent tables with any commands. But it would be neat.
What happened to the MySQL Winter of Code? Are they waiting for winter in Australia?

I live near Boston, more about MA and I can tell you it’s definitely winter in the northern hemisphere….

So what are we waiting for?

Well, I can say this — we’re waiting for people. The Winter of Code idea is a great one, particularly since if MySQL works with academic institutions they could help students find Master’s Projects or part of Ph.D. work. Imagine someone writing a new storage engine and having that earn them a Master’s degree. This is exactly what MySQL needs — more people who understand database internals and best theoretical practices to start coding and see where it goes. Note the “more people” — they already have staff that does this.

I’m guessing the Winter of Code is nonexistent because of other big announcements that have been happening; still, I would love to see some collaboration with institutions and universities to give incentives to participants and push them to do it. Class credit or fulfilling graduate requirements would be perfect, and there would be many submissions.

Tying together MySQL and universities would be a great leap forward and a very important move for MySQL, as it would generate more contributions to the code. And the contest!
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article interchanges “database” with “data storage” — many of the sites have “digital documents” included in their count, oncologist and YouTube is in there completely with the amount of space their videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?
http://www.regdeveloper.co.uk/2006/07/18/multivalued_datatypes_access/

This is an interesting read — it would be awesome if MySQL just used the “SET” or “ENUM” data types to be a placeholder for a join table, doctor that it would create automatically for you. Of course, that’s a new level of functionality — MySQL does not implicitly create permanent tables with any commands. But it would be neat.
What happened to the MySQL Winter of Code? Are they waiting for winter in Australia?

I live near Boston, more about MA and I can tell you it’s definitely winter in the northern hemisphere….

So what are we waiting for?

Well, I can say this — we’re waiting for people. The Winter of Code idea is a great one, particularly since if MySQL works with academic institutions they could help students find Master’s Projects or part of Ph.D. work. Imagine someone writing a new storage engine and having that earn them a Master’s degree. This is exactly what MySQL needs — more people who understand database internals and best theoretical practices to start coding and see where it goes. Note the “more people” — they already have staff that does this.

I’m guessing the Winter of Code is nonexistent because of other big announcements that have been happening; still, I would love to see some collaboration with institutions and universities to give incentives to participants and push them to do it. Class credit or fulfilling graduate requirements would be perfect, and there would be many submissions.

Tying together MySQL and universities would be a great leap forward and a very important move for MySQL, as it would generate more contributions to the code. And the contest!
I work as a QA Engineer in a “stealth mode” startup building a network storage appliance. I am looking for “real world” datasets to load into our appliance to profile performance and scalability of the product given different schema models populated real world distribution of data. I envision looking for two significantly different datasets. One is the “flat file” schema like historical or logging data from Web Server Access and Error logs. The other would be a relational (preferably star schema) database like reservation database or inventory control database.

The data doesn’t need to be current. And it can be scrubbed to remove “real” data. The data won’t be used outside the QA lab. Again, misbirth this is to test “how does the product work when data that lives in the outside world is loaded.”

Ultimately, I am looking for 2 to 10 Terabytes of composite data at the end of the project.
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article interchanges “database” with “data storage” — many of the sites have “digital documents” included in their count, oncologist and YouTube is in there completely with the amount of space their videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?
http://www.regdeveloper.co.uk/2006/07/18/multivalued_datatypes_access/

This is an interesting read — it would be awesome if MySQL just used the “SET” or “ENUM” data types to be a placeholder for a join table, doctor that it would create automatically for you. Of course, that’s a new level of functionality — MySQL does not implicitly create permanent tables with any commands. But it would be neat.
What happened to the MySQL Winter of Code? Are they waiting for winter in Australia?

I live near Boston, more about MA and I can tell you it’s definitely winter in the northern hemisphere….

So what are we waiting for?

Well, I can say this — we’re waiting for people. The Winter of Code idea is a great one, particularly since if MySQL works with academic institutions they could help students find Master’s Projects or part of Ph.D. work. Imagine someone writing a new storage engine and having that earn them a Master’s degree. This is exactly what MySQL needs — more people who understand database internals and best theoretical practices to start coding and see where it goes. Note the “more people” — they already have staff that does this.

I’m guessing the Winter of Code is nonexistent because of other big announcements that have been happening; still, I would love to see some collaboration with institutions and universities to give incentives to participants and push them to do it. Class credit or fulfilling graduate requirements would be perfect, and there would be many submissions.

Tying together MySQL and universities would be a great leap forward and a very important move for MySQL, as it would generate more contributions to the code. And the contest!
I work as a QA Engineer in a “stealth mode” startup building a network storage appliance. I am looking for “real world” datasets to load into our appliance to profile performance and scalability of the product given different schema models populated real world distribution of data. I envision looking for two significantly different datasets. One is the “flat file” schema like historical or logging data from Web Server Access and Error logs. The other would be a relational (preferably star schema) database like reservation database or inventory control database.

The data doesn’t need to be current. And it can be scrubbed to remove “real” data. The data won’t be used outside the QA lab. Again, misbirth this is to test “how does the product work when data that lives in the outside world is loaded.”

Ultimately, I am looking for 2 to 10 Terabytes of composite data at the end of the project.
The February Boston MySQL User Group meeting was great! I spoke about MySQL security; you can now download the slides and the video. I continue to be impressed with the sound quality of the video camera I have, medications but you can clearly hear it in the audio (well, I could when I was wearing headphones, but I also have pretty bad hearing).

Special thanks to http://technocation.org for hosting the bandwidth for the videos.

Topics covered in the talk:
ACLs
Test dbs & anonymous accounts
OS files and permissions
Application data flow
SQL Injection
XSS (Cross-site scripting)

PDF of slides (1.4M):
http://www.sheeri.com/presentations/MySQLSecurity2007_02_08.pdf

Slides in Flash (107K):
http://www.sheeri.com/presentations/MySQLSecurity2007_02_08.swf

Video of presentation (large, 289M)
http://technocation.org/videos/original/mysqlsecurity2007_02_08large.wmv

Video of presentation (small, 27M)
http://technocation.org/videos/original/mysqlsecurity2007_02_08small.wmv

Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article interchanges “database” with “data storage” — many of the sites have “digital documents” included in their count, oncologist and YouTube is in there completely with the amount of space their videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?
http://www.regdeveloper.co.uk/2006/07/18/multivalued_datatypes_access/

This is an interesting read — it would be awesome if MySQL just used the “SET” or “ENUM” data types to be a placeholder for a join table, doctor that it would create automatically for you. Of course, that’s a new level of functionality — MySQL does not implicitly create permanent tables with any commands. But it would be neat.
What happened to the MySQL Winter of Code? Are they waiting for winter in Australia?

I live near Boston, more about MA and I can tell you it’s definitely winter in the northern hemisphere….

So what are we waiting for?

Well, I can say this — we’re waiting for people. The Winter of Code idea is a great one, particularly since if MySQL works with academic institutions they could help students find Master’s Projects or part of Ph.D. work. Imagine someone writing a new storage engine and having that earn them a Master’s degree. This is exactly what MySQL needs — more people who understand database internals and best theoretical practices to start coding and see where it goes. Note the “more people” — they already have staff that does this.

I’m guessing the Winter of Code is nonexistent because of other big announcements that have been happening; still, I would love to see some collaboration with institutions and universities to give incentives to participants and push them to do it. Class credit or fulfilling graduate requirements would be perfect, and there would be many submissions.

Tying together MySQL and universities would be a great leap forward and a very important move for MySQL, as it would generate more contributions to the code. And the contest!
I work as a QA Engineer in a “stealth mode” startup building a network storage appliance. I am looking for “real world” datasets to load into our appliance to profile performance and scalability of the product given different schema models populated real world distribution of data. I envision looking for two significantly different datasets. One is the “flat file” schema like historical or logging data from Web Server Access and Error logs. The other would be a relational (preferably star schema) database like reservation database or inventory control database.

The data doesn’t need to be current. And it can be scrubbed to remove “real” data. The data won’t be used outside the QA lab. Again, misbirth this is to test “how does the product work when data that lives in the outside world is loaded.”

Ultimately, I am looking for 2 to 10 Terabytes of composite data at the end of the project.
The February Boston MySQL User Group meeting was great! I spoke about MySQL security; you can now download the slides and the video. I continue to be impressed with the sound quality of the video camera I have, medications but you can clearly hear it in the audio (well, I could when I was wearing headphones, but I also have pretty bad hearing).

Special thanks to http://technocation.org for hosting the bandwidth for the videos.

Topics covered in the talk:
ACLs
Test dbs & anonymous accounts
OS files and permissions
Application data flow
SQL Injection
XSS (Cross-site scripting)

PDF of slides (1.4M):
http://www.sheeri.com/presentations/MySQLSecurity2007_02_08.pdf

Slides in Flash (107K):
http://www.sheeri.com/presentations/MySQLSecurity2007_02_08.swf

Video of presentation (large, 289M)
http://technocation.org/videos/original/mysqlsecurity2007_02_08large.wmv

Video of presentation (small, 27M)
http://technocation.org/videos/original/mysqlsecurity2007_02_08small.wmv

http://www.webpronews.com/topnews/2007/02/26/techcrunch-others-love-linux-mysql

I’m not quite sure what to say about this article, web except that a sample of 7 “big” sites showed that the LAMP[hp] stack was heavily used. Perhaps, “And this is news?”

MySQL Winter of Code

Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article interchanges “database” with “data storage” — many of the sites have “digital documents” included in their count, oncologist and YouTube is in there completely with the amount of space their videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article interchanges “database” with “data storage” — many of the sites have “digital documents” included in their count, oncologist and YouTube is in there completely with the amount of space their videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?
http://www.regdeveloper.co.uk/2006/07/18/multivalued_datatypes_access/

This is an interesting read — it would be awesome if MySQL just used the “SET” or “ENUM” data types to be a placeholder for a join table, doctor that it would create automatically for you. Of course, that’s a new level of functionality — MySQL does not implicitly create permanent tables with any commands. But it would be neat.
Dorsal Source has a list of where you can get MySQL binaries — official and unofficial — up at:

http://www.dorsalsource.org
In this episode, angina I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, dosage or want to suggest topics to cover in future podcasts, physician please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, order but we’ll discuss the *why*s, not just the how’s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can Direct download all the oursql podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Scneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security via climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish
Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article interchanges “database” with “data storage” — many of the sites have “digital documents” included in their count, oncologist and YouTube is in there completely with the amount of space their videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?
http://www.regdeveloper.co.uk/2006/07/18/multivalued_datatypes_access/

This is an interesting read — it would be awesome if MySQL just used the “SET” or “ENUM” data types to be a placeholder for a join table, doctor that it would create automatically for you. Of course, that’s a new level of functionality — MySQL does not implicitly create permanent tables with any commands. But it would be neat.
What happened to the MySQL Winter of Code? Are they waiting for winter in Australia?

I live near Boston, more about MA and I can tell you it’s definitely winter in the northern hemisphere….

So what are we waiting for?

Well, I can say this — we’re waiting for people. The Winter of Code idea is a great one, particularly since if MySQL works with academic institutions they could help students find Master’s Projects or part of Ph.D. work. Imagine someone writing a new storage engine and having that earn them a Master’s degree. This is exactly what MySQL needs — more people who understand database internals and best theoretical practices to start coding and see where it goes. Note the “more people” — they already have staff that does this.

I’m guessing the Winter of Code is nonexistent because of other big announcements that have been happening; still, I would love to see some collaboration with institutions and universities to give incentives to participants and push them to do it. Class credit or fulfilling graduate requirements would be perfect, and there would be many submissions.

Tying together MySQL and universities would be a great leap forward and a very important move for MySQL, as it would generate more contributions to the code. And the contest!

Donate to Help Folks Get to the MySQL Users Conference

I stumbled across this on the fabulous http://www.everythingsysadmin.com :

from:  http://www.socallinuxexpo.org/wios07/

The Southern California Linux Expo (SCALE) will host a Women in Open Source Event as part of their upcoming 2007 conference, information pills physician SCALE 5x.

The focus of this event is on the women in the open source and free software communities. The goal of this event is to encourage women to use technology, open source and free software, and to explore the obstacles that women face in breaking into the technology industry. The Women in Open Source event will be held on February 9, 2007 at the Los Angeles Airport Westin Hotel.

I have seen the dearth of women, in system administration as well as database administration, having worked in both fields.  Network administration has fewer women than both system administration and database administration. 

I have noted that there are more women in database administration than system administration.  The MySQL Conference I attended last year had the biggest percentage of women I’ve ever seen at a technical conference.  Checking out the names at http://www.planetmysql.org the first name there is female (mine), and then a few dozen names go by, and most of them are clearly identifiable as male.  There are only a few names whom I can’t place a face/picture to, and to whom I cannot assign a gender.

I’m not denying that barriers and prejudices exist. In fact, one of the reasons I opted to get a Master’s Degree in Computer Science is that I have no idea who’s looking at me, and consciously or subconsciously are thinking, “She’s a woman, she’s not as good as a man.”

Certainly, there are the usual barriers to entry in the scientific world.  Is open source any different from any other type of “boys’ club”?  My partner is a sleight-of-hand card magician, and he laments that there are few women in the field.  He also laments that he does not see women interested at all — it’s not that they’re starting and being turned away, or getting discouraged, it’s that they’re not even starting.  How would we know if we reach that point in the open source world? 

I have not met with any barriers to my career.  Then again, I grew up with 2 brothers (my sister is 7 years older, so I never really “grew up” with her), and adapted to “living in a world of boys/men” a long time ago.  Mostly it’s just grabbing the bull by the horns and just doing.

My issue with the lack of women in open source is that I wonder how far we can get into it without bringing up the stereotypes.  I am not a shy, quiet woman.  I am not ladylike in many ways.  Many “qualities” that American society teaches their girls and women — don’t interrupt, have a lack of self-confidence, looks are your most important asset so do not seem to intelligent — are the barriers to entry I’ve seen, though not encountered.  That, and the struggle between family and career.  I have seen more and more men struggle with this as well, so it’s not as much of a “female” issue as it has been in the past.

Professionally, I make sure to wear my glasses on a job interview or important meeting, and dress conservatively when I do (I’ve fallen back to wearing glasses all the time because I’ve been too lazy to get more contact lenses).  I actually don’t want to look too pretty on a job interview, and consciously think of this.  So I’m definitely aware of the discrimination that exists, but so far I have been lucky enough to not encounter it, or at least I’m not aware of encountering it.

It’s my belief that women make less than men for the same job because they are less likely to negotiate and hold out for more money, and more likely to take one of the first offers — “good women” don’t “rock the boat”, after all.  (tongue in cheek, of course)

As someone who is rather type A (see http://en.wikipedia.org/wiki/Type_A_personality), I know there are colleagues and co-workers who find me to be too strict, which translates to “she’s a bitch.”  For men, this turns into “he’s too strict and type A.”  The double-standard is annoying, though not something I particularly worry about, since it has not gotten in the way of advancement or recognition, and has not targeted me for anything negative in particular (that I know about).

So, what’s the point of this?  I think it’s good to teach girls and women the skills they need to make it in a male-dominated field.  But I feel it should be taught and instilled at an early age, to both girls AND boys, as “this is the way American/Western society works, and these are ways folks are successful.”

I just feel as though singling out women, while it’s accurate, is saying, “we’re going to put you in the special room because you haven’t learned something you should have” and therefore is somewhat degrading.  Not a lot degrading, but somewhat.

powered by performancing firefox

Phorum needs to get to the MySQL Conference. Perhaps you do, asthma too? Or perhaps you want to help people get there? Technocation, troche Inc is a not-for-profit committed to helping folks get the education and networking contacts so important to IT professionals. So, buy they’re opening up their very first campaign!

Technocation, Inc. is a not-for-profit organization. Your contributions are tax-deductible to the fullest extent of the law. You may choose to donate money, goods or services. Money may be donated through PayPal at http://technocation.org/content/donate-now, and services should be arranged through e-mailing donate@technocation.org . To send payment by mail, see details at . Technocation’s EIN/Tax ID is 20-5445375
Currently, this campaign is looking for:

Monetary contributions
Graphic design for print media
Physical booth board materials
Transferable frequent flyer miles
Transferable Hyatt hotel points

Technocation will provide a grant application on Thursday, February 15th to anyone interested in applying for grants.

Money may be donated through PayPal at http://technocation.org/content/donate-now, and services should be arranged through e-mailing donate@technocation.org . Directed donations are accepted (ie, “for conference registration only” or “for travel for Phorum.org members”), simply attach a note with your payment.

Other questions may also be e-mailed to that address, including requesting donations of goods and services not currently on that list.

The Sincerest Form of Flattery is Imitation

While MySQL customers have been bitterly complaining about the move to package support and rigorous testing of binaries into a paid package, recipe Stephen Walli of Optaros has been thinking:

What if Microsoft SQL Server open sourced their codebase, cialis 40mg provided support and testing of binaries in a paid package similar to MySQL Network, look and “DB mashups” ensued?

http://stephesblog.blogs.com/my_weblog/2007/01/microsoft_and_m.html

It’s an interesting read to get you thinking. Most of my thought was, “that’d be neat….I wonder if folks would stop complaining about the MySQL Enterprise and Community models if that actually happened.”

Great Job Interview Snippet

Now, stomach I should probably be a good Planet MySQLer and check the MySQL Forge at http://forge.mysql.com, but Dean Swift’s “mystery festive stored procedure” linked at http://deepselect.blogspot.com/2006/12/merry-christmas.html is a pretty good interview question for a candidate. I would include the hint that it’s a Christmasy stored procedure.

I laughed out loud when I figured it out. I didn’t actually run it, but read the stored procedure to see if I could puzzle it out. And so I did. It took a few minutes, and I had to copy and paste it to a buffer that used word wrap and format it properly.

But this will show how good someone is under pressure. If you give it to them and walk away, there will be less pressure. Either way, you’ll see their reaction too — if it’s “ha ha very funny who cares?” or if it’s, “that’s pure genius!” or if it’s a big groan or hearty laugh, you’ll see a person’s personality and how they might fit in with the team.

MySQL Beats Oracle for Wireless Developers, is Beaten By Microsoft SQL Server

From: http://www.crn.com/sections/breakingnews/breakingnews.jhtml?articleId=196700062

The short version is that among 380 wireless application developers surveyed, noun
30% use Microsoft SQL Server as a backend, sickness
20% use MySQL, stuff
16% use Oracle.

Microsoft can offer bundles, so they can just offer SQL Server cheaply so long as companies have the MSDN service (I don’t know if they do, but they can). MySQL and Oracle are standalone offerings — no bundles. And as an embedded database, a MySQL license must be bought.

So this research shows that when customers have to pay for a database, they choose MySQL over Oracle. Granted, they choose Microsoft SQL Server over both.

Now, many folks say I’m a MySQL nut. I am, but that’s not the point — I do not advocate that MySQL is the only database to use. In fact, I think competition is wonderful, and I am happy about the existence of Oracle, PostgreSQL, Sybase and DB2. I have never tried to convert a company from using the solution that works for them to using MySQL. And MySQL has many problems.

That being said, I’m very excited about this news.

What’s Your Uptime? What’s Your Uptime Worth?

Life has been super busy, web dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
What is it about the folks on Planet MySQL having twin brothers?

Roland Bouman
Sheeri Kritzer
Jay Pipes
Zach Urlocker

That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, purchase note that here too……
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
What is it about the folks on Planet MySQL having twin brothers?

Roland Bouman
Sheeri Kritzer
Jay Pipes
Zach Urlocker

That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, purchase note that here too……
At the July MySQL User Group Meeting, cystitis Jim Starkey wondered aloud, “What happens when I COMMIT on a memory table?” I wrote the question down, to research it later.

The obvious answer is “COMMIT on a non-transactional table does nothing.”

Tonight I was thinking about this, and I realized I do not actually COMMIT “on a table.”

The manual page at: http://dev.mysql.com/doc/refman/4.1/en/commit.html (and the 5.0 and 5.1 equivalents) state:

By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk.

If you are using a transaction-safe storage engine (such as InnoDB, BDB, or NDB Cluster), you can disable autocommit mode with the following statement:

SET AUTOCOMMIT=0;

But what does If you are using a transaction-safe storage engine really mean? I ask because I do not specify a table type with that command. So I tried it on a fresh MySQL install (5.0.19 GA, on a RedHat Linux Enterprise machine):
Continue reading

“I Work Best Under Pressure”

Life has been super busy, web dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
What is it about the folks on Planet MySQL having twin brothers?

Roland Bouman
Sheeri Kritzer
Jay Pipes
Zach Urlocker

That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, purchase note that here too……
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy violated by having their health information exposed, their identity stolen, their bank accounts drained and their credit cards maxxed out.

Many people worry about things that are basically public information. For instance, in the US, the bank account number and the routing number are both printed on every check. Electronic Fund Transfers (EFTs) require three numbers — the checking account number, the bank routing number, and the amount. Or at least, that is all I type in. Perhaps my account stores address information and it is checked against that, but I am not asked for my billing address, as I am with a credit card transaction.

Some people guard their bank account number with extreme privacy, but it is in plain sight on the bottom of every paper check written!

Similarly, does it matter if someone cracks my password in some applications? How much damage could someone do if they got my password to a newspaper site. What are they going to do, write a letter to the editor or a comment? You’re not liable if someone cracks your server and then uses it to port-scan government agencies. It’s just a pain when you have to wipe your OS and start over. But no valuable information is lost, just time and patience.

At work, sure, they could get their hands on trade secrets if they cracked my desktop, laptop, VPN, or email password.

What about a dating site? What about a community forum? Should my password on forums.mysql.com be stored as tightly as my password on mysqlcamp.org? What does it matter if either password is cracked? Sure, if they try that same password on paypal, and I am dumb enough to use the same password for important data that I do for non-important data, my password will be stolen.

This is, to me, one of the greatest things about wikis. Sure, people have vandalized wikis, but it’s much more satisfying for folks to vandalize a site that’s not “open”. Someone is going to steal a new $3,000 bicycle that’s not locked up, but nobody is going to touch the old beater with a rusty frame, missing front wheel, flat back tire.

And of course, your application probably falls somewhere in between “everyone wants it” and “nobody wants it”. One of the things I say over and over in the presentations I give is “think about it.” Think about the security you need. Do a risk analysis. If you want your data secure, write it on a piece of paper and have recipients swallow it after they’ve read it. For any other security method, think about the gaps, and think about what really matters.

As a user, think about the ramifications of your passwords, as well. Many sites without “important” information will e-mail your password if you forget it. There it is, in plaintext for the world to intercept. I keep a few passwords at any given time (and change them every so often) — “really secure” ones, for financial institutions and such, “somewhat secure” ones, for things like blogging sites, and then “throwaway” ones, for sites where the info is not important, and I would suffer very little if my password is “cracked”.

—————————-
As well, by highlighting the encryption functions (MD5() and SHA1()) in MySQL, both articles imply that applications should call the encryption functions within MySQL. If an application is using encryption at all, it should be done as close to the user as possible. A client-side encryption such as a Javascript function is much better, security-wise, than using MySQL. You want to encrypt it *before* it goes over the network. If someone’s sniffing the network, then running SELECT nickname FROM myUser WHERE username='sheeri' and password=SHA1('kritzer'); is pointless — even if you salt the data. Someone can sniff the packet and find the plaintext ‘kritzer’ — either between the client’s browser and the web server, or between the web server and the database server.

Stewart’s post did not mention that a JOIN is actually the best way to go — JOIN the words table with the passwords on the salted hash, and then you can possibly retrieve a password. As well, if I were a cracker, I wouldn’t care about using a VIEW, I’d just add a “saltedmd5” column to my table, index it, and then JOIN the tables (creating an index on the table field if need be). Because he was comparing md5 sums, not actually trying to compare passwords through an application, it means he had a backdoor to make database calls, so we could indeed assume a JOIN is possible.

My desktop at work is a Windows machine. Why? Because it gives me what I need — shell access to servers so I can do real work on the machines, viagra dosage a text editor, information pills an e-mail client and a web browser. That’s really all I need to do my job. Sure, I could put in for a Macintosh or install a Unix variant. But if it gives me what I want, why would I spend all that work changing things around, just to ultimately get the same requirements — shell, web browser, text editor, e-mail client….????

I love MySQL, it’s a great database. But in order to meet its tenets, it has sacrificed features. When database religious wars start, it comes down to “MySQL stinks because it does not have the features,” or, nowadays, “It didn’t have [x feature] for a long time.” When that happens, my question is always, “If MySQL is so bad, why do so many people use it?”

Because it gives them the most important feature — SPEED. Speed is the #1 top priority in embedded databases, web applications and most desktop applications. (Am I missing a use of a database?) Companies will pay through the nose for training and licenses if it means their customers are happy because their product is speedy. (Meanwhile, MySQL is offering it very affordably, so folks do not have to pay through the nose.)

So to the folks who argue that MySQL stinks — I’ll agree, if you are talking about being feature-rich. However, MySQL has been growing in that department, so the argument is only relevant if you want to do a pivot table, or index a calculated field in a VIEW, or something complex like that. Perhaps MySQL isn’t appropriate for, say, a data warehouse. SQL Server is a better choice for that, as it has reporting modules and analysis wizards and all sorts of stuff.

MySQL is not perfect for everything, and it is not lousy for everything. But if you look at what most people need, it is speed. MySQL delivers that.

http://www.mysql.com/news-and-events/press-release/release_2006_35.html

MySQL won a contest, and it is proclaimed the fastest database.

I am happy, but I am not surprised. The basic tenets of MySQL are “fast, easy-to-use and bug-free”.

It is nice to know MySQL is actually meeting their goals. ‘Bug-free’ is not totally true, of course, but MySQL’s features are well-implemented. And being the fastest database is an achievement, even if it was one that was planned for.

We all know MySQL is the fastest. That’s why so many organizations have used it, even during the time when MyISAM was the only widely used storage engine. Before transactions, before fulltext indexing, before views and stored procedures and triggers, MySQL was widely used. A developer should not have to write transaction code in a programming language, but many organizations were happy to use bad coding techniques and zoom past their competitors with a speedy site.

Now the rest of the world knows that MySQL is the fastest. And, of course, MySQL is fast, easy-to-use, affordable AND full-featured. MySQL is doing a good job of marketing their new features, but another thing they should do is find out all the outdated information on websites and educate folks, that the arguments against MySQL are fewer and fewer as time goes on.
What is it about the folks on Planet MySQL having twin brothers?

Roland Bouman
Sheeri Kritzer
Jay Pipes
Zach Urlocker

That’s 4 of the top 25 posters to Planet MySQL. Anyone else want to reveal having a twin? Anyone on here have a twin sister? If you’re a twin and aren’t on the Planet, purchase note that here too……
At the July MySQL User Group Meeting, cystitis Jim Starkey wondered aloud, “What happens when I COMMIT on a memory table?” I wrote the question down, to research it later.

The obvious answer is “COMMIT on a non-transactional table does nothing.”

Tonight I was thinking about this, and I realized I do not actually COMMIT “on a table.”

The manual page at: http://dev.mysql.com/doc/refman/4.1/en/commit.html (and the 5.0 and 5.1 equivalents) state:

By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk.

If you are using a transaction-safe storage engine (such as InnoDB, BDB, or NDB Cluster), you can disable autocommit mode with the following statement:

SET AUTOCOMMIT=0;

But what does If you are using a transaction-safe storage engine really mean? I ask because I do not specify a table type with that command. So I tried it on a fresh MySQL install (5.0.19 GA, on a RedHat Linux Enterprise machine):
Continue reading

Vote For Something That Matters To You

Life has been super busy, web dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
Life has been super busy, dosage recuperation but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have rebuilding our product all summer, with a deadline of releasing the product in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I severely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward — it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row number it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently, not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I type in “Connector/J” into my MySQL Community Toolbar (I love it!) and find the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
Not much more to add to the wonderful posts:

How to Kill Good Ideas

How to Come Up With Good Ideas

Supporting Ideas and Being Productive

and
Yet More Ways to Kill Great Ideas

However, prescription while not great in quantity, I think one of the most important points has been completely left out:

Don’t have ego.

An idea is just that: an idea. Particularly when brainstorming, lots of people like to say, “Oh, that won’t work because of this,” immediately. Usually because they had an idea previous that they’re defending. Ego steps into this. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”

Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.

It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.

Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb because you obviously have a smart idea (you wouldn’t proposed it if it was dumb), and I can’t get beyond this limitation.” This natural curiosity rather than dismissing the idea might actually lead you to learn that yes, there is a solution to “this.”

The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”

The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Since OSCON, tablets most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”

Y’all are prolific!

Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”

Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.

Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because

there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.

So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?

I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.

When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.

Restaurants publish cookbooks, yet they do not go out of business. Why is that?

It is because what they sell is not just the food. As Google and Yahoo do not sell their searches. Sure, the food (and searches) are what made them famous. But what keeps people flocking is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.

It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.

I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individual

Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.

I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?

Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.

Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA, whatever. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.

So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.

I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.

There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.

Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
http://www.artfulsoftware.com has a “Common Queries” page, what is ed which I find rather useful. I hadn’t realized its history, as described on the homepage:

Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.

One day, I clicked on the page and got the dreaded “blank” PHP page.

This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.

Part of the updating process could be a “generate the web page” script, that runs what the actual page is now, but stores the result as an HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, it only needs to change when content is uploaded, which isn’t very often at all.

The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.

I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.

I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).

Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages changes rather rarely.

This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.

* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Back at the MySQL Users Conference, pilule I was talking to Monty about a good PHP* interface to MySQL that would go through a database, capsule and make pages to be able to search, order update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only got to checking out what that was recently.

As far as I can tell from here, here, here and here:

  • Unireg started as a curses-based interface to an SQL database.
  • Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
  • Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports

Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..

It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.

I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of stuff that have huge learning curves, and I have not overcome that obstacle yet.

So what is your favorite program to automatically generate a database ui? Specifically, it should:

  • Generate web pages in PHP*,
  • Automatically connect to the database,
  • Allow for easy specification of join tables based on (a) field(s),
  • Allow for easy selecting of all, none, or some table fields,
  • Not require that fields in the join condition be shown.
  • Allow the “view”, “edit” and “add” pages to show different fields

I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.

* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
Most developers are used to programming in procedural or object-oriented languages. SQL, drugs as a declarative language, denture is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.

That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).

In no particular order:

1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b

2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and more so of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (ie, it’s seed data or a migration script) then it gets explicit field names.

3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.

4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.

5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.

6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.

7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.

8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.

(I also try to put SQL commands in capital letters to help me easily spot fields and variables I use).
(also entitled, mind “Who Put the J in lam-a-lam-a-LAMJ?”)

So, I have started to read Mysql Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):

MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.

Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the P stands for a programming language beginning with “P”. It does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .

JBoss is an architecture, which the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, Websphere or Wenlogic? Does it become LAMR or LAMW? Can we still call it LAMJ?

So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because it’s internally inconsistent and a good example, I use Continuent as an example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money to technology… 🙂 )

Continuent using “Java”

Continuent uses “JBoss/J2EE”

Continuent uses “JSP/J2EE”

They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?

But then why does the Stored Procedures book use “JBoss”?
August 23rd was the first proposed date for DBA Day since, somnology apparently, treat DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.

I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.

However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But not something token, make it genuine. It does not even have to be monetary, or tangible. Tell your DBA today a specific instance that you can think of where s/he made a positive difference.

I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.

I got a smiley from the sysadmin yesterday, so I’ll share it here:

[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component

(it was from IM and he was sick yesterday, hence the capitalization/spelling not being perfect).
On Wednesday night, cialis 40mg I did some consulting, physician and it ended up taking twice as long as I thought it would. So I am rewarding myself by going to MySQLCamp!

Speaking of which, pill I updated the home page, adding explicitly that registration is free, and a section on travel information. I have no idea how housing is being organized, or if it is, and I am happy to take the lead on doing so.

The basics are that there are some good, cheap* 3-star hotels not too far away. Cheap = under $100, I even found some in the $60 price range! I would love to get a sense of what folks are doing for lodging, and if folks want, I can work on getting a group discount (some hotels will arrange one for a minimum of 10 rooms), arranging a suite for the “evening track”, etc.

Currently stating that you’re interested does not require a commitment to get a hotel room. If we have critical mass, I can see what the options are, and folks can reserve a room for themselves or, if it’s easier/cheaper, I can make the reservations for folks.

Alternatively, if someone speaks up and says, “Silly Sheeri! It’s all taken care of!” please point me in the right direction.

Public transit information would be great on the travel page, particularly from the airport to the venue. Also, knowing what time camp starts on Friday and ends on Monday would be great…..sure, they’re approximate….

* the hotels are good and cheap, therefore they cannot be fast.
With recent posts by Frank Mash and Stewart Smith about password protecting, migraine I am reminded of all the privacy vs. security arguments we have going on in the United States. Basically, infertility I see a somewhat similar situation — how much privacy do folks give up for the sake of security is analogous to how much calculation, how many hoops to jump through, to ensure that data is secured properly.

On the one hand, the analogy falls apart, because encryption calculation times are much less of an “inconvenience” than an invasion of privacy, and thus the argument gets usurped. It’s just a function, or a few calculations, no big deal. We all use SSH instead of telnet, and hopefully SFTP instead of FTP, because plaintext passwords are bad.

As a retort, most folks do not use SSL-enabled MySQL, and some do not even use SSL-enabled http. Why? Because it’s slow! Well, we do not want anything slow! But the security is worth the slowness! What? You mean people will go to another web site if yours is too slow? But the competitor is not as secure!!!!! So the analogy works there.

The analogy also works, when you consider how valuable the data is that you are attempting to lock up. Financial and health institutions need as high a level of encryption as possible for passwords, and any organization that stores a federal ID number should encrypt that. Nobody wants their privacy vio