Category Archives: Security

April 2007 Boston MySQL User Group Video

Using MySQL As Active DBMS for Monitoring Applications — Jacob Nikom.

Jacob presented this as a special preview at the April 2007 Boston MySQL User Group, and then presented it at the 2007 MySQL Users Conference and Expo.

The last in the “better late than never” series….

Download from http://www.technocation.org/videos/2007_04BostonUserGroup.wmv

MySQL Security Presentation at Boston MySQL User Group Meeting

The February Boston MySQL User Group meeting was great! I spoke about MySQL security; you can now download the slides and the video. I continue to be impressed with the sound quality of the video camera I have, but you can clearly hear it in the audio (well, I could when I was wearing headphones, but I also have pretty bad hearing).

Special thanks to http://technocation.org for hosting the bandwidth for the videos.

Topics covered in the talk:
ACLs
Test dbs & anonymous accounts
OS files and permissions
Application data flow
SQL Injection
XSS (Cross-site scripting)

PDF of slides (1.4M):
http://www.sheeri.com/presentations/MySQLSecurity2007_02_08.pdf

Slides in Flash (107K):
http://www.sheeri.com/presentations/MySQLSecurity2007_02_08.swf

Video of presentation (large, 289M)
http://technocation.org/videos/original/mysqlsecurity2007_02_08large.wmv

Video of presentation (small, 27M)
http://technocation.org/videos/original/mysqlsecurity2007_02_08small.wmv

User Group Video Up, and Video Camera Review

Download the video at:

http://technocation.org/videos/original/BostonMySQLJanUserGroupBrianAkerLg.wmv
– 520 kbps, 320 x 240, 354M. Large size, higher quality.

http://technocation.org/videos/original/BostonMySQLJanUserGroupBrianAkerSm.wmv – 45 kbps, 320 x 120, 29M. Small size, low quality, but you can still see the slides and hear everything.

Technocation, Inc. received a donation of a Sony Handycam DCR SR80 (http://tinyurl.com/yvyfam ), an extra-long battery, and a microphone (a proprietary Sony model that goes with the camera).

In a short sentence: I am impressed. The sound quality (on the large version) is almost exactly what I heard. Granted, I have some hearing loss, but I forgot to bring the microphone, and you can still hear audience questions very well. The video quality is great, too. The hard disk is perfect, because files can be copied over or burned directly to DVD. It records in MPEG-4 format.

The 1 hour 38 minute talk took up less than 6 gigs of space raw (I forget exactly how much, but the camera cuts the files into 2G chunks, and there were 3). This gives at least 10 hours of recording time before needing to dump to disk. This is a very exciting prospect for the MySQL Conference and Expo at the end of April; I’ll be able to video a LOT.

Special thanks go to the User Group member (who may wish to remain anonymous, but I forgot his name anyway, so if he wishes to comment he can, or just e-mail me so I remember his name!) who talked to me about codecs and which programs to use, because they worked!

I was not quite ready for the start of the User Group, and had to run out to my car to get the tripod, so the first minute or so (until 1:25) is me setting up the tripod — I apologize for the movement.

You can see the “Night Shot” functionality early on, when I focus on Brian and turn it on. It does a great job, but loses a lot of color (1:53 until 1:59).

I was disappointed that when you connect the DC power supply, the video stops (so there are a few hops in there).

Brian takes some slides, starts talking, and questions ensue. The basic slides were about MySQL’s internal architecture.

Some links:
MySQL and dual-master/circular replication
There’s a great article by Giuseppe Maxia at: http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html

And a free chapter on Replication from Jeremy Zawodny’s High Performance MySQL:
http://www.oreilly.com/catalog/hpmysql/chapter/ch07.pdf

Around 27:00 there is a reference to Jim Gray’s “Black Book”, which is entitled “Transaction Processing: Concepts and Techniques” and can be found here: http://tinyurl.com/2md3tb

http://forge.mysql.com/wiki/Top10SQLPerformanceTips

The Trend of Managed Schemas: A Database is Not a Messaging System

This thread on the Boston MySQL User Group Board is getting interesting:
http://mysql.meetup.com/137/boards/view/viewthread?thread=2280640

(From the original poster:)

I think that nonequivalence comes from the past when the data sharing was a
rare situation. The data changes were always initiated by application and it
always knew about those changes. Now the situation is different. When the data
are shared between multiple remote applications you have to notify other
interested parties about those changes.

Currently databases are mostly used as “pull” components. If they had standard
“push” functionality they could compete with messaging systems with the advantages
of automatic data persistence and powerful query language.

(my response:)
Well, and that’s the problem: the paradigm *has* changed. MySQL is fast and reliable because it does NOT put things like messaging into its database, which Oracle and SQL Server do. A database is not a messaging system; it’s a database.

What effect would notification that there have been changes have on MVCC? I do wish there was a “pull” way to check if the data has changed.

The paradigm change of the application managing the schema causes this. I do not believe messaging is the correct way to handle this problem.

Consider the parallel to source code version control. Much like MVCC, you check out code, change it, and commit the code. Unlike many source code version control systems, though, MVCC (“data version control”) does not have the equivalent of an “update” command, except for doing another pull from the database. It would be great if there was an easy way to do a “diff” of what’s in the database versus what the application is changing, but that seems like it would be a programmatic thing (function or method), not a database thing.

And consider the database overhead and bandwidth….instead of just running queries, MySQL would have to somehow keep track of which thread has what data, and then notify every single thread that has that data, that it’s changed. The applications will have to be written to keep threads open longer, which will consume lots of resources. That’s lots more overhead for the database, and much more bandwidth, because there may be instances of the application that are using data that they do not care if it changed….so the messaging system would be wasting bandwidth, sending messages to instances that do not care. Although that could be mitigated by the application keeping a thread open when it cares about whether or not the data has changed.

Then again, I’m not fond of managed schema in the application…or at least, when the developers write that code. Seems to me it should be the DBA writing that code. It’s *very* useful for data consistency and integrity, which is a function of the DBA, not a developer.

What effects do you see the managed schema having on databases? Who should be responsible for writing a managed schema? Should a managed schema be used for database consistency within an application? Where is the line drawn between the application putting the required information into the database, and the database’s job of maintaining consistency and integrity?

It’s somewhat ironic, since for a long time MySQL advocated using the application to ensure consistency and integrity (i.e., before MySQL had a storage engine with foreign keys and transactions).

I often say that the biggest reason MySQL is a widely used database is because it is fast. A fast database can be complemented by an application that adds the features the database is missing; but a slow database that is full-featured cannot be made faster by an application. So it worries me when folks ask for very specialized systems such as a messaging server (or ANY “push” system) to be built into the database, because the same thing could easily be done with a “pull” mechanism that uses only the bandwidth needed by the instances of the application that care. Otherwise, it will end up adding Microsoft-level bloat to a really nice and fast program.
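As a rough illustration of what a “pull” check could look like, here is a minimal sketch that assumes a hypothetical version column bumped on every write. The account table, its columns, and the function names are made up for the example, and SQLite (from the Python standard library) stands in for MySQL so the sketch runs without a server; the SQL statements themselves are plain enough to carry over.

```python
import sqlite3

# Hypothetical table: "version" is incremented by every successful write.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL, version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")


def read_account(account_id):
    """Pull the row together with the version the caller is working from."""
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def has_changed(account_id, seen_version):
    """Cheap pull: compare the stored version with the one read earlier."""
    (current,) = conn.execute(
        "SELECT version FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return current != seen_version


def write_if_unchanged(account_id, new_balance, seen_version):
    """Update only if the version is still the one we read; True on success."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1"
        " WHERE id = ? AND version = ?",
        (new_balance, account_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

Only the instances that care about a row ever call has_changed, so nothing is sent to instances that do not care, and the database does not have to track which thread holds which data.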

October Boston MySQL User Group Topic: Boolean Values and Bit Operators

Boston October MySQL User Group: see full event listings at:

http://mysql.meetup.com/137/calendar/5118339/

Tuesday, Oct. 10th at MIT, free pizza and soda (thanks to MySQL AB and the MIT community). Please RSVP!!

To RSVP anonymously, please login to the Meetup site with the e-mail address “admin at sheeri dot com” and the password “guest”.

Plenty of free parking (you can park in MIT lots after 3 pm); 1 block south of the Kendall Square T stop.

————–

Most of the September Boston User Group was spent discussing an interesting problem with a large amount of data (5 million records). Basically, this data had about 40 boolean (or small set) fields that needed to be able to be searched against. Folks suggested:

1) Just leaving the table as is and using 1-character values
Pro: simple
Con: Indexes are bad for columns with low selectivity, so searching will take a long time due to full table scans

2) Creating a “joining” table for each boolean value
Pro: Indexing for each boolean value can be used
Con: Complex — lots of tables, lots of joins for search

3) Using BIT(1) values or BIT(2) values and matching up booleans
Pro: Simple
Con: Difficult to write the search query, keeping in mind the search terms given below.

The biggest issue is the accuracy of indexes vs. size/amount of tables and joins. The person with the original problem (Chris) and I are doing a joint presentation, with real data on those three cases to figure out which is the best for his situation.

What have other folks done for boolean values? Please be specific about the amount of data, and the performance. Remember that this situation involves a lot of data and a lot of boolean fields, and searching across any or all boolean/small set fields is a core function. As well, fields may be null, and searching may include:

For boolean:
search for 0
search for 1
search for 0 or 1 (any value set)
search for NULL (any value not set)
search for 0 or NULL
search for 1 or NULL

For small sets:
search for ‘a’ (single value match)
search for ‘a’,’b’, and ‘c’ (multiple values will match)
search for ‘any value not null’ (anything not null)
search for ‘any value including null’ (anything null)

Any ideas? I will do some quick research if there’s another option that the September User Group did not come up with.
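To make the six boolean search cases concrete, here is a minimal sketch of one possible bit-operator layout. It is an assumption for illustration, not something the user group settled on: two hypothetical integer columns, flag_set (bit i says whether boolean i has a value) and flag_val (bit i holds that value, kept at 0 while unset). SQLite stands in for MySQL so the sketch runs without a server; the & operator in the WHERE clauses is also MySQL's bitwise AND.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY,"
    " flag_set INTEGER NOT NULL, flag_val INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO item (id, flag_set, flag_val) VALUES (?, ?, ?)",
    [
        (1, 0b1, 0b1),  # boolean 0 is 1
        (2, 0b1, 0b0),  # boolean 0 is 0
        (3, 0b0, 0b0),  # boolean 0 is NULL (not set)
    ],
)

# One WHERE clause per search case from the list; :bit is the flag's mask.
SEARCHES = {
    "0": "(flag_set & :bit) <> 0 AND (flag_val & :bit) = 0",
    "1": "(flag_set & :bit) <> 0 AND (flag_val & :bit) <> 0",
    "0 or 1": "(flag_set & :bit) <> 0",
    "NULL": "(flag_set & :bit) = 0",
    "0 or NULL": "(flag_val & :bit) = 0",
    "1 or NULL": "(flag_set & :bit) = 0 OR (flag_val & :bit) <> 0",
}


def search(case, flag_index):
    """Return the ids of rows whose boolean number flag_index matches case."""
    sql = f"SELECT id FROM item WHERE {SEARCHES[case]} ORDER BY id"
    rows = conn.execute(sql, {"bit": 1 << flag_index})
    return [row[0] for row in rows]
```

For example, search("1 or NULL", 0) returns [1, 3] for the three sample rows. Predicates like these are applied to an expression rather than to a bare column, so they would most likely still scan the table; the sketch only addresses how the query could be written, not how fast it runs.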

Question #2: Trigger on One Table To Insert Data into Another

Question #2 from the September MySQL User Group was whether or not a TRIGGER can affect a different table. Apparently the documentation (perhaps for an earlier version??) specified this was not possible. Tom Hanlon, MySQL employee, put up this example (modified from the original, special thanks to Ralph Navarro for copying it down):

Basically, this trigger will insert the current user and timestamp into another table.

mysql> delimiter $$
mysql> CREATE TRIGGER BIcity BEFORE INSERT ON city
-> FOR EACH ROW BEGIN
-> INSERT INTO citytest (name,happened) values (current_user(),now());
-> END;
-> $$
Query OK, 0 rows affected (0.03 sec)

mysql> delimiter ;
mysql> create table citytest (
name varchar(60) not null default '',
happened datetime not null);
Query OK, 0 rows affected (0.19 sec)

mysql> describe city;
+-------------+----------+------+-----+---------+----------------+
| Field       | Type     | Null | Key | Default | Extra          |
+-------------+----------+------+-----+---------+----------------+
| ID          | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name        | char(35) | NO   |     |         |                |
| CountryCode | char(3)  | NO   |     |         |                |
| District    | char(20) | NO   |     |         |                |
| Population  | int(11)  | NO   |     | 0       |                |
+-------------+----------+------+-----+---------+----------------+
5 rows in set (0.01 sec)

mysql> insert into city (Name) values ('tomtown');
Query OK, 1 row affected (0.02 sec)

mysql> select * from citytest\G
*************************** 1. row ***************************
name: root@localhost
happened: 2006-09-07 21:45:14
1 row in set (0.01 sec)

Performance Question #1

I promised to write this up for the folks who attended the Boston MySQL September User Group meeting, so here’s performance question #1 that was asked:

How can a bulk insert be sped up?
We discussed disabling keys with

ALTER TABLE tblname DISABLE KEYS;
loading the data
ALTER TABLE tblname ENABLE KEYS;

However, as a post by Frank Mash not too long ago and comments explain, this has no effect on InnoDB tables.

For InnoDB tables, you can load the data in primary key order, which makes the loading much faster. Basically, InnoDB stores the data in primary key order on disk. If there is no primary key specified, the internal engine makes one anyway, so you might as well specify one and take advantage of it.

As well, you can SET UNIQUE_CHECKS=0 before the load and SET UNIQUE_CHECKS=1 after the load if there are unique constraints. The final suggestion is to SET AUTOCOMMIT=0 before the load and SET AUTOCOMMIT=1 after the load, again to speed things up.
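Put together, the InnoDB suggestions could be sketched roughly like this. The bulk_load helper and its parameters are hypothetical; it assumes a Python DB-API cursor on a MySQL connection and a parameterized INSERT supplied by the caller. The unique_checks and autocommit settings are the MySQL session variables mentioned above.

```python
from operator import itemgetter

# Session settings from the text, switched off before and restored after.
BEFORE_LOAD = ("SET unique_checks=0", "SET autocommit=0")
AFTER_LOAD = ("SET unique_checks=1", "SET autocommit=1")


def bulk_load(cursor, insert_sql, rows, pk_index=0):
    """Insert rows in primary key order as a single transaction.

    cursor is assumed to be a DB-API cursor on a MySQL connection and
    insert_sql a parameterized INSERT for the target InnoDB table.
    Returns the number of rows handed to the driver.
    """
    # InnoDB stores rows in primary key order, so sort before loading.
    ordered = sorted(rows, key=itemgetter(pk_index))
    for statement in BEFORE_LOAD:
        cursor.execute(statement)
    try:
        cursor.executemany(insert_sql, ordered)
        cursor.execute("COMMIT")
    except Exception:
        # Undo the partial load before autocommit is switched back on.
        cursor.execute("ROLLBACK")
        raise
    finally:
        for statement in AFTER_LOAD:
            cursor.execute(statement)
    return len(ordered)
```

The explicit COMMIT and ROLLBACK are a choice made for this sketch so that a failed load is not committed when autocommit is turned back on.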

FREE MySQL Performance Help, Food and Networking — TOMORROW: Thu Sept. 7 2006 7:00 pm, Cambridge MA

What: Free MySQL help with Tom Hanlon, MySQL employee.
at the Boston MySQL User Group
When: Thursday, September 7, 2006, 7:00 PM
Where: MIT Building E51, Room 372
Wadsworth and Amherst Streets
Cambridge, MA 02117
Cost: Free
Transportation: 1 block from Kendall Square T station;
free parking (MIT does not enforce their lot restrictions in the evenings,
so any signs except handicapped parking can safely be ignored)
RSVP: Free pizza and soda will be served, so please RSVP accurately.
To RSVP anonymously, go to http://www.meetup.com/login/ ,
login with the e-mail address “admin at sheeri dot com”
and the password “guest”, then click on “Add Guests”
and add 1 to the number of guests.

The September Boston MySQL User Group Meeting will feature Tom Hanlon, MySQL employee, answering questions about performance issues (or anything else). Please feel free to bring theoretical questions as well as actual issues you’ve been having.

If you have a specific or complex question, please bring descriptions of all relevant queries, tables, data samples, etc. (see ********** below for more details)

You may submit a question to awfief@gmail.com or just bring it to the user group meeting.

We will be meeting on MIT campus, close to the Kendall stop on the Red Line (subway). There is also plenty of free parking — you can park in ANY MIT lot after 3 pm, even if it says “parking by permit only”. We are in building E51, room 372.

Here is the URL for the MIT Map with the location of this building:
http://whereis.mit.edu/map-jpg?selection=E51&Buildings=go

This map shows the MBTA Kendall Stop:
http://whereis.mit.edu/map-jpg?selection=L5&Landmarks=go
(the stop is in red on that map, and you can see E51 in the bottom right)

Here are the URLs for the parking lots:
http://whereis.mit.edu/map-jpg?selection=P4&Parking=go
http://whereis.mit.edu/map-jpg?selection=P5&Parking=go

Free pizza and soda will be served, so please RSVP accurately.

To RSVP anonymously, please login to the Meetup site with the e-mail address “admin at sheeri dot com” and the password “guest”.

More information:

http://mysql.meetup.com/137/events/4976426/

**********
What to bring:
If you are submitting materials, you must submit your materials by noon the day of the user group meeting.
1) Either submit materials to Sheeri at awfief@gmail.com ahead of time or bring them on your laptop and be prepared to connect to the projector (we have the cables, just bring your laptop). Alternatively, you can make overhead projector slides and bring those.

2) Descriptions of relevant tables. Run the following for each table and bring the output:
SHOW CREATE TABLE tbl1\G
SHOW CREATE TABLE tbl2\G

3) Sample data for relevant tables. Run the following for each table and bring the output:
SELECT * FROM tbl1 ORDER BY RAND() LIMIT 5;
SELECT * FROM tbl2 ORDER BY RAND() LIMIT 5;

4) Query descriptions. Run the following for each query and bring the output:
The actual query, e.g., SELECT name FROM addresses WHERE city='Boston';
The EXPLAIN output for the query, e.g., EXPLAIN SELECT name FROM addresses WHERE city='Boston';
What you expect to get (data if the issue is inaccurate results, or a time estimate if the issue is slowness)
What you actually get (data if the issue is inaccurate results, or a time estimate if the issue is slowness)

Awkward JDBC API and MySQL User Group Meeting

Life has been super busy, but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.

Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!

We have been rebuilding our product all summer, with a deadline of releasing it in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.

My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.

I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”

I badly wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.

I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where other fields need to be added. He finally explained that the API JDBC was using was awkward: it needs to retrieve basically the row number of any row it’s looking at, and if it deletes or changes the row, it uses the row number as the key to find the row.

Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and put a row number in its own copy? Apparently not. I cannot imagine that a mature technology would require something like that. It’s not that difficult to do. I said this, and the lead developer was insanely frustrated by it.

So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”

“Yes,” I replied. “Oh, then yes, we’re using it.”

“I don’t think so. If the interface is awkward, you’re not using Connector/J.”

He left my office. So I typed “Connector/J” into my MySQL Community Toolbar (I love it!) and found the following on http://www.mysql.com/products/connector/j/

New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.

Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!

[EDIT: I am, apparently, wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
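For readers who have not used generated-key retrieval, the general idea behind getGeneratedKeys can be sketched as follows. This is not JDBC or Connector/J: it uses Python's DB-API, where the lastrowid attribute of a cursor plays a comparable role, with SQLite standing in for MySQL. The widget table and the insert_widget function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY plays the role of an auto-increment column in SQLite.
conn.execute(
    "CREATE TABLE widget (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)


def insert_widget(name):
    """Insert a row and return the key the database generated for it."""
    cur = conn.execute("INSERT INTO widget (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


first = insert_widget("gear")
second = insert_widget("sprocket")
```

The application never picks the id itself; it asks the driver which key the insert produced and uses that to find the row again later.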

Jim Starkey Speaks, July Boston MySQL User Group Meeting

Please feel free to forward to interested parties.

Who: Jim Starkey at the Boston MySQL User Group
What: Falcon, the new MySQL storage engine
When:
Monday, July 10, 2006 at 7:00 PM
Where:
MIT Building E51, Room 372
Wadsworth and Amherst Streets
Cambridge, MA 02117
Steps from the Red line, plenty of free parking.

The July Boston MySQL User Group’s topic is Falcon, the new storage engine for MySQL. Creator Jim Starkey will speak. Jim Starkey has been writing database software for 20 years. He created BLOBs, multi-versioning concurrency for relational databases, cascading update triggers, event alerters, and more. Read more about him at http://tinyurl.com/lno4p and http://tinyurl.com/mym7d.

We will be meeting on MIT campus, close to the Kendall stop on the Red Line (subway). There is also plenty of free parking — you can park in ANY MIT lot after 3 pm, even if it says “parking by permit only”. We are in building E51, room 372.

Here is the URL for the MIT Map with the location of this building:
http://whereis.mit.edu/map-jpg?selection=E51&Buildings=go

This map shows the MBTA Kendall Stop:
http://whereis.mit.edu/map-jpg?selection=L5&Landmarks=go
(the stop is in red on that map, and you can see E51 in the bottom right)

Here are the URLs for the parking lots:
http://whereis.mit.edu/map-jpg?selection=P4&Parking=go
http://whereis.mit.edu/map-jpg?selection=P5&Parking=go

Free pizza and soda will be served, so please RSVP accurately and arrive a few minutes early.

To RSVP anonymously, please login to the Meetup site (http://mysql.meetup.com/137) with the e-mail address “admin at sheeri dot com” and the password of “guest”.

Sponsors welcome, call Sheeri at 857-205-9786 for details.