Populating Images Into a Test Database

Guiseppe Maxia recently posted on Filling Test Tables Quickly. However, he only gave examples for numbers, words and dates. This article will deal with randomly inserting images into a database — particularly useful for testing images stored in the database versus the filesystem. This article was inspired by Guiseppe’s article.

[note: there will be a post soon with my presentation on storing images in the MySQL filesystem, and there will be ample opportunity for discussion then, especially since everyone making a claim will have tested their system, given this code.]

Getting images into the database is not easy. All the tutorials I have read have used a programming language to get images into a database. Using cat and similar unix tools, I have attempted to be able to insert an image into a database using only the mysql client commandline tool. That failed — I would love to know if anyone has any tips on inserting an image into a database without using a programming language.

So, I compromised. I put 20 images into a table as a baseline, and then wrote a stored procedure to suit my needs. Basically, users can have up to 6 images, so I wanted to replicate that possibility in the test db.

This stored procedure takes in 3 integers: maximum # of images per uid, range of random images from the “originals” table, and # of uids to use when populating the table. You will also need a table called “originals” containing the original images. A sample table, with 20 images, can be downloaded from:
http://www.sheeri.net/public/originalimages.sql.gz.

These images were taken from:
http://www.takigen.co.jp/japanese/images/number without explicit permission. I have no idea what Takigen is as I do not read Japanese. I ran a Google Image Search for tiny pictures involving numbers, and found that website that way.

To import it, run at the shell prompt:

> gunzip originalimages.sql.gz
> mysql dbname < originalimages.sql

(explanation below the code)


delimiter $$

drop procedure if exists populate_images $$
create procedure populate_images(max_images INT,num_originals INT,finalcount INT)
not deterministic
modifies sql data
begin
declare uid int default 0;
declare num_images int default 0;
declare image_num int default 0;
declare image_counter int default 0;
set max_images:=max_images+1;
set num_originals:=num_originals+1;
while uid < = finalcount do set image_counter:=0; set num_images:=FLOOR(RAND() * max_images); while image_counter < num_images do set image_num:=FLOOR(RAND() * num_originals); INSERT INTO images (uid,image,uploaded) SELECT uid,image,NOW() from originals where imageNum=image_num; set image_counter:=image_counter+1; end while; set uid:=uid+1; end while; end $$

delimiter ;


Explanation:
declare uid int default 0;
Initialize the counter for the current uid.

declare num_images int default 0;
Initialize the variable for the number of images per user; selected by random with a maximum as given in the 1st argument to the procedure.

declare image_num int default 0;
Initialze the variable that will hold which image to retrieve from the originals table. Selected by random for each image.

declare image_counter int default 0;
Initialize the counter for the image number; reset to 0 for each uid.

set max_images:=max_images+1;
set num_originals:=num_originals+1;

Variables are increased by 1 because I used < in the code instead of <= . I guess one could change the code to reflect this. while uid < = finalcount do
loop through the appropriate # of uids…

set image_counter:=0;
set num_images:=FLOOR(RAND() * max_images);

reset the image counter and get a new # of images for this particular uid, between 0 and max_images. If you want to specify between 1 and max_images, change to 1+FLOOR(RAND() * max_images)

while image_counter < num_images do set image_num:=FLOOR(RAND() * num_originals);
Get a random number between 1 and num_originals (ie, how many images there are to choose from).

INSERT INTO images (uid,image,uploaded) SELECT uid,image,NOW() from originals where imageNum=image_num;
Insert the random image and the uid and time uploaded into the images table.

set image_counter:=image_counter+1;
end while;

increment the image counter — you are finished with this image for this uid

set uid:=uid+1;
end while;
Increment the uid counter -- you are finished with this uid

end $$
Procedure end


Giuseppe’s article talked about randomly filling things quickly. This isn’t as lightning fast as other methods. To wit:

5 uids produced 20 image entries and took 1.14 seconds. (using call populate_images(6,20,5);)
50 uids produced 126 image entries and took 7.37 seconds.
500 uids produced 1416 image entries and took 1 min 26.48 seconds.
5,000 uids produced 14049 image entries and took 14 min 40.68 seconds.
(it scales linearly per image entry, not suprisingly).
So 50,000 uids would take about 2.5 hours, and 500,000 entries would take about a day to randomly fill.

Alternatively, using this procedure to randomly fill 5 uids and then repeating
insert into images (image,uid,uploaded) select image,uid,NOW() from images;

Takes:
5 uids, 1.33 seconds
50 uids, 1.60 seconds
500 uids, 2.03 seconds
5000 uids, 3.08 seconds
50000 uids, 30.45 seconds
500000 uids, 8 min, 47.10 seconds

But that’s only got 5 really random ones, copied a bunch of times. It may not be random enough. How about 1,000 random ones, and then copy them (I didn’t change uids, although I should have, I just didn’t feel like coding it):

call populate_images(6,20,1000);
insert into images (image,uid,uploaded) select image,uid,NOW() from images;

(repeating that last line)

Takes:
1000 uids, 2 min 53.66 seconds
5000 uids, 2 min 54.24 seconds
50000 uids, 3 min 31.91 seconds
500000 uids, 20 min 45.08 seconds

Comments and suggestions welcome.

Comments are closed.