Images in a database

About six months ago, the question of storing images in a database came up. This is one of my favorite topics, and has many database-agnostic parts.

Personally, I think “tell me about storing images in a database” is actually a great interview question, because you will be able to see the difference between someone who has just memorized “what’s right” versus someone who is really thinking. It also helps you see how someone will communicate — if they just say “NEVER do it, it’s as bad as crossing the streams!” then they are a type of person that gives you a short answer, without much explanation, and without many nuances. (That may be what you are looking for, but usually you want someone who gives reasons for why they strongly feel one way or another).

Consider the following cases:

What about storing lots of small
(say 1Mb or less) images that change frequently (say, dating profile pictures)?

What about large images that change infrequently (say, highly detailed satellite images)?

How important is consistency, and how will data consistency be maintained if images are stored outside the database? How will consistent backups be done?

What other resources (such as a CDN) are available?

What compliance and auditing issues might this client have?

What is the “sweet spot” in terms of size of an image stored in the database, and does that apply to text too? Would it be OK to have a TEXT or LONGTEXT field in the database, but all images are stored outside of the database?

All of these (and more) may swing the pendulum one way or another.

Putting images on a filesystem is the most popular method of doing things. We do have a client that stores “big BLOBby data” in a table (documents, images) — this is mostly for consistency and archiving purposes.

The cons of storing images in a database:
– more db traffic — in, out, and through (where through = replication)
– backups and exports from the db are more of a pain. Data/index files and tablespaces can get quite large. It’s nicer to be able to rsync
files….
– restores are more of a pain too.
– there is no computational stuff really that the db can do (other than, say, length). It cannot sort, compare, etc (well it can but it’s
meaningless). Using separate files makes coding actions such as like making thumbnails much easier.

The pros of storing images in a database:
– you have data integrity and consistency. If you delete a record you can cascade that delete down to the image. You cannot do a JOIN of an OS
and a db.
– databases are really really good at storage and retrieval. It can be argued that filesystems are really really good at storage and retrieval too, but for high activity only certain types of filesystems are, and if you do not have a ReiserFS or MogileFS expert, you may end up with more
than you bargained for. Very likely, you have someone already knowledgeable about databases.

Things that can be pros and cons:

– filesystem vs. database caching. This really depends on the filesystem and the DBMS you’re using. Many folks also use other caching methods for images too…..(webserver/app server/memcached/CDN/whatever)

Here are some references:

Oracle’s whitepaper on The Move to Store Images in a Database
An Article I wrote in Mar 2006, which is still highly relevant. Make sure to read all the comments, including following links like Mike Kruckenberg’s thoughts.

About six months ago, the question of storing images in a database came up. This is one of my favorite topics, and has many database-agnostic parts.

Personally, I think “tell me about storing images in a database” is actually a great interview question, because you will be able to see the difference between someone who has just memorized “what’s right” versus someone who is really thinking. It also helps you see how someone will communicate — if they just say “NEVER do it, it’s as bad as crossing the streams!” then they are a type of person that gives you a short answer, without much explanation, and without many nuances. (That may be what you are looking for, but usually you want someone who gives reasons for why they strongly feel one way or another).

Consider the following cases:

What about storing lots of small
(say 1Mb or less) images that change frequently (say, dating profile pictures)?

What about large images that change infrequently (say, highly detailed satellite images)?

How important is consistency, and how will data consistency be maintained if images are stored outside the database? How will consistent backups be done?

What other resources (such as a CDN) are available?

What compliance and auditing issues might this client have?

What is the “sweet spot” in terms of size of an image stored in the database, and does that apply to text too? Would it be OK to have a TEXT or LONGTEXT field in the database, but all images are stored outside of the database?

All of these (and more) may swing the pendulum one way or another.

Putting images on a filesystem is the most popular method of doing things. We do have a client that stores “big BLOBby data” in a table (documents, images) — this is mostly for consistency and archiving purposes.

The cons of storing images in a database:
– more db traffic — in, out, and through (where through = replication)
– backups and exports from the db are more of a pain. Data/index files and tablespaces can get quite large. It’s nicer to be able to rsync
files….
– restores are more of a pain too.
– there is no computational stuff really that the db can do (other than, say, length). It cannot sort, compare, etc (well it can but it’s
meaningless). Using separate files makes coding actions such as like making thumbnails much easier.

The pros of storing images in a database:
– you have data integrity and consistency. If you delete a record you can cascade that delete down to the image. You cannot do a JOIN of an OS
and a db.
– databases are really really good at storage and retrieval. It can be argued that filesystems are really really good at storage and retrieval too, but for high activity only certain types of filesystems are, and if you do not have a ReiserFS or MogileFS expert, you may end up with more
than you bargained for. Very likely, you have someone already knowledgeable about databases.

Things that can be pros and cons:

– filesystem vs. database caching. This really depends on the filesystem and the DBMS you’re using. Many folks also use other caching methods for images too…..(webserver/app server/memcached/CDN/whatever)

Here are some references:

Oracle’s whitepaper on The Move to Store Images in a Database
An Article I wrote in Mar 2006, which is still highly relevant. Make sure to read all the comments, including following links like Mike Kruckenberg’s thoughts.