Beware: Default charset for mysqldump is utf8, regardless of server default charset

 

I ran into this issue a while ago, and was reminded of it again recently.  mysqldump uses a default charset of utf8, even when the default charset of the server is set differently.  Why does this matter?

The problem exists more in the fact that if you have string data that is in latin1 format, you are allowed to put in non-Latin characters. This can lead to lost data, especially when upgrading a major series (e.g. 5.0 to 5.1 or 5.1 to 5.5), because you’re supposed to export and import the data.

Also, when importing a backup of an InnoDB table, if there is an error with one of the parts of the INSERT, the whole INSERT statement rolls back.  I have experienced major data loss because the garbled characters cause an error when INSERTed, and it causes perfectly fine data *not* to import because they’re in the same INSERT statement as the garbled data.

For example:

First, set variables such on a MySQL server (5.0 or 5.1, I haven’t tested on 5.5):

mysql> show global variables like ‘%char%’;

+————————–+—————————-+

| Variable_name            | Value                      |

+————————–+—————————-+

| character_set_client     | latin1                     | 

| character_set_connection | latin1                     | 

| character_set_database   | latin1                     | 

| character_set_filesystem | binary                     | 

| character_set_results    | latin1                     | 

| character_set_server     | latin1                     | 

| character_set_system     | utf8                       | 

| character_sets_dir       | /usr/share/mysql/charsets/ | 

+————————–+—————————-+

8 rows in set (0.00 sec)

 

Then create these tables with data:

 

CREATE TABLE `test_utf8` (

  `kwid` int(10) unsigned NOT NULL default ‘0’,

  `keyword` varchar(80) NOT NULL default ”

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 

INSERT INTO `test_utf8` VALUES

(1,’watching’),(2,’poet’),(3,’просмотра’),(4,’Поэту’);

 

CREATE TABLE `test_latin1` (

  `kwid` int(10) unsigned NOT NULL default ‘0’,

  `keyword` varchar(80) NOT NULL default ”

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

INSERT INTO `test_latin1` VALUES

(1,’watching’),(2,’poet’),(3,’просмотра’),(4,’Поэту’);

 

Now compare:

mysqldump test > test_export_utf8.sql

mysqldump –default-character-set=latin1 test > test_export_latin1.sql

 

Note that the test export with the default character set of utf8 has mojibake whereas the export with latin1 does not.

 

So be *extremely* careful when using mysqldump – whether for backups or while upgrading.  You can checksum your data before and after  an export/import with mysqldump to be sure that your data is the same.

 

 

I ran into this issue a while ago, and was reminded of it again recently.  mysqldump uses a default charset of utf8, even when the default charset of the server is set differently.  Why does this matter?

The problem exists more in the fact that if you have string data that is in latin1 format, you are allowed to put in non-Latin characters. This can lead to lost data, especially when upgrading a major series (e.g. 5.0 to 5.1 or 5.1 to 5.5), because you’re supposed to export and import the data.

Also, when importing a backup of an InnoDB table, if there is an error with one of the parts of the INSERT, the whole INSERT statement rolls back.  I have experienced major data loss because the garbled characters cause an error when INSERTed, and it causes perfectly fine data *not* to import because they’re in the same INSERT statement as the garbled data.

For example:

First, set variables such on a MySQL server (5.0 or 5.1, I haven’t tested on 5.5):

mysql> show global variables like ‘%char%’;

+————————–+—————————-+

| Variable_name            | Value                      |

+————————–+—————————-+

| character_set_client     | latin1                     | 

| character_set_connection | latin1                     | 

| character_set_database   | latin1                     | 

| character_set_filesystem | binary                     | 

| character_set_results    | latin1                     | 

| character_set_server     | latin1                     | 

| character_set_system     | utf8                       | 

| character_sets_dir       | /usr/share/mysql/charsets/ | 

+————————–+—————————-+

8 rows in set (0.00 sec)

 

Then create these tables with data:

 

CREATE TABLE `test_utf8` (

  `kwid` int(10) unsigned NOT NULL default ‘0’,

  `keyword` varchar(80) NOT NULL default ”

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 

INSERT INTO `test_utf8` VALUES

(1,’watching’),(2,’poet’),(3,’просмотра’),(4,’Поэту’);

 

CREATE TABLE `test_latin1` (

  `kwid` int(10) unsigned NOT NULL default ‘0’,

  `keyword` varchar(80) NOT NULL default ”

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

INSERT INTO `test_latin1` VALUES

(1,’watching’),(2,’poet’),(3,’просмотра’),(4,’Поэту’);

 

Now compare:

mysqldump test > test_export_utf8.sql

mysqldump –default-character-set=latin1 test > test_export_latin1.sql

 

Note that the test export with the default character set of utf8 has mojibake whereas the export with latin1 does not.

 

So be *extremely* careful when using mysqldump – whether for backups or while upgrading.  You can checksum your data before and after  an export/import with mysqldump to be sure that your data is the same.