One of the responses to a question I asked yesterday suggested that I should make sure my database can handle UTF-8 characters correctly. How I can do this with MySQL?
Short answer - You should almost always be using the
utf8mb4 charset and
To alter database:
ALTER DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Aaron's comment on this answer How to make MySQL handle UTF-8 properly
What's the difference between utf8_general_ci and utf8_unicode_ci
Conversion guide: https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html
MySQL 4.1 and above has a default character set of UTF-8. You can verify this in your
my.cnf file, remember to set both client and server (
If you have existing data that you wish to convert to UTF-8, dump your database, and import it back as UTF-8 making sure:
SET NAMES utf8before you query/insert into the database
DEFAULT CHARSET=utf8when creating new tables
- at this point your MySQL client and server should be in UTF-8 (see
my.cnf). remember any languages you use (such as PHP) must be UTF-8 as well. Some versions of PHP will use their own MySQL client library, which may not be UTF-8 aware.
If you do want to migrate existing data remember to backup first! Lots of weird choping of data can happen when things don't go as planned!
- complete UTF-8 migration (cdbaby.com)
- article on UTF-8 readiness of php functions (note some of this information is outdated)
This may be a job for the
In my limited experience with it, it's not 100% reliable when used as a generic "encoding sniffer" - It checks for the presence of certain characters and byte values to make an educated guess - but in this narrow case (it'll need to distinguish just between UTF-8 and ISO-8859-1 ) it should work.
<?php $text = $entity['Entity']['title']; echo 'Original : ', $text."<br />"; $enc = mb_detect_encoding($text, "UTF-8,ISO-8859-1"); echo 'Detected encoding '.$enc."<br />"; echo 'Fixed result: '.iconv($enc, "UTF-8", $text)."<br />"; ?>
you may get incorrect results for strings that do not contain special characters, but that is not a problem.
Alter the table to change the column character set to Latin-1. You will now have singly-encoded UTF-8 strings, but sitting in a field whose collation is supposed to be Latin-1.
What you do then is, change the column character set back to UTF-8 via the binary character set - that way MySQL doesn't convert the characters at any point.
ALTER TABLE MyTable MODIFY MyColumn ... CHARACTER SET latin1 ALTER TABLE MyTable MODIFY MyColumn ... CHARACTER SET binary ALTER TABLE MyTable MODIFY MyColumn ... CHARACTER SET utf8
(is the correct syntax iirc; put the appropriate column type in where
You have to define the connection to your database as UTF-8:
// Set up your connection $connection = mysql_connect('localhost', 'user', 'pw'); mysql_select_db('yourdb', $connection); mysql_query("SET NAMES 'utf8'", $connection); // Now you get UTF-8 encoded stuff $query = sprintf('SELECT name FROM place where id = 1'); $result = mysql_query($query, $connection); $result = mysql_fetch_assoc($result);
I think it might have something to do with collation as well, but I'm not sure. In my case it certainly did, since I had to support cyrillic.
Try this, worked for me:
Set initial collation while creating the target database to
SET NAMES 'utf8' COLLATE 'utf8_general_ci';to the top of your sql file
mysql -u root -p --default-character-set=utf8 yourDB < yourSQLfile.sql
One more thing, in order to properly get the UTF-8 data form your database, you'll have to modify your connection string as well. For example:
Additionally, take a look at what my problem was.