Switching a web site to UTF-8
From ShawnReevesWiki
I'd like my web sites to be free of improperly rendered characters, so I'm considering converting them to use the UTF-8 character set for both storage and presentation.
Steps
- Convert all the existing data in every text field of every table in the database
- This is the toughest part, because the data in a field may not actually match the field's declared character set, so converting it can mangle unexpected characters. The tutorials below suggest first converting a field's contents to that field's declared character set, then converting it to UTF-8. I used the ALTER TABLE command on each table. When I wasn't sure whether the conversion would corrupt anything, I exported the table as an SQL file before and after the conversion, then compared the two exports with Apple's FileMerge program. Usually there were no differences between the two backups, so I knew the data had not been corrupted.
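ALTER TABLE tablename CONVERT TO CHARSET utf8;
- In MySQL, utf8 is an alias for the three-byte utf8mb3 encoding, which cannot store characters outside the Basic Multilingual Plane, such as emoji. utf8mb4 is the full four-byte UTF-8 and is generally the better target for a new conversion.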
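Before converting a table, it can help to look for values that are already double-encoded. These are UTF-8 bytes that were once read as Latin-1 and now show up as sequences like "Ã©" instead of "é". The Python sketch below is one way to spot and undo that. It assumes the damage happened through MySQL's latin1, which MySQL actually implements as Windows-1252 (cp1252). The function names are hypothetical. A legitimate value that happens to contain text like "Ã©" would be "repaired" wrongly, so treat the output as suggestions to review, not changes to apply blindly.

```python
from typing import Iterable, List, Tuple


def undo_double_encoding(value: str) -> str:
    """Remove one layer of UTF-8-read-as-cp1252 mangling, if that is what happened.

    If value cannot be the result of that mistake (it does not re-encode to
    cp1252, or the resulting bytes are not valid UTF-8), return it unchanged.
    """
    try:
        return value.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return value


def suspect_values(values: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (original, repaired) pairs for values that look double-encoded."""
    pairs = []
    for value in values:
        repaired = undo_double_encoding(value)
        if repaired != value:
            pairs.append((value, repaired))
    return pairs


if __name__ == "__main__":
    # Hypothetical contents of one text column.
    column = ["café", "cafÃ©", "plain ASCII"]
    for original, repaired in suspect_values(column):
        print(f"{original!r} -> {repaired!r}")
```

Correctly stored text such as "café" fails the UTF-8 decode step and is left alone. Plain ASCII round-trips unchanged, so only the mangled value is reported.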
- Convert all the html pages to declare UTF-8 as their encoding.
- This is done with a meta tag inside the page's head element. In HTML5 the shorter <meta charset="UTF-8"> is also valid.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
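On a site with many static pages, editing that tag by hand gets tedious. Below is a rough Python sketch of automating it. It finds an existing charset declaration in either the http-equiv form or the HTML5 <meta charset> form and rewrites its value to UTF-8. If there is none, it inserts the tag right after <head>. Editing HTML with regular expressions is crude, so this assumes fairly conventional markup, and the function names are made up for the example. Changing the declaration does not re-encode the file's bytes; that is a separate step.

```python
import re
from typing import Optional

# Captures the charset value in <meta charset="..."> or in
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    r"""<meta\s[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)
# Matches <head> or <head ...>, but not <header>.
HEAD_OPEN = re.compile(r"<head(?:\s[^>]*)?>", re.IGNORECASE)
UTF8_META = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'


def declared_charset(html: str) -> Optional[str]:
    """Return the first charset declared in a meta tag, or None."""
    match = META_CHARSET.search(html)
    return match.group(1) if match else None


def ensure_utf8_meta(html: str) -> str:
    """Point the page's meta charset declaration at UTF-8, adding one if missing."""
    match = META_CHARSET.search(html)
    if match:
        # Swap only the charset value; the rest of the tag stays as written.
        return html[: match.start(1)] + "UTF-8" + html[match.end(1):]
    head = HEAD_OPEN.search(html)
    if head is None:
        return html  # nowhere sensible to put the tag; leave the page alone
    return html[: head.end()] + UTF8_META + html[head.end():]
```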
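- Make all PHP scripts use UTF-8 as the default character set. You can do this with the default_charset setting in php.ini (its default has been UTF-8 since PHP 5.6), or by calling header('Content-Type: text/html; charset=UTF-8'); before any output.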
- Make all MySQL connections and queries use UTF-8.
- I added this line to my connection files, which every page that uses MySQL includes:
mysqli_set_charset($ETONewsConnect, "utf8");
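Why the connection setting matters: MySQL reads the bytes a client sends according to the connection's character set, not the page's. The Python sketch below models a UTF-8 page talking to MySQL over a connection still set to latin1, which MySQL implements as cp1252. Each byte of a multibyte character is taken as a separate character, so "é" turns into "Ã©". This is a toy model with a hypothetical function name, not a description of the MySQL client library's internals. It only reproduces the symptom that mysqli_set_charset is there to prevent.

```python
def as_seen_by_server(text: str, page_charset: str, connection_charset: str) -> str:
    """Model how a server decodes text that a page sends over a connection.

    The page encodes text in its own charset; the server then reads those
    bytes in the connection's charset. When the two differ, characters change.
    """
    sent_bytes = text.encode(page_charset)
    # errors="replace" shows undecodable bytes as U+FFFD instead of raising.
    return sent_bytes.decode(connection_charset, errors="replace")


if __name__ == "__main__":
    word = "café"
    print(as_seen_by_server(word, "utf-8", "cp1252"))  # connection left at latin1: cafÃ©
    print(as_seen_by_server(word, "utf-8", "utf-8"))   # connection set to UTF-8: café
```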
- Make sure all output from the database is escaped for HTML, with UTF-8 specified as the encoding.
- I usually used htmlentities($row_rsData['whatever'], ENT_COMPAT, 'UTF-8') or nl2br(htmlentities($row_rsData['whatever'], ENT_COMPAT, 'UTF-8')).
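Here is the same escaping step sketched in Python, to show exactly which characters change. It mirrors PHP's htmlspecialchars() with ENT_COMPAT: &, <, > and double quotes are escaped, and single quotes are left alone. It does not mirror htmlentities(), which also turns characters such as é into named entities like &eacute;. Once pages are served as UTF-8, such characters can usually be output directly, so the narrower escaping is generally enough. The nl2br function imitates PHP's nl2br() by inserting <br /> before each line break. Both Python functions are stand-ins written for this example. Python's built-in html.escape() also escapes single quotes by default.

```python
import re


def escape_html_compat(text: str) -> str:
    """Escape &, <, > and double quotes, like htmlspecialchars() with ENT_COMPAT."""
    # Replace & first so the entities added below are not escaped a second time.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )


def nl2br(text: str) -> str:
    """Insert <br /> before each line break (CRLF, LFCR, LF or CR), keeping the break."""
    return re.sub(r"(\r\n|\n\r|\n|\r)", r"<br />\1", text)


if __name__ == "__main__":
    value = 'Café "menu" <b>\n& more'
    print(nl2br(escape_html_compat(value)))
```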
- Convert all files on the web server to UTF-8.
- I use Dreamweaver, so I changed its preferences to make UTF-8 the default encoding, re-saved each file as I made other edits, and uploaded it again.
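Without an editor that can re-save every file, a script can do the re-encoding in bulk. The Python sketch below assumes the old files were saved as Windows-1252; change legacy_encoding if they used something else, such as ISO-8859-1. Files that already decode as UTF-8 are skipped. A UTF-8 byte-order mark is stripped, because PHP sends a BOM at the top of a script as output, which triggers "headers already sent" errors when the script later calls header(). The function names are hypothetical, and guessing that anything non-UTF-8 is cp1252 can be wrong. The script also rewrites files in place, so run it on a copy of the site first.

```python
import codecs
from pathlib import Path
from typing import List, Optional


def to_utf8(raw: bytes, legacy_encoding: str = "cp1252") -> Optional[bytes]:
    """Return raw re-encoded as BOM-less UTF-8, or None if it needs no change."""
    if raw.startswith(codecs.BOM_UTF8):
        # Already UTF-8, but PHP would emit the BOM before any header() call.
        return raw[len(codecs.BOM_UTF8):]
    try:
        raw.decode("utf-8")
        return None  # valid UTF-8 already (plain ASCII files land here too)
    except UnicodeDecodeError:
        pass
    # Assumption: anything that is not valid UTF-8 was saved in legacy_encoding.
    return raw.decode(legacy_encoding).encode("utf-8")


def convert_tree(root: Path, suffixes=(".html", ".htm", ".php")) -> List[Path]:
    """Rewrite matching files under root in place; return the paths that changed."""
    changed = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            new_bytes = to_utf8(path.read_bytes())
            if new_bytes is not None:
                path.write_bytes(new_bytes)  # write bytes so line endings are untouched
                changed.append(path)
    return changed
```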
- OPTIONAL: Use .htaccess to tell httpd to serve all .html, .htm, and .php files as UTF-8, using the AddCharset directive (for example, AddCharset UTF-8 .html .htm .php). This didn't work for me because my hosting server doesn't allow that directive.
- Use a validator to check that pages are really being served in UTF-8.
- http://validator.w3.org/i18n-checker/
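The validator is the thorough check. For a quick spot check of many URLs, the Python sketch below reads the charset parameter of the Content-Type header that a server actually sends. It uses only the standard library (urllib.request and email.message), and the helper names are invented for this example. It looks at the HTTP header alone, not at any meta tag inside the page.

```python
import sys
import urllib.request
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # None when no charset parameter is present


def served_charset(url: str) -> Optional[str]:
    """Fetch url and report the charset named in its Content-Type header, if any."""
    with urllib.request.urlopen(url) as response:
        value = response.headers.get("Content-Type")
        return charset_from_content_type(value) if value else None


if __name__ == "__main__":
    # Usage: python check_charset.py https://example.com/page.php ...
    for url in sys.argv[1:]:
        charset = served_charset(url)
        status = "OK" if charset == "utf-8" else "check this page"
        print(f"{url}: {charset or 'no charset in header'} ({status})")
```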
Existing tutorials
- Converting a MySQL database to UTF-8
- http://www.drzycimski.com/programming/zend-framework/converting-a-mysql-database-to-utf-8/
- Converting Database Character Sets (WordPress Codex)
- http://codex.wordpress.org/Converting_Database_Character_Sets
- Making sure .php files are served as UTF-8 and declared properly in the HTTP header
- http://stackoverflow.com/questions/4279282/set-http-header-to-utf-8-php
- http://www.w3.org/International/questions/qa-changing-encoding
- http://www.w3.org/International/questions/qa-htaccess-charset