Switching a web site to UTF-8
From ShawnReevesWiki
I want a web site where no characters render improperly, so I'm considering converting my web sites to use the UTF-8 character set for both storage and presentation.
Steps
- Convert all the existing data in every text field of every table in the database. This is the toughest part, because the data in a field may not match the field's declared character set, so a conversion that trusts the declaration can mangle characters unexpectedly. The tutorials below suggest first converting the contents of a field to that field's declared character set, and only then converting it to UTF-8.
- Change all the HTML pages to declare UTF-8 as their encoding. This is done in a META tag inside the page's HEAD section.
- Set all PHP scripts to use UTF-8 as the default character set.
- Set all MySQL connections and queries to use UTF-8.
- Convert all text files on the web server to UTF-8.
- OPTIONAL: Use .htaccess to direct httpd to serve all html, htm, and php files as UTF-8 with the AddCharset directive.
- Use a validator, such as the W3C Internationalization Checker, to check that pages are really being served in UTF-8: http://validator.w3.org/i18n-checker/
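To see why the database step is the risky one, here is a minimal sketch in Python (chosen only for illustration; it is not part of the PHP/MySQL setup). It simulates UTF-8 bytes sitting in a field declared as another character set, shows how a conversion that trusts the declaration mangles them, and shows one way the damage can be undone. It also sketches re-encoding a single file as UTF-8. The function names mangle, repair, and convert_file_to_utf8 are hypothetical, and latin-1 is only an assumed example of a wrong declaration; a real site would need to check what each field and file actually contains.

```python
from pathlib import Path


def mangle(text: str, wrong_charset: str = "latin-1") -> str:
    """Simulate a bad conversion: the stored bytes are really UTF-8,
    but they are read as `wrong_charset` (assumed here to be latin-1)."""
    return text.encode("utf-8").decode(wrong_charset)


def repair(mangled: str, wrong_charset: str = "latin-1") -> str:
    """Undo mangle(): recover the original bytes, then read them as UTF-8."""
    return mangled.encode(wrong_charset).decode("utf-8")


def convert_file_to_utf8(path: Path, source_charset: str) -> None:
    """Rewrite one text file as UTF-8, given its current encoding.

    Some charsets raise UnicodeDecodeError on a wrong guess, but latin-1
    accepts any byte, so a wrong guess there fails silently.
    """
    text = path.read_bytes().decode(source_charset)
    path.write_bytes(text.encode("utf-8"))
```

For example, mangle("café") gives "cafÃ©", the familiar garbage seen on misconfigured pages, and repair("cafÃ©") gives back "café". Working on a copy of the data first seems wise, since a wrong guess about the source character set can make things worse.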
Existing tutorials
- Converting a MySQL database to UTF-8
- http://www.drzycimski.com/programming/zend-framework/converting-a-mysql-database-to-utf-8/
- Converting Database Character Sets (WordPress Codex)
- http://codex.wordpress.org/Converting_Database_Character_Sets
- Making sure .php files are served as UTF-8 and announced properly in the HTTP header
- http://stackoverflow.com/questions/4279282/set-http-header-to-utf-8-php
- http://www.w3.org/International/questions/qa-changing-encoding
- http://www.w3.org/International/questions/qa-htaccess-charset