Switching a web site to UTF-8

From ShawnReevesWiki

Revision as of 10:41, 24 April 2013

I'd enjoy a web site that had zero issues with characters rendered improperly, so I'm considering converting my web sites to use the UTF-8 character set for storage and presentation.

Steps

Convert all the existing data in every text field of every table in the database
This is the toughest part, because the data in a field may not match the field's declared character set, so converting might mangle unexpected characters. The tutorials below mention that one might first convert the contents of a field to that field's declared character set and only then convert it to UTF-8. I ran the ALTER TABLE command on each table, and when I wasn't sure whether it would corrupt the data, I exported the table as an SQL file before and after the conversion, then compared the two using Apple's FileMerge program. Usually the two exports showed no differences, so I knew the data was not corrupted.
ALTER TABLE tablename CONVERT TO CHARSET utf8 
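The mangling described above can be reproduced outside the database. The following is a minimal Python sketch, not part of the original PHP/MySQL setup, that assumes the common case where UTF-8 bytes were stored in a field declared as latin1. The function names `mangle` and `repair` are hypothetical, and Python's `latin-1` codec stands in for MySQL's latin1, which is an approximation.

```python
def mangle(text: str) -> str:
    """Simulate converting a field whose bytes are already UTF-8 but which
    is declared as latin1: each UTF-8 byte is read as one latin1 character."""
    stored_bytes = text.encode("utf-8")   # what is really in the field
    return stored_bytes.decode("latin-1")  # what a latin1 reading makes of it


def repair(text: str) -> str:
    """Undo the damage by reversing the wrong step: turn the characters back
    into the original bytes, then read those bytes as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    damaged = mangle("café")
    print(damaged)          # the accented letter becomes two characters
    print(repair(damaged))  # the original text again
```

Text that comes back longer than it went in, with sequences starting with "Ã", is a typical sign that a field's contents did not match its declared character set.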
Convert all the HTML pages to declare UTF-8 as their encoding.
This is done in a META tag in the head of each page.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
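To check in bulk that pages carry the declaration, something like the following Python sketch could be used. It relies only on the standard library's `html.parser`; the class name `MetaCharsetFinder` and the function `declared_charset` are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the first character set declared in a META tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if attr.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = attr["charset"].strip().lower()
        elif (attr.get("http-equiv") or "").lower() == "content-type":
            # Long form: content="text/html; charset=UTF-8"
            for part in (attr.get("content") or "").split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(html_text: str):
    """Return the declared charset in lower case, or None if there is none."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

This only reports what the page declares; it does not prove the file's bytes are actually UTF-8.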
Convert all PHP scripts to use UTF-8 as the default character set.
Convert all MySQL connections and requests to use UTF-8.
Ensure all output from the database is properly escaped for HTML.
I usually used htmlentities($row_rsData['whatever'], ENT_COMPAT, 'UTF-8') or nl2br(htmlentities($row_rsData['whatever'], ENT_COMPAT, 'UTF-8')).
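For readers who want to see what this escaping does to a string, here is a rough Python analogue. It is only an approximation of the PHP calls above: Python's `html.escape` also escapes single quotes and does not turn accented letters into named entities. The function name `escape_for_html` is hypothetical.

```python
import html
import re


def escape_for_html(value: str, keep_line_breaks: bool = False) -> str:
    """Escape &, <, > and quotes so database text cannot inject markup.

    With keep_line_breaks=True, a <br /> is inserted before each line
    break, similar in spirit to wrapping the result in PHP's nl2br().
    """
    escaped = html.escape(value, quote=True)
    if keep_line_breaks:
        escaped = re.sub(r"(\r\n|\n|\r)", r"<br />\1", escaped)
    return escaped


if __name__ == "__main__":
    print(escape_for_html('Tom & "Jerry" <b>'))
    print(escape_for_html("line one\nline two", keep_line_breaks=True))
```

Once the page is served as UTF-8, non-ASCII characters can pass through unchanged; only the characters with special meaning in HTML need escaping.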
Convert all files on the web server to UTF-8.
I use Dreamweaver, so I just changed DW's preferences to use UTF-8 as the default, saved files as I made other edits, then uploaded them again.
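Without an editor that re-saves files, the same conversion could be scripted. The sketch below is one possible approach in Python, under the assumption that files which are not already valid UTF-8 were saved as Windows-1252; that source encoding is a guess and should be checked for each site. The function name `convert_file_to_utf8` is hypothetical.

```python
from pathlib import Path


def convert_file_to_utf8(path, source_encoding="cp1252") -> bool:
    """Rewrite the file at `path` as UTF-8.

    Returns True if the file was rewritten, False if it already was valid
    UTF-8 (plain ASCII files fall in this group and are left untouched).
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    # Assumption: anything that is not UTF-8 is in source_encoding.
    text = raw.decode(source_encoding)
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

Working on a copy of the site, or keeping a backup, is advisable, since a wrong guess for the source encoding would silently produce wrong characters.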
OPTIONAL: Use .htaccess to direct httpd to serve all html, htm, and php files as UTF-8 with the AddCharset directive.
This didn't work for me because that directive is not allowed on my hosting server.
Use a validator to check that pages are really being served in UTF-8.
http://validator.w3.org/i18n-checker/
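Besides the online validator, the character set a server announces can be read from the Content-Type response header. The following Python sketch parses such a header value with the standard library; the function names `charset_from_content_type` and `served_as_utf8` are hypothetical, and obtaining the header itself (for example with a browser's developer tools) is left out.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type value in lower case,
    or None when the header does not name one."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def served_as_utf8(header_value: str) -> bool:
    """True when the Content-Type header announces UTF-8."""
    return charset_from_content_type(header_value) == "utf-8"


if __name__ == "__main__":
    print(served_as_utf8("text/html; charset=UTF-8"))
    print(served_as_utf8("text/html"))
```

A header without a charset parameter leaves the decision to the page's META tag or to the browser's guess, so it is worth checking both.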

Existing tutorials

Converting a MySQL database to UTF-8
http://www.drzycimski.com/programming/zend-framework/converting-a-mysql-database-to-utf-8/
Converting Database Character Sets << WordPress Codex
http://codex.wordpress.org/Converting_Database_Character_Sets
Making sure .php files are served as UTF-8, and announced properly in the header
http://stackoverflow.com/questions/4279282/set-http-header-to-utf-8-php
and
http://www.w3.org/International/questions/qa-changing-encoding
and
http://www.w3.org/International/questions/qa-htaccess-charset