UTF-8 encoding lets you display different scripts on the same Web page: with UTF-8, you can write Latin, Chinese and Arabic characters on the same page.
On the other hand, setting up a whole site to work with UTF-8 can be incredibly tedious if you do not know how to proceed. You have to configure the HTML code, PHP, Apache, and the MySQL databases you will need.
Here is a very simple tutorial to convert a whole website to UTF-8 within minutes!
Apache
Send the charset in the HTTP headers. You can either add this to the .htaccess file at the root of your website:
AddDefaultCharset UTF-8
Or include this in every PHP script:
header("Content-type: text/html; charset=UTF-8");
PHP Configuration
You will need to use multibyte string functions instead of the usual string functions (substr, etc.).
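You will have to edit your php.ini configuration file. Note that since PHP 5.6 the mbstring.internal_encoding, mbstring.http_input and mbstring.http_output settings are deprecated in favor of default_charset.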
mbstring.language=UTF-8
mbstring.internal_encoding=UTF-8
mbstring.http_input=UTF-8
mbstring.http_output=UTF-8
mbstring.detect_order=auto
Here are the multibyte counterparts to the regular string functions:
Regular | Multibyte |
mail | mb_send_mail |
strlen | mb_strlen |
strpos | mb_strpos |
strrpos | mb_strrpos |
substr | mb_substr |
strtolower | mb_strtolower |
strtoupper | mb_strtoupper |
substr_count | mb_substr_count |
ereg | mb_ereg |
eregi | mb_eregi |
ereg_replace | mb_ereg_replace |
eregi_replace | mb_eregi_replace |
split | mb_split |
htmlentities($text) | htmlentities($text, ENT_QUOTES, 'UTF-8') |
htmlspecialchars($text) | htmlspecialchars($text, ENT_QUOTES, 'UTF-8') |
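To see why the multibyte counterparts matter, here is a minimal sketch comparing a few pairs from the table. It assumes the mbstring extension is loaded and that the script itself is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// "héllo" is 5 characters but 6 bytes in UTF-8, because "é" takes 2 bytes.
$word = "héllo";

$byteLength = strlen($word);             // counts bytes
$charLength = mb_strlen($word, 'UTF-8'); // counts characters

// substr() works on bytes, so it can cut a multibyte character in half.
$firstTwoBytes = substr($word, 0, 2);             // "h" plus half of "é" (invalid UTF-8)
$firstTwoChars = mb_substr($word, 0, 2, 'UTF-8'); // "hé"

// Case conversion that knows about accented letters.
$upper = mb_strtoupper($word, 'UTF-8');

printf("%d bytes, %d characters, upper: %s\n", $byteLength, $charLength, $upper);
```

Passing the encoding explicitly, as above, keeps the result independent of whatever mbstring.internal_encoding happens to be on the server.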
You can use the mb_convert_encoding and mb_detect_encoding functions to convert a string to its UTF-8 equivalent, which is very useful when you are using data from external files or parsed HTML pages.
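As an illustration, the two functions can be combined in a small helper. This is only a sketch: to_utf8() is a hypothetical name, and the list of fallback encodings is an assumption you should adapt to the data you actually receive, since encoding detection is a guess rather than a guarantee.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting it if needed.
// $fallbacks lists the legacy encodings we are willing to assume (an assumption).
function to_utf8(string $text, array $fallbacks = ['ISO-8859-1', 'Windows-1252']): string
{
    // Already valid UTF-8: nothing to do.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Strict mode only accepts encodings in which every byte sequence is valid.
    $detected = mb_detect_encoding($text, $fallbacks, true);
    if ($detected === false) {
        return $text; // unknown encoding: leave the data untouched
    }

    return mb_convert_encoding($text, 'UTF-8', $detected);
}

$latin1 = "caf\xE9";        // "café" encoded as ISO-8859-1 (one byte for "é")
$utf8   = to_utf8($latin1); // "café" encoded as UTF-8 (two bytes for "é")
```

Checking for valid UTF-8 first makes the helper safe to call twice on the same string, which avoids the classic double-encoding problem.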
Note that in multibyte regular expressions, \w will match any accented character, which can be very practical for word detection in most non-English languages!
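A short sketch of that behaviour, assuming mbstring was built with multibyte regex support and the script is saved as UTF-8. The sample sentence is only an illustration.

```php
<?php
// Tell the mb_ereg family that patterns and subjects are UTF-8.
mb_regex_encoding('UTF-8');

$sentence = "Un été à Besançon";

// Split on runs of non-word characters: accented letters stay inside the words.
$words = mb_split('\W+', $sentence);

// A single accented word matches \w+ from start to end.
$isWord = (bool) mb_ereg('^\w+$', "été");

print_r($words);
```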
MySQL
When you create a database, use the following query:
CREATE DATABASE foo CHARACTER SET utf8 COLLATE utf8_bin;
By default, a table created in a UTF-8 database will use the UTF-8 charset.
Similarly, text columns will by default inherit from the table they are in.
If you are using phpMyAdmin, you can access those properties in the Operations tab corresponding to the MySQL database or table you want to change.
In your PHP code, don't forget this query for every connection to the database:
mysql_query("SET NAMES 'utf8'");
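The mysql_* functions were removed in PHP 7, so on a current PHP version the same idea has to be expressed through PDO or mysqli (for instance with mysqli's set_charset method). Here is a hedged sketch for PDO, where the charset is declared in the DSN. The helper name utf8_dsn(), the host and the database name are all hypothetical. The default charset mirrors the utf8 used in this tutorial; MySQL's utf8 stores at most three bytes per character, so pass 'utf8mb4' instead if you need four-byte characters such as emoji.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that sets the connection charset.
function utf8_dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = utf8_dsn('localhost', 'foo');

// Usage (needs a running MySQL server and real credentials, hence commented out):
// $pdo = new PDO($dsn, $user, $password);
```

With the charset in the DSN, there is no separate SET NAMES query to remember on each connection.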
If you are using MySQL from the command line, you have to specify the following option:
--default-character-set=utf8
HTML
You can put the following tag in the head section:
<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
And don't forget to declare that form data will be sent as UTF-8:
<form accept-charset="UTF-8">
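The accept-charset attribute is only a request to the browser, so it can be worth checking on the server that what arrived really is UTF-8. A minimal sketch, assuming mbstring is available; invalid_utf8_fields() is a hypothetical helper, and in a real script you would pass it $_POST rather than the sample array used here.

```php
<?php
// Hypothetical helper: return the names of the fields that are not valid UTF-8.
function invalid_utf8_fields(array $fields): array
{
    $bad = [];
    foreach ($fields as $name => $value) {
        if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $bad[] = $name;
        }
    }
    return $bad;
}

// Sample data standing in for $_POST: "city" contains a raw ISO-8859-1 "ç".
$submitted = ['name' => "Zoé", 'city' => "Besan\xE7on"];
$bad = invalid_utf8_fields($submitted);
```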
Always remember to save your files as UTF-8 without BOM in your text editor! This is true for JavaScript, HTML, and PHP files. CSS files are less likely to contain special characters, but consistency never hurts.
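If you suspect an editor has slipped a BOM into a file, the three bytes EF BB BF at the very start are easy to detect. The sketch below uses hypothetical helper names (has_utf8_bom() and strip_utf8_bom()) and works on a string, so you would feed it the result of file_get_contents() for a real file.

```php
<?php
// The UTF-8 byte order mark: three bytes at the very start of the content.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: does the content start with a UTF-8 BOM?
function has_utf8_bom(string $contents): bool
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: return the content without its leading BOM, if any.
function strip_utf8_bom(string $contents): string
{
    return has_utf8_bom($contents) ? (string) substr($contents, 3) : $contents;
}

$withBom = UTF8_BOM . "<?php echo 'hi';";
$clean   = strip_utf8_bom($withBom);
```

In a PHP file, a BOM is sent to the browser as output before anything else, which is why it can trigger "headers already sent" errors when the script later calls header().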