[wp-trac] Re: [WordPress Trac] #3517: WordPress should be 100% UTF-8
WordPress Trac
wp-trac at lists.automattic.com
Sat Feb 3 17:44:07 GMT 2007
#3517: WordPress should be 100% UTF-8
---------------------+------------------------------------------------------
Reporter: sehh | Owner: anonymous
Type: defect | Status: reopened
Priority: normal | Milestone: 2.2
Component: General | Version: 2.0.5
Severity: major | Resolution:
Keywords: UTF-8 |
---------------------+------------------------------------------------------
Comment (by tenpura):
Here are four different scenarios; I hope this information helps.
Case 2 is the one ryan mentioned.
=== Case 1: Retrievable ===
[Configuration][[BR]]
table/column: latin1 (default)[[BR]]
MySQL system variables related to client (character_set_client,
character_set_connection, character_set_results): latin1 (default)[[BR]]
input: UTF-8
[Description][[BR]]
All the related variables and the table/column character set have the same
value (latin1). In this case MySQL stores the input data without conversion,
so the data in the database is retrievable. Such data is good to go through
the UTF-8 table converter.
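As a rough illustration of why Case 1 is safe, the sketch below models the
latin1-to-latin1 path in Python. It assumes Python's "latin-1" codec as a
stand-in for MySQL's latin1 (the two differ slightly in the 0x80-0x9F range),
and the variable names are made up for this example.

```python
# Case 1 sketch: client charset and column charset are both latin1,
# so the "conversion" MySQL performs is an identity on the bytes.
original = "日本語"
sent_bytes = original.encode("utf-8")        # what the blog really sends

# MySQL believes the bytes are latin1 text and stores them as latin1.
as_latin1_text = sent_bytes.decode("latin-1")
stored_bytes = as_latin1_text.encode("latin-1")

# Reading back over the same latin1 connection returns the same bytes,
# which the blog then interprets as UTF-8 again.
retrieved = stored_bytes.decode("utf-8")
```

Because every byte value maps to exactly one latin-1 character and back, the
stored bytes are still valid UTF-8 and can be handed to a converter.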
=== Case 2: Unretrievable ===
[Configuration][[BR]]
table/column: latin2[[BR]]
MySQL system variables related to client: latin1 (default)[[BR]]
input: UTF-8
[Description][[BR]]
Only ASCII survives. Multibyte characters (e.g. UTF-8 Japanese) are
converted wrongly and destroyed when they are stored. Such data is
already broken and unretrievable.
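The loss in Case 2 can be sketched in Python. This is only a model, assuming
Python's "latin-1" and "iso8859_2" codecs approximate MySQL's latin1 and
latin2, and that unmappable characters are replaced with "?". The function
name is hypothetical.

```python
def store_latin1_as_latin2(payload: bytes) -> bytes:
    """Model a latin1 client writing into a latin2 column."""
    # MySQL treats the incoming bytes as latin1 characters...
    as_latin1_text = payload.decode("latin-1")
    # ...and converts them to latin2; characters that latin2 lacks are lost.
    return as_latin1_text.encode("iso8859_2", errors="replace")

ascii_in = "hello".encode("utf-8")
japanese_in = "日本語".encode("utf-8")

ascii_stored = store_latin1_as_latin2(ascii_in)        # unchanged
japanese_stored = store_latin1_as_latin2(japanese_in)  # contains "?" bytes
```

Once a byte has been replaced by "?", nothing in the stored value says what
it used to be, which is why this case cannot be repaired afterwards.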
=== Case 3: Retrievable by reverse process ===
[Configuration][[BR]]
table/column: utf8[[BR]]
MySQL system variables related to client: latin1 (default)[[BR]]
input: UTF-8 (I have tested with UTF-8 Japanese)
[Description][[BR]]
This may be a common misconfiguration. In this case, web browsers can
display the contents as expected, but in the database the data is forcibly
converted and garbled. This type of data can only be retrieved by reversing
the process, so the regular UTF-8 table converter won't help.
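The reverse process for Case 3 might look like the following Python sketch.
It assumes Python's "latin-1" codec approximates MySQL's latin1, and
`undo_double_encoding` is a hypothetical helper, not part of any converter
discussed in this ticket.

```python
original = "日本語"

# Write path in Case 3: the UTF-8 bytes are taken for latin1 characters
# and converted to utf8 for storage, so each byte becomes its own character.
stored = original.encode("utf-8").decode("latin-1").encode("utf-8")

def undo_double_encoding(stored_bytes: bytes) -> str:
    """Reverse the latin1 -> utf8 conversion applied on the way in."""
    # utf8 -> latin1 gives back the bytes the blog originally sent,
    # which can then be decoded as the UTF-8 they always were.
    return stored_bytes.decode("utf-8").encode("latin-1").decode("utf-8")

recovered = undo_double_encoding(stored)
```

Reading through a latin1 connection performs the same utf8-to-latin1 step,
which would explain why browsers show the text correctly in this case.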
=== Case 4: Case 1 with multiple input encodings ===
[Configuration][[BR]]
table/column: latin1 (default)[[BR]]
MySQL system variables related to client: latin1 (default)[[BR]]
input: UTF-8, EUC-JP (e.g. UTF-8 articles and EUC-JP pingbacks)
[Description][[BR]]
Both kinds of data are retrievable, but of course they need to be treated
differently. If EUC-JP pingbacks are run through the UTF-8 table converter
regardless, the data will be lost.
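One possible way to treat the two encodings differently is to try a strict
UTF-8 decode per row and fall back to EUC-JP. The Python sketch below is a
heuristic under that assumption, not a reliable detector: some EUC-JP byte
sequences may also happen to be valid UTF-8. `decode_row` and the sample rows
are invented for illustration.

```python
def decode_row(raw: bytes):
    """Return (text, encoding) for one stored row, guessing the encoding."""
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is an EUC-JP pingback.
        return raw.decode("euc_jp"), "euc_jp"

rows = [
    "日本語の記事".encode("utf-8"),          # a UTF-8 article
    "日本語のピンバック".encode("euc_jp"),   # an EUC-JP pingback
]
decoded = [decode_row(r) for r in rows]
```

A converter would then re-encode each decoded row as UTF-8 instead of
assuming every row already is UTF-8.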
--
Ticket URL: <http://trac.wordpress.org/ticket/3517#comment:19>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software