[wp-trac] Re: [WordPress Trac] #3517: WordPress should be 100% UTF-8

WordPress Trac wp-trac at lists.automattic.com
Sat Feb 3 17:44:07 GMT 2007


#3517: WordPress should be 100% UTF-8
---------------------+------------------------------------------------------
 Reporter:  sehh     |        Owner:  anonymous
     Type:  defect   |       Status:  reopened 
 Priority:  normal   |    Milestone:  2.2      
Component:  General  |      Version:  2.0.5    
 Severity:  major    |   Resolution:           
 Keywords:  UTF-8    |  
---------------------+------------------------------------------------------
Comment (by tenpura):

 Here are four different scenarios; I hope this information helps.
 Case 2 is the one ryan mentioned.

 === Case 1: Retrievable ===

 [Configuration][[BR]]
 table/column: latin1 (default)[[BR]]
 MySQL system variables related to client (character_set_client,
 character_set_connection, character_set_results): latin1 (default)[[BR]]
 input: UTF-8

 [Description][[BR]]
 The client-related variables and the table/column character set all have
 the same value (latin1). In this case MySQL stores the input bytes without
 conversion, so the data in the database is retrievable. This data is ready
 for the UTF-8 table converter.
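
 A minimal sketch of why Case 1 is safe, simulated in Python rather than run
 against MySQL. The helper `store` is hypothetical and only models the rule
 described above: when the client and column character sets match, the bytes
 pass through unchanged. Python's `latin-1` codec is used as a stand-in for
 MySQL's latin1, which is an assumption.

```python
def store(client_bytes: bytes, client_charset: str, column_charset: str) -> bytes:
    """Hypothetical model of the conversion step between client and column."""
    if client_charset == column_charset:
        # Same character set on both sides: nothing to convert.
        return client_bytes
    # Different character sets: reinterpret, then re-encode (lossy if unmappable).
    text = client_bytes.decode(client_charset)
    return text.encode(column_charset, errors="replace")


original = "日本語".encode("utf-8")            # what WordPress sends
stored = store(original, "latin-1", "latin-1")  # Case 1: latin1 everywhere
recovered = stored.decode("utf-8")              # read the raw bytes back as UTF-8
```

 Because the stored bytes are identical to the input, reading them back as
 UTF-8 recovers the original text.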

 === Case 2: Unretrievable ===

 [Configuration][[BR]]
 table/column: latin2[[BR]]
 MySQL system variables related to client: latin1 (default)[[BR]]
 input: UTF-8

 [Description][[BR]]
 Only ASCII survives. Multibyte characters (e.g. UTF-8 Japanese) are
 converted incorrectly and destroyed when they are stored. This data is
 already broken and unretrievable.
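
 The loss in Case 2 can be illustrated with a small Python simulation. This
 is a sketch under assumptions, not MySQL's actual code path: the function
 name is hypothetical, and Python's `latin-1` and `iso8859_2` codecs stand in
 for MySQL's latin1 and latin2. Characters with no latin2 equivalent are
 replaced with `?`, which is what makes the damage irreversible.

```python
def store_latin1_client_latin2_column(utf8_bytes: bytes) -> bytes:
    """Hypothetical model: latin1 connection writing into a latin2 column."""
    # The server believes each incoming byte is one latin1 character.
    as_latin1_text = utf8_bytes.decode("latin-1")
    # It then converts that text to latin2; unmappable characters become '?'.
    return as_latin1_text.encode("iso8859_2", errors="replace")


ascii_in = "hello".encode("utf-8")
japanese_in = "日本語".encode("utf-8")

ascii_stored = store_latin1_client_latin2_column(ascii_in)
japanese_stored = store_latin1_client_latin2_column(japanese_in)
```

 The ASCII input is stored unchanged, while the Japanese input ends up with
 `?` substituted for several of its bytes, so the original cannot be rebuilt.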

 === Case 3: Retrievable by reverse process ===

 [Configuration][[BR]]
 table/column: utf8[[BR]]
 MySQL system variables related to client: latin1 (default)[[BR]]
 input: UTF-8 (I have tested with UTF-8 Japanese)

 [Description][[BR]]
 This may be a common misconfiguration. In this case, web browsers display
 the content as expected, because the conversion is undone when the data is
 read back over the same latin1 connection, but in the database the data is
 forcibly converted and garbled. This kind of data can only be recovered by
 reversing the process, so the regular UTF-8 table converter won't help.
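
 A sketch of the Case 3 garbling and its reversal, simulated in Python. Both
 function names are hypothetical, and Python's `latin-1` codec is an
 assumption standing in for MySQL's latin1 (the two differ for some bytes in
 the 0x80-0x9F range, so real data may need a different codec).

```python
def store_case3(utf8_bytes: bytes) -> bytes:
    """Hypothetical model: latin1 connection writing into a utf8 column."""
    # Each incoming byte is treated as one latin1 character...
    as_latin1_text = utf8_bytes.decode("latin-1")
    # ...and each of those characters is then encoded as UTF-8 for storage.
    return as_latin1_text.encode("utf-8")


def reverse_case3(stored: bytes) -> str:
    """Undo the conversion: utf8 -> latin1 bytes -> reinterpret as UTF-8."""
    return stored.decode("utf-8").encode("latin-1").decode("utf-8")


original = "日本語"
stored = store_case3(original.encode("utf-8"))
recovered = reverse_case3(stored)
```

 The stored form is twice as long as the input here, since every non-ASCII
 byte becomes a two-byte UTF-8 sequence, yet the reverse process restores the
 original text.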

 === Case 4: Case 1 with multiple input encodings ===

 [Configuration][[BR]]
 table/column: latin1 (default)[[BR]]
 MySQL system variables related to client: latin1 (default)[[BR]]
 input: UTF-8, EUC-JP (e.g. UTF-8 articles and EUC-JP pingbacks)

 [Description][[BR]]
 Both kinds of data are retrievable, but of course they need to be handled
 differently. If the EUC-JP pingbacks are run through the UTF-8 table
 converter without distinction, that data will be lost.
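
 One possible way to tell the two encodings apart before converting is to
 try a strict UTF-8 decode and fall back to EUC-JP only when that fails. The
 sketch below is a heuristic under assumptions, not part of any existing
 converter: `decode_mixed` is a hypothetical name, and a short EUC-JP string
 could in rare cases also happen to be valid UTF-8 and be misdetected.

```python
def decode_mixed(raw: bytes) -> str:
    """Hypothetical helper: decode bytes that are either UTF-8 or EUC-JP."""
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to EUC-JP (e.g. for pingbacks from EUC-JP sites).
        return raw.decode("euc_jp")


article = "日本語".encode("utf-8")    # UTF-8 article body
pingback = "日本語".encode("euc_jp")  # EUC-JP pingback excerpt

article_text = decode_mixed(article)
pingback_text = decode_mixed(pingback)
```

 Rows decoded this way can then be written back as UTF-8 uniformly, instead
 of treating every row as if it were already UTF-8.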

-- 
Ticket URL: <http://trac.wordpress.org/ticket/3517#comment:19>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software

