[wp-trac] [WordPress Trac] #32165: wp-db.php destructs all the multibyte characters

WordPress Trac noreply at wordpress.org
Thu May 7 23:46:50 UTC 2015


#32165: wp-db.php destructs all the multibyte characters
--------------------------+-----------------------
 Reporter:  kjmtsh        |       Owner:  pento
     Type:  defect (bug)  |      Status:  reopened
 Priority:  high          |   Milestone:  4.2.3
Component:  Database      |     Version:  4.1.2
 Severity:  blocker       |  Resolution:
 Keywords:                |     Focuses:
--------------------------+-----------------------

Comment (by kjmtsh):

 I think I detected the cause of some malfunction on some specific
 environment.


 ==  1. Integer overflow ==

 {{{$wpdb::prepare()}}} method uses {{{vsprintf()}}} function. We are using
 this method in {{{strip_invalid_text()}}} to create a SQL statement to
 truncate the input data according to the target character fields. When
 this method checks wp_posts.post_content data, it sets the length
 4294967295 which comes from the fact that the data type of the
 post_content field is LONGTEXT.

 On 32 bit PHP system, the maximum integer bytes (PHP_INT_MAX) are
 2147483647. This causes integer overflow. And, as a result, we get a queer
 SQL statement. Example:

 {{{
 $this->prepare( "CONVERT( LEFT( CONVERT( %s USING binary ), %d ) USING
 {$value['charset']} )", $value['value'], $value['length']['length'] );
 }}}

 This code creats the statement like below for the post_content field
 input.

 {{{
 CONVERT( LEFT( CONVERT( 'Some post content data' USING binary ), -1 )
 USING ujis)
 }}}

 -1 is the same as 0. This statement removes all the content data. It
 doesn't happen on the 64 bit PHP implementation, which has the maximum
 integer 9223372036854775807.

 PHP official document says: if integer value overflows, PHP will change it
 to float type. But it doesn't work, at least on my environment (Windows 32
 bit and Ubuntu Linux 32 bit). I don't know if all the 32 bit system
 behaves the same or not. Anyway, we should change this format placeholder
 from %d to %f. I couldn't find in MySQL documents if LEFT() function can
 take float number. So I experimented it and made sure it does work fine.


 == 2. ujis character problem ==

 The ujis database/table/column has another problem. In
 {{{strip_invalid_text()}}} method defined in wp-db.php, {{{$value['db']}}}
 and {{{$db_check_string}}} variables are set to true. And then we go into
 the {{{if ($db_check_string)}}} block. This block causes a few problem.

 '''A. SQL statement'''

 When we try to write a new post, WordPress tries to make an auto draft
 entitled 'auto draft.' Our environment is in UTF-8, and all the WordPress
 PHP and Javascript files are encoded in UTF-8. So all of our input data
 are encoded in UTF-8, never EUC-JP. I checked SELECT CONVERT etc input on
 the commandline. In Japanese, auto draft is translated into '自動下書き'.

 {{{
 mysql> charset utf8

 SELECT HEX( '自動下書き' );
 => E887AAE58B95E4B88BE69BB8E3818D

 SELECT CONVERT( '自動下書き' USING binary );
 => OK. We get E887AAE58B95E4B88BE69BB8E3818D as expected.

 SELECT LEFT( CONVERT( '自動下書き' USING binary ), 65535 );
 => OK. We still get E887AAE58B95E4B88BE69BB8E3818D as expected.

 SELECT CONVERT( LEFT( CONVERT( '自動下書き' USING binary ), 65535 ) USING
 ujis );
 => NG. We get nothing unexpectedly.

 SELECT CONVERT( LEFT( CONVERT( '自動下書き' USING binary ), 65535 ) USING
 utf8 );
 => OK. We get E887AAE58B95E4B88BE69BB8E3818D again as expected.
 }}}

 This is our environment. WordPress always fails to instantiate an object,
 and PHP complains 'Trying to get property of non-object' from all the
 subsequent methods. And we couldn't use post new admin page. So we must
 not use

 {{{
 USING {$value['charset']}
 }}}

 We must use this instead.

 {{{
 USING {$this->charset}
 }}}

 For comparison, ujis example.

 {{{
 mysql> charset ujis

 SELECT HEX('自動下書き');
 => BCABC6B0B2BCBDF1A4AD

 SELECT CONVERT( LEFT( CONVERT( '自動下書き' USING binary), 65535) USING
 ujis);
 => OK. We get BCABC6B0B2BCBDF1A4AD as expected.

 SELECT CONVERT( LEFT( CONVERT( '自動下書き' USING binary), 65535) USING
 utf8);
 => NG. We get nothing at all.
 }}}

 This is trivial from the experiment above. BUT this is NOT our
 environment.

 '''B. Database connection character set'''

 As I explained in the above report, we have ujis table/column but the data
 transmission is in UTF-8. This is from users' settings, namely
 'DB_CHARSET' value in wp-config.php. I think we should not change users'
 decision to another without notice. If we don't inform them the change, it
 will make the problem detection very difficult or impossible, just like
 our case. Even if they ask in the forum reporting their settings precisely
 with all the plugins deactivated and default template used, no one ever
 knows from their information where the problem is. So the next lines in
 {{{strip_invalid_text()}}} method.

 {{{
 if ( $charset !== $connection_charset ) {
     $connection_charset = $charset;
     $this->set_charset( $this->dbh, $charset );
 }
 }}}

 Changing connection character set is always wrong unless users explicitly
 want to change it. This code block makes the connection from WordPress to
 MySQL broken. Nothing returns from MySQL and WordPress fails to make the
 object etc...

 Is this character set change necessary on some environments? I don't know
 the situation that requires this change. My suggestion is this.

 We should remove above if block, and at the same time another if block
 below, which is required from the above block, and leave the database
 connection character set to users' choice according to their server
 settings.

 {{{
 // Don't forget to change the charset back!
 if ( $connection_charset !== $this->charset ) {
     $this->set_charset( $this->dbh );
 }
 }}}


 ----
 Number 1 and Number 2 fixes make WordPress work fine on both ujis
 environment and on other environments. There's no patch, because fixes are
 simple enough. I hope this will help you. Thanks.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/32165#comment:36>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list