[wp-trac] [WordPress Trac] #32165: wp-db.php destructs all the multibyte characters
WordPress Trac
noreply at wordpress.org
Thu May 7 23:46:50 UTC 2015
#32165: wp-db.php destructs all the multibyte characters
--------------------------+-----------------------
Reporter: kjmtsh | Owner: pento
Type: defect (bug) | Status: reopened
Priority: high | Milestone: 4.2.3
Component: Database | Version: 4.1.2
Severity: blocker | Resolution:
Keywords: | Focuses:
--------------------------+-----------------------
Comment (by kjmtsh):
I think I detected the cause of some malfunction on some specific
environment.
== 1. Integer overflow ==
{{{$wpdb::prepare()}}} method uses {{{vsprintf()}}} function. We are using
this method in {{{strip_invalid_text()}}} to create a SQL statement to
truncate the input data according to the target character fields. When
this method checks wp_posts.post_content data, it sets the length
4294967295 which comes from the fact that the data type of the
post_content field is LONGTEXT.
On 32 bit PHP system, the maximum integer bytes (PHP_INT_MAX) are
2147483647. This causes integer overflow. And, as a result, we get a queer
SQL statement. Example:
{{{
$this->prepare( "CONVERT( LEFT( CONVERT( %s USING binary ), %d ) USING
{$value['charset']} )", $value['value'], $value['length']['length'] );
}}}
This code creats the statement like below for the post_content field
input.
{{{
CONVERT( LEFT( CONVERT( 'Some post content data' USING binary ), -1 )
USING ujis)
}}}
-1 is the same as 0. This statement removes all the content data. It
doesn't happen on the 64 bit PHP implementation, which has the maximum
integer 9223372036854775807.
PHP official document says: if integer value overflows, PHP will change it
to float type. But it doesn't work, at least on my environment (Windows 32
bit and Ubuntu Linux 32 bit). I don't know if all the 32 bit system
behaves the same or not. Anyway, we should change this format placeholder
from %d to %f. I couldn't find in MySQL documents if LEFT() function can
take float number. So I experimented it and made sure it does work fine.
== 2. ujis character problem ==
The ujis database/table/column has another problem. In
{{{strip_invalid_text()}}} method defined in wp-db.php, {{{$value['db']}}}
and {{{$db_check_string}}} variables are set to true. And then we go into
the {{{if ($db_check_string)}}} block. This block causes a few problem.
'''A. SQL statement'''
When we try to write a new post, WordPress tries to make an auto draft
entitled 'auto draft.' Our environment is in UTF-8, and all the WordPress
PHP and Javascript files are encoded in UTF-8. So all of our input data
are encoded in UTF-8, never EUC-JP. I checked SELECT CONVERT etc input on
the commandline. In Japanese, auto draft is translated into '自動下書き'.
{{{
mysql> charset utf8
SELECT HEX( '自動下書き' );
=> E887AAE58B95E4B88BE69BB8E3818D
SELECT CONVERT( '自動下書き' USING binary );
=> OK. We get E887AAE58B95E4B88BE69BB8E3818D as expected.
SELECT LEFT( CONVERT( '自動下書き' USING binary ), 65535 );
=> OK. We still get E887AAE58B95E4B88BE69BB8E3818D as expected.
SELECT CONVERT( LEFT( CONVERT( '自動下書き' USING binary ), 65535 ) USING
ujis );
=> NG. We get nothing unexpectedly.
SELECT CONVERT( LEFT( CONVERT( '自動下書き' USING binary ), 65535 ) USING
utf8 );
=> OK. We get E887AAE58B95E4B88BE69BB8E3818D again as expected.
}}}
This is our environment. WordPress always fails to instantiate an object,
and PHP complains 'Trying to get property of non-object' from all the
subsequent methods. And we couldn't use post new admin page. So we must
not use
{{{
USING {$value['charset']}
}}}
We must use this instead.
{{{
USING {$this->charset}
}}}
For comparison, ujis example.
{{{
mysql> charset ujis
SELECT HEX('自動下書き');
=> BCABC6B0B2BCBDF1A4AD
SELECT CONVERT( LEFT( CONVERT( '自動下書き' USING binary), 65535) USING
ujis);
=> OK. We get BCABC6B0B2BCBDF1A4AD as expected.
SELECT CONVERT( LEFT( CONVERT( '自動下書き' USING binary), 65535) USING
utf8);
=> NG. We get nothing at all.
}}}
This is trivial from the experiment above. BUT this is NOT our
environment.
'''B. Database connection character set'''
As I explained in the above report, we have ujis table/column but the data
transmission is in UTF-8. This is from users' settings, namely
'DB_CHARSET' value in wp-config.php. I think we should not change users'
decision to another without notice. If we don't inform them the change, it
will make the problem detection very difficult or impossible, just like
our case. Even if they ask in the forum reporting their settings precisely
with all the plugins deactivated and default template used, no one ever
knows from their information where the problem is. So the next lines in
{{{strip_invalid_text()}}} method.
{{{
if ( $charset !== $connection_charset ) {
$connection_charset = $charset;
$this->set_charset( $this->dbh, $charset );
}
}}}
Changing connection character set is always wrong unless users explicitly
want to change it. This code block makes the connection from WordPress to
MySQL broken. Nothing returns from MySQL and WordPress fails to make the
object etc...
Is this character set change necessary on some environments? I don't know
the situation that requires this change. My suggestion is this.
We should remove above if block, and at the same time another if block
below, which is required from the above block, and leave the database
connection character set to users' choice according to their server
settings.
{{{
// Don't forget to change the charset back!
if ( $connection_charset !== $this->charset ) {
$this->set_charset( $this->dbh );
}
}}}
----
Number 1 and Number 2 fixes make WordPress work fine on both ujis
environment and on other environments. There's no patch, because fixes are
simple enough. I hope this will help you. Thanks.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/32165#comment:36>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list