[wp-trac] [WordPress Trac] #20368: htmlspecialchars() returns empty string for non-UTF-8 input in PHP 5.4
WordPress Trac
wp-trac at lists.automattic.com
Thu Apr 5 12:43:24 UTC 2012
#20368: htmlspecialchars() returns empty string for non-UTF-8 input in PHP 5.4
--------------------------+-----------------------------
Reporter: convissor | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version:
Severity: major | Keywords:
--------------------------+-----------------------------
The default value of the input `$encoding` parameter for
`htmlspecialchars()` changed to UTF-8 in PHP 5.4. The prior default was
ISO-8859-1. The function's UTF-8 handler checks the input, returning an
empty string if the input isn't valid UTF-8.
WordPress will trigger the UTF-8 validator because most of its
`htmlspecialchars()` calls don't pass the `$encoding` parameter. This will
cause major problems for sites that have a `DB_CHARSET` other than `utf8`.
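A minimal sketch of the failure, for illustration only. The sample string
is an assumption (it is not taken from the php-internals thread), and the
encodings are passed explicitly so the result does not depend on which PHP
version runs it:

```php
<?php
// "café & bar" stored as ISO-8859-1: the byte 0xE9 is not valid UTF-8.
$latin1 = "caf\xE9 & bar";

// UTF-8 is the implicit default as of PHP 5.4. Invalid input yields ''.
$as_utf8 = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

// ISO-8859-1 was the implicit default before PHP 5.4. Input is escaped.
$as_latin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');

var_dump($as_utf8 === '');                     // bool(true)
var_dump($as_latin1 === "caf\xE9 &amp; bar");  // bool(true)
```

A call that omits `$encoding` behaves like the first call under PHP 5.4 and
like the second under earlier versions.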
[http://article.gmane.org/gmane.comp.php.devel/71783 Posting 58859 to php-internals]
by Rasmus gives a clear example of the problem. The
[http://thread.gmane.org/gmane.comp.php.devel/71777 whole thread] starts
with posting 58853.
One approach to resolving this problem is to create two centralized
functions. This route is simpler and easier to maintain than adding the
`$flags` and `$encoding` parameters to each `htmlspecialchars()` call
throughout the code base.
1. `wp_hsc_db()` for safely displaying database results. Uses
`DB_CHARSET` to determine the appropriate `$encoding` parameter. MySQL's
character set names are not the same as the values PHP expects in the
`$encoding` parameter, so a mapping is needed. Please see the `hsc_db()`
method in the
[http://plugins.svn.wordpress.org/login-security-solution/trunk/login-security-solution.php Login Security Solution plugin]
for a mapping of the valid options.
2. `wp_hsc_utf8()` for safely displaying strings known to be saved as
UTF-8, such as error messages written in core. Uses `UTF-8` as the
`$encoding` parameter.
Some calls in core use the `$flags` parameter, so these new functions will
need the parameter too. The default should be `ENT_COMPAT`, which works
under PHP 5.2, 5.3 and 5.4.
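The two functions could be sketched roughly as follows. This is an
illustration under stated assumptions, not the plugin's code: the helper
`wp_hsc_encoding()` is a hypothetical name, the charset map is partial and
was written for this example (see the plugin's `hsc_db()` for a full
mapping), and the fallback to `ISO-8859-1` for unknown charsets is an
assumption chosen because that encoding never rejects input.

```php
<?php
// Stand-in for the constant normally defined in wp-config.php.
if (!defined('DB_CHARSET')) {
    define('DB_CHARSET', 'latin1');
}

// Hypothetical helper: translate a MySQL charset name to a PHP encoding.
function wp_hsc_encoding($db_charset) {
    // Partial, illustrative map. Not the plugin's actual table.
    $map = array(
        'utf8'     => 'UTF-8',
        'latin1'   => 'cp1252',    // MySQL's latin1 is Windows-1252
        'cp1251'   => 'cp1251',
        'cp866'    => 'cp866',
        'koi8r'    => 'KOI8-R',
        'big5'     => 'BIG5',
        'gb2312'   => 'GB2312',
        'sjis'     => 'Shift_JIS',
        'ujis'     => 'EUC-JP',
        'macroman' => 'MacRoman',
    );
    $key = strtolower($db_charset);
    // Assumed fallback: a single-byte encoding that accepts any input.
    return isset($map[$key]) ? $map[$key] : 'ISO-8859-1';
}

// Escape a string that came from the database.
function wp_hsc_db($string, $flags = ENT_COMPAT) {
    return htmlspecialchars($string, $flags, wp_hsc_encoding(DB_CHARSET));
}

// Escape a string known to be UTF-8, such as a message written in core.
function wp_hsc_utf8($string, $flags = ENT_COMPAT) {
    return htmlspecialchars($string, $flags, 'UTF-8');
}

// Usage: with DB_CHARSET set to latin1, the 0xE9 byte is preserved.
echo wp_hsc_db("caf\xE9 <b>") === "caf\xE9 &lt;b&gt;" ? "ok\n" : "fail\n";
echo wp_hsc_utf8("it's", ENT_QUOTES), "\n";
```

Because both wrappers always pass `$encoding` explicitly, their output
should be the same under PHP 5.2, 5.3 and 5.4.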
It may be suggested that WP use `htmlspecialchars()`'s auto-detection
option (by passing an empty string as the `$encoding` parameter). This is
not advisable because it can produce inconsistent behavior. Even the PHP
manual says this route is not recommended.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/20368>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software