[wp-trac] [WordPress Trac] #11528: sanitize_text_field() issue with UTF-8 characters
WordPress Trac
wp-trac at lists.automattic.com
Tue Jan 5 22:51:05 UTC 2010
#11528: sanitize_text_field() issue with UTF-8 characters
----------------------------+-----------------------------------------------
Reporter: SergeyBiryukov | Owner:
Type: defect (bug) | Status: reopened
Priority: normal | Milestone: 2.9.2
Component: Formatting | Version: 2.9
Severity: major | Resolution:
Keywords: |
----------------------------+-----------------------------------------------
Changes (by hakre):
* status: closed => reopened
* resolution: fixed =>
* milestone: 2.9.1 => 2.9.2
Comment:
To fix this properly, you need to add the UTF-8 (u) modifier to
preg_replace; otherwise this will always fail. Don't fix this on the wrong
end, even if you think the results are pleasing. Instead, you should know
what is actually broken and what you are doing.
About the PCRE-u-Modifier:
'''u (PCRE8)'''[[BR]]
This modifier turns on additional functionality of PCRE that is
incompatible with Perl. Pattern strings are treated as UTF-8. This
modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3
on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
[http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php Pattern
Modifiers in the PHP Manual]
To check whether or not a string is a UTF-8 string, the best you can
(currently) do with WP core code is to use the seems_utf8() function,
which has some deficiencies compared to the UTF-8 standard RFC. Proper
functions are suggested in another ticket/patch located here: #5998 /
[http://core.trac.wordpress.org/attachment/ticket/5998/5998.2.patch
5998.2.patch] (for reference).
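As a rough illustration of such a validity check outside of WP core, the
sketch below leans on the u modifier itself: in UTF-8 mode preg_match()
refuses a subject that is not valid UTF-8. The function name
is_valid_utf8() is hypothetical; this is neither seems_utf8() nor the code
from the #5998 patch.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress core): report whether
 * $string is valid UTF-8.
 *
 * The empty pattern matches any valid subject, so preg_match() returns 1.
 * With the u modifier an invalid subject makes preg_match() return false
 * and sets PREG_BAD_UTF8_ERROR.
 */
function is_valid_utf8( $string ) {
	return 1 === preg_match( '//u', $string );
}
```

For example, is_valid_utf8( "\xD0\xA0" ) (the Cyrillic letter "Р") should
be true, while a lone lead byte such as "\xD0" should be rejected.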
A proper check would therefore still use the \s character class, but with
the u modifier. preg_replace will then set a regex error (available via
preg_last_error()) and return NULL instead of the filtered string if there
is a problem with the encoding.
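A minimal sketch of that approach follows. The helper name
collapse_whitespace_utf8() and its choice to return false on failure are
assumptions made for illustration; this is not the actual
sanitize_text_field() code.

```php
<?php
/**
 * Hypothetical helper (not WordPress core): collapse runs of whitespace
 * into a single space, treating the subject as UTF-8.
 *
 * Returns the cleaned string, or false if the input is not valid UTF-8.
 */
function collapse_whitespace_utf8( $text ) {
	// With the u modifier, \s is matched against whole code points, so a
	// byte such as 0xA0 inside a multi-byte character is never split off.
	$result = preg_replace( '/\s+/u', ' ', $text );

	if ( null === $result ) {
		// preg_replace() returns null on error; preg_last_error() gives
		// the reason, e.g. PREG_BAD_UTF8_ERROR for malformed input.
		return false;
	}

	// trim() only strips ASCII whitespace bytes, which never occur
	// inside a UTF-8 multi-byte sequence.
	return trim( $result );
}
```

The Cyrillic letter "Р" is encoded as the bytes 0xD0 0xA0, which makes it
a useful test case: collapse_whitespace_utf8( "\xD0\xA0 \t\n \xD0\xA0" )
should keep both letters intact with a single space between them, while a
malformed string such as "abc\xD0" should yield false, so the caller can
decide how to handle the broken encoding.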
Unfortunately, this patch went into 2.9.1 without a proper review, so I am
reopening the ticket and suggest 2.9.2 as the next milestone. This issue
is related to charset / encoding.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/11528#comment:10>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software