[wp-trac] [WordPress Trac] #11528: sanitize_text_field() issue with UTF-8 characters
WordPress Trac
wp-trac at lists.automattic.com
Sun Dec 20 10:32:52 UTC 2009
#11528: sanitize_text_field() issue with UTF-8 characters
----------------------------+-----------------------------------------------
Reporter: SergeyBiryukov | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Unassigned
Component: Formatting | Version: 2.9
Severity: major | Keywords:
----------------------------+-----------------------------------------------
{{{sanitize_text_field()}}} is a new function in {{{/wp-includes/formatting.php}}}
that sanitizes a string from user input or from the database.
The following line of the function is not fully compatible with UTF-8:
{{{
$filtered = trim( preg_replace('/\s+/', ' ', $filtered) );
}}}
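It breaks characters such as Р (Cyrillic capital letter ER, U+0420), which
is encoded in UTF-8 as the two bytes {{{D0 A0}}} (hexadecimal). Without the
{{{u}}} modifier the pattern works on single bytes, and {{{\s}}} can match
{{{A0}}} (a non-breaking space in single-byte encodings such as Latin-1),
so the character becomes {{{D0 20}}} after the replacement, which is no
longer valid UTF-8. To reproduce the issue, try creating a category named
оРангутанг or САПР: the rest of the word after Р is not displayed, and the
slug is incorrect too. If a title starts with Р, it is not displayed at all.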
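The byte-level corruption can be illustrated with a short PHP sketch. Whether {{{\s}}} matches {{{A0}}} without the {{{u}}} modifier depends on the environment, so the sketch simulates that step with an explicit {{{str_replace()}}} of the {{{A0}}} byte rather than calling {{{preg_replace()}}}. It assumes the file is saved as UTF-8 and uses the common {{{preg_match('//u', ...)}}} idiom as a UTF-8 validity check.

```php
<?php
// Assumes this file is saved as UTF-8, so the literals below are UTF-8 bytes.
// "Р" (U+0420, CYRILLIC CAPITAL LETTER ER) is encoded as two bytes.
echo bin2hex("Р"), "\n"; // d0a0

// Simulate a byte-oriented \s that treats 0xA0 as whitespace
// (non-breaking space in single-byte encodings such as Latin-1):
// every 0xA0 byte is replaced with an ASCII space (0x20).
$word   = "САПР";
$broken = str_replace("\xA0", ' ', $word);
echo bin2hex($broken), "\n"; // d0a1d090d09fd020

// A 0xD0 lead byte must be followed by a continuation byte (0x80-0xBF);
// 0x20 is not one, so the result is invalid UTF-8.
// preg_match() with the u modifier returns false for invalid UTF-8 input.
var_dump(preg_match('//u', $word));   // int(1)
var_dump(preg_match('//u', $broken)); // bool(false)
```

Only the trailing Р of САПР is damaged here; the other three letters have second bytes other than {{{A0}}} and pass through untouched.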
The problem was reported on the Russian support forums soon after the 2.9 release.
Currently, a filter included in the local files avoids this replacement;
however, I think the issue also affects other languages that use the
Cyrillic alphabet.
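For comparison, here is a sketch of two ways the affected line could be made UTF-8 safe in core rather than worked around locally. The helper names {{{collapse_ascii_whitespace()}}} and {{{collapse_whitespace_utf8()}}} are hypothetical, and neither variant is necessarily the fix that will be committed.

```php
<?php
// Hypothetical helper (variant A): collapse only ASCII whitespace bytes.
// CR, LF, TAB and SPACE never occur inside a multibyte UTF-8 sequence
// (lead and continuation bytes are all >= 0x80), so Cyrillic letters
// cannot be split, with or without the u modifier.
function collapse_ascii_whitespace( $str ) {
	return trim( preg_replace( '/[\r\n\t ]+/', ' ', $str ) );
}

// Hypothetical helper (variant B): keep \s but add the u modifier so PCRE
// matches whole code points instead of single bytes.
// Caveat: preg_replace() returns null if $str is not valid UTF-8.
function collapse_whitespace_utf8( $str ) {
	return trim( preg_replace( '/\s+/u', ' ', $str ) );
}

$input = "  оРангутанг \t САПР \n";
echo collapse_ascii_whitespace( $input ), "\n"; // оРангутанг САПР
echo collapse_whitespace_utf8( $input ), "\n";  // оРангутанг САПР
```

Variant A keeps the function byte-oriented and never fails on malformed input; variant B also collapses Unicode whitespace but needs a null check for invalid UTF-8.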
--
Ticket URL: <http://core.trac.wordpress.org/ticket/11528>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software