[wp-trac] [WordPress Trac] #11528: sanitize_text_field() issue with UTF-8 characters

WordPress Trac wp-trac at lists.automattic.com
Sun Dec 20 10:32:52 UTC 2009


#11528: sanitize_text_field() issue with UTF-8 characters
----------------------------+-----------------------------------------------
 Reporter:  SergeyBiryukov  |       Owner:            
     Type:  defect (bug)    |      Status:  new       
 Priority:  normal          |   Milestone:  Unassigned
Component:  Formatting      |     Version:  2.9       
 Severity:  major           |    Keywords:            
----------------------------+-----------------------------------------------
 {{{sanitize_text_field()}}} is a new function in
 {{{/wp-includes/formatting.php}}} that sanitizes a string from user input
 or from the database.

 The following line of the function is not fully compatible with UTF-8:
 {{{
 $filtered = trim( preg_replace('/\s+/', ' ', $filtered) );
 }}}
 It creates problems with characters like Р (capital Cyrillic R), which is
 encoded as {{{D0 A0}}} (hexadecimal) in UTF-8 and becomes {{{D0 20}}} after
 the replacement, apparently because the second byte, {{{A0}}}, is matched
 by {{{\s}}} as whitespace. To reproduce the issue, try to create a category
 named оРангутанг or САПР. The rest of the word after Р is not displayed,
 and the slug is incorrect too. If a title starts with Р, it is not
 displayed at all.
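
 The byte-level effect described above can be sketched in isolation. This is
 only an illustration under stated assumptions, not WordPress core code:
 whether a bare {{{\s}}} matches the byte {{{A0}}} depends on the server's
 PCRE build and locale, so the first pattern lists {{{\xA0}}} explicitly to
 simulate the faulty match. The helper {{{collapse_ascii_whitespace()}}} is a
 hypothetical name showing one possible alternative that only touches ASCII
 space, tab, CR and LF.

```php
<?php
// "Р" (U+0420) encoded as UTF-8 is the two bytes D0 A0.
$r = "\xD0\xA0";

// Simulate a byte-oriented whitespace class that also matches 0xA0.
// Assumption: this mimics what a bare \s does on affected servers.
$broken = preg_replace( '/[\s\xA0]+/', ' ', $r );

// Hypothetical alternative: collapse only ASCII space, tab, CR and LF,
// so bytes that belong to multibyte UTF-8 characters are left alone.
function collapse_ascii_whitespace( $text ) {
    return trim( preg_replace( '/[\r\n\t ]+/', ' ', $text ) );
}

// "Раз", then space + tab + space, then "два".
$title = "\xD0\xA0\xD0\xB0\xD0\xB7 \t \xD0\xB4\xD0\xB2\xD0\xB0";
$clean = collapse_ascii_whitespace( $title );

// Show the bytes before and after the simulated faulty replacement.
echo bin2hex( $r ), ' -> ', bin2hex( $broken ), "\n";
echo $clean, "\n";
```

 With the ASCII-only character class, the run of space, tab and space is
 still collapsed to a single space, while the {{{A0}}} byte inside Р is
 never considered for replacement.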

 The problem was reported on the Russian support forums soon after the
 release. Currently a filter is included in the localization files to avoid
 this replacement, but I think the issue is relevant to other languages
 using the Cyrillic alphabet as well.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/11528>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

