[wp-trac] [WordPress Trac] #11528: sanitize_text_field() issue with UTF-8 characters

WordPress Trac wp-trac at lists.automattic.com
Tue Jan 5 22:51:05 UTC 2010


#11528: sanitize_text_field() issue with UTF-8 characters
----------------------------+-----------------------------------------------
 Reporter:  SergeyBiryukov  |        Owner:          
     Type:  defect (bug)    |       Status:  reopened
 Priority:  normal          |    Milestone:  2.9.2   
Component:  Formatting      |      Version:  2.9     
 Severity:  major           |   Resolution:          
 Keywords:                  |  
----------------------------+-----------------------------------------------
Changes (by hakre):

  * status:  closed => reopened
  * resolution:  fixed =>
  * milestone:  2.9.1 => 2.9.2


Comment:

 To fix this properly, you need to add the UTF-8 modifier (u) to the
 preg_replace pattern; otherwise this will always fail. Don't fix this at
 the wrong end, even if you think the results look pleasing. Instead, you
 should know what is actually broken and what your change does.

 About the PCRE-u-Modifier:
  '''u (PCRE8)'''[[BR]]
  This modifier turns on additional functionality of PCRE that is
 incompatible with Perl. Pattern strings are treated as UTF-8. This
 modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3
 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

 [http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php Pattern
 Modifiers in the PHP Manual]
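
 A minimal sketch of what "treated as UTF-8" means in practice. The sample
 string is my own choice for illustration, not taken from the ticket. Without
 the modifier PCRE walks the subject byte by byte, which is how a
 byte-oriented character class can end up matching part of a multi-byte
 character.

```php
<?php
// "Р" (Cyrillic capital Er, U+0420) is the two bytes 0xD0 0xA0 in UTF-8.
$subject = "\xD0\xA0";

// Without the u modifier, PCRE works on single bytes.
$byteCount = preg_match_all('/./', $subject, $byteMatches);

// With the u modifier, PCRE works on whole UTF-8 code points.
$charCount = preg_match_all('/./u', $subject, $charMatches);

echo $byteCount, "\n"; // 2
echo $charCount, "\n"; // 1
```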

 To check whether or not a string is a UTF-8 string, the best you can
 (currently) do with WP core code is to use the seems_utf8() function,
 which has some deficiencies compared to the UTF-8 standard (RFC). Proper
 functions are suggested in another ticket/patch located here: #5998 /
 [http://core.trac.wordpress.org/attachment/ticket/5998/5998.2.patch
 5998.2.patch] (for reference).
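
 For comparison, a validity check can also be sketched with PCRE itself. This
 is neither seems_utf8() nor the code from the #5998 patch. The function name
 example_is_valid_utf8() is hypothetical, and the sketch relies on
 preg_match() returning false when a subject is malformed under the
 u-modifier.

```php
<?php
// Hypothetical helper, not part of WordPress core.
function example_is_valid_utf8($string) {
    // An empty pattern with the u modifier matches any well-formed UTF-8
    // subject; preg_match() returns false when the subject is malformed.
    return preg_match('//u', $string) === 1;
}

var_dump(example_is_valid_utf8("caf\xC3\xA9")); // well-formed: "café"
var_dump(example_is_valid_utf8("caf\xE9"));     // malformed: lone Latin-1 byte
```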

 The proper check would therefore be to keep using the \s character class
 but to add the u-modifier to the pattern. preg_replace will then set a
 regex error and return NULL (which loosely compares equal to an empty
 string and to boolean false) in case there is a problem with the encoding.
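
 A rough sketch of that approach, assuming the goal is to collapse runs of
 whitespace. The helper name example_collapse_whitespace() is hypothetical
 and this is not the code in WordPress core. The failure case is detected
 with a strict comparison against null, and preg_last_error() can then be
 consulted for the reason.

```php
<?php
// Hypothetical helper illustrating the approach; not the WordPress core code.
function example_collapse_whitespace($text) {
    // \s with the u modifier: the subject is handled as UTF-8, so bytes
    // inside a multi-byte character are never matched on their own.
    $result = preg_replace('/\s+/u', ' ', $text);

    if ($result === null) {
        // preg_last_error() gives the reason, e.g. PREG_BAD_UTF8_ERROR
        // when the subject is not well-formed UTF-8.
        return false;
    }

    return $result;
}

// Two Cyrillic "Р" characters separated by a tab and a newline.
$clean = example_collapse_whitespace("\xD0\xA0\t\n\xD0\xA0");

// A lone 0xE9 byte makes the subject malformed UTF-8.
$broken = example_collapse_whitespace("bad \xE9 byte");
$error  = preg_last_error();
```

 Returning false on malformed input is only one possible design; a caller
 could equally fall back to the unmodified string or attempt a charset
 conversion first.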

 Unfortunately this patch went into 2.9.1 without a proper review, so I
 am reopening the ticket and suggest 2.9.2 as the next milestone. This
 issue is related to charset / encoding.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/11528#comment:10>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

