[wp-trac] [WordPress Trac] #11738: sanitize_text_field() issue with UTF-8 characters

WordPress Trac wp-trac at lists.automattic.com
Sun Jan 10 06:26:53 UTC 2010


#11738: sanitize_text_field() issue with UTF-8 characters
--------------------------+-------------------------------------------------
 Reporter:  hakre         |       Owner:  hakre      
     Type:  defect (bug)  |      Status:  new        
 Priority:  normal        |   Milestone:  3.0        
Component:  Charset       |     Version:  2.9.1      
 Severity:  normal        |    Keywords:  needs-patch
--------------------------+-------------------------------------------------

Comment(by hakre):

 [http://www.php.net/manual/en/reference.pcre.pattern.differences.php
 Normally isspace() matches]:

  * space " " / x20 / 32
  * formfeed "\f" / x0C / 12
  * newline "\n" / x0A / 10
  * carriage return "\r" / x0D / 13
  * horizontal tab "\t" / x09 / 9
  * vertical tab "\v" / x0B / 11

 This is the same set as listed on the [http://linux.die.net/man/3/isspace
 isspace() Linux Manpage].
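
 As a quick cross-check of the list above, here is a minimal sketch in
 Python (not PHP/PCRE). It assumes that Python's \s on a bytes pattern,
 which is ASCII-only, is a fair stand-in for isspace() in the "C" locale.
 The names C_LOCALE_WHITESPACE and matched are made up for this example.

```python
import re

# The six byte values listed above (space, \f, \n, \r, \t, \v).
C_LOCALE_WHITESPACE = {0x20, 0x0C, 0x0A, 0x0D, 0x09, 0x0B}

# Collect every single byte value that a byte-oriented \s accepts.
# Python's bytes-mode \s is ASCII-only; it is used here only as a stand-in
# for isspace() in the "C" locale, not as a statement about PCRE itself.
matched = {b for b in range(256) if re.fullmatch(rb"\s", bytes([b]))}

print(sorted(hex(b) for b in matched))
print(0xA0 in matched)
```

 In this stand-in, xA0 is not part of the set, so any system where \s
 does match xA0 must get that from somewhere else, e.g. its locale.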

 Within SGML and HTML I could not find a concrete definition of whitespace
 (i.e. whether xA0 is included or not). Since users reported problems with
 double-byte chars containing the byte xA0, and degrading to 7-bit ASCII
 fixed them, we must assume that \s without the /u modifier matches xA0 as
 well on those systems.
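
 To illustrate the suspected mechanism, here is a minimal sketch in Python
 (not PHP). It assumes a system where the byte xA0 counts as whitespace
 and simulates that with an explicit byte class. LOCALE_SPACE and
 collapse_bytewise are hypothetical names, not WordPress functions.

```python
import re

# Hypothetical stand-in for a byte-wise \s on a system where isspace(0xA0)
# is true: the six standard whitespace bytes plus 0xA0.
LOCALE_SPACE = re.compile(rb"[ \t\n\r\f\v\xa0]+")

def collapse_bytewise(raw: bytes) -> bytes:
    """Collapse whitespace runs byte by byte, ignoring UTF-8 sequences."""
    return LOCALE_SPACE.sub(b" ", raw)

# "à" is U+00E0, encoded in UTF-8 as the two bytes C3 A0.
original = "voilà  tout".encode("utf-8")
damaged = collapse_bytewise(original)

print(damaged)  # b'voil\xc3 tout' : the trailing A0 byte of "à" is gone

try:
    damaged.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False

print(still_valid)  # False
```

 The second byte of the character is consumed as if it were whitespace,
 which leaves an orphaned lead byte and an invalid UTF-8 string.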

 On a system where that is the case, a test should be run with \s and the
 /u modifier to check whether it helps or not. I ran that on my testbed,
 and the /u modifier does solve the problem (test code posted on the other
 ticket).
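
 This is not the test code from the other ticket. It is a rough Python
 analogue of why character-aware matching helps, assuming that matching on
 decoded characters with an ASCII-only \s roughly corresponds to what the
 /u modifier does. collapse_charwise is a hypothetical name.

```python
import re

def collapse_charwise(raw: bytes) -> bytes:
    """Collapse whitespace runs on decoded characters, not on raw bytes."""
    text = raw.decode("utf-8")
    # re.ASCII restricts \s to [ \t\n\r\f\v]; because the pattern is applied
    # to characters, a byte inside a multi-byte sequence can never match.
    return re.sub(r"\s+", " ", text, flags=re.ASCII).encode("utf-8")

sample = "voilà  tout".encode("utf-8")  # "à" is encoded as C3 A0
cleaned = collapse_charwise(sample)

print(cleaned.decode("utf-8"))  # voilà tout
```

 The run of spaces is still collapsed, but the xA0 byte inside "à" is left
 alone because it is never looked at on its own.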

 So IMHO the /u modifier is good to go, at least for a first run.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/11738#comment:13>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

