[wp-trac] [WordPress Trac] #11738: sanitize_text_field() issue with UTF-8 characters
WordPress Trac
wp-trac at lists.automattic.com
Sun Jan 10 06:26:53 UTC 2010
#11738: sanitize_text_field() issue with UTF-8 characters
--------------------------+-------------------------------------------------
Reporter: hakre | Owner: hakre
Type: defect (bug) | Status: new
Priority: normal | Milestone: 3.0
Component: Charset | Version: 2.9.1
Severity: normal | Keywords: needs-patch
--------------------------+-------------------------------------------------
Comment(by hakre):
[http://www.php.net/manual/en/reference.pcre.pattern.differences.php
Normally isspace() matches]:
* space " " / x20 / 32
* formfeed "\f" / x0C / 12
* newline "\n" / x0A / 10
* carriage return "\r" / x0D / 13
* horizontal tab "\t" / x09 / 9
* vertical tab "\v" / x0B / 11
This is the same as on the [http://linux.die.net/man/3/isspace
isspace() Linux Manpage].
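
As a quick cross-check of the list above, here is a minimal sketch in Python (not PHP/PCRE, so only an analogue). It assumes that Python's byte-pattern \s, which is documented as [ \t\n\r\f\v], behaves like isspace() in the C locale. The names C_LOCALE_WHITESPACE and is_byte_space are made up for this illustration.

```python
import re
import string

# Character -> code, as listed above for isspace() in the C/POSIX locale.
C_LOCALE_WHITESPACE = {
    " ": 0x20,   # space
    "\f": 0x0C,  # formfeed
    "\n": 0x0A,  # newline
    "\r": 0x0D,  # carriage return
    "\t": 0x09,  # horizontal tab
    "\v": 0x0B,  # vertical tab
}


def is_byte_space(code: int) -> bool:
    """Return True if the single byte `code` matches \\s in a bytes pattern."""
    return re.fullmatch(rb"\s", bytes([code])) is not None


# Every listed character has the stated code and is matched byte-wise.
for char, code in C_LOCALE_WHITESPACE.items():
    assert ord(char) == code
    assert is_byte_space(code)

# Python's own ASCII whitespace constant contains exactly the same six.
assert set(string.whitespace) == set(C_LOCALE_WHITESPACE)
```

In this sketch xA0 is not matched, which is the C-locale behaviour; the reports in this ticket suggest that some systems behave differently.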
Within SGML and HTML I could not find a concrete definition of whitespace
(that is, whether xA0 is included or not). Since users reported problems
with double-byte chars containing the byte xA0, and degrading to 7-bit ASCII
fixed them, we must assume that \s w/o the /u modifier matches xA0 as well.
On a system where that is the case, a test should be run with \s and the
/u modifier to check whether it helps or not. I ran that on my testbed,
and the /u modifier does solve the problem (test code posted on the other
ticket).
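
To illustrate the failure mode (this is not the test code from the other ticket), here is a rough Python simulation. It assumes a byte-wise whitespace class that also contains xA0, as suspected above for \s w/o /u, and compares it with character-wise matching on decoded text, which stands in for \s with /u. All names are hypothetical.

```python
import re

# Assumption: a byte-wise class that, like the affected systems are suspected
# to do, treats 0xA0 as whitespace in addition to the six C-locale characters.
BYTEWISE_SPACE = re.compile(rb"[ \t\n\r\f\v\xa0]+")

# Character-wise matching on decoded text; re.ASCII limits \s to the six
# C-locale characters, so no part of a multi-byte character can match.
CHARWISE_SPACE = re.compile(r"\s+", re.ASCII)


def collapse_bytewise(raw: bytes) -> bytes:
    """Collapse whitespace runs byte by byte (simulates \\s without /u)."""
    return BYTEWISE_SPACE.sub(b" ", raw)


def collapse_charwise(text: str) -> str:
    """Collapse whitespace runs character by character (simulates \\s with /u)."""
    return CHARWISE_SPACE.sub(" ", text)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+00E0 ("a" with grave accent) is encoded as the two bytes C3 A0 in UTF-8.
sample = "voil\u00e0  tout"

broken = collapse_bytewise(sample.encode("utf-8"))
intact = collapse_charwise(sample)

print(is_valid_utf8(broken))                   # the A0 byte was eaten
print(is_valid_utf8(intact.encode("utf-8")))   # the character survived
```

The byte-wise variant swallows the second byte of the character together with the following spaces and leaves a dangling C3 lead byte, while the character-wise variant only collapses the two real spaces.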
So IMHO the /u modifier is good to go, at least for a first run.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/11738#comment:13>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software