[wp-trac] [WordPress Trac] #22692: Quotes Are Messing Up
WordPress Trac
noreply at wordpress.org
Fri Nov 1 20:33:54 UTC 2013
#22692: Quotes Are Messing Up
----------------------------------------+-----------------------------
Reporter: miqrogroove | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: Formatting | Version: 3.4.2
Severity: normal | Resolution:
Keywords: has-patch needs-unit-tests |
----------------------------------------+-----------------------------
Comment (by azaozz):
> ...my home server is properly referencing the ISO-8859-1 code page (\xA0
> is a space), whereas BlueHost could be using ASCII (\xA0 is invalid).
Yes, that's the most likely reason, though I still couldn't confirm or
reproduce it. There are more inconsistencies. From the PCRE documentation:
{{{
The \s characters are HT (9), LF (10), FF (12), CR (13), and space (32).
If "use locale;" is included in a Perl script, \s may match the VT charac-
ter. In PCRE, it never does.
}}}
And later on:
{{{
By default, in a UTF mode, characters with values greater than 128
never match \d, \s, or \w, and always match \D, \S, and \W. These
sequences retain their original meanings from before UTF support was
available, mainly for efficiency reasons. However, if PCRE is compiled
with Unicode property support, and the PCRE_UCP option is set, the be-
haviour is changed so that Unicode properties are used to determine
character types, as follows:
  \d  any character that \p{Nd} matches (decimal digit)
  \s  any character that \p{Z} matches, plus HT, LF, FF, CR
  \w  any character that \p{L} or \p{N} matches, plus underscore
}}}
So `\s` will not match U+00A0 if PCRE was compiled without Unicode
property support. In addition, PHP sets the PCRE_UCP option only as of
5.3.4, so the `u` modifier doesn't change `\s` in earlier versions:
https://bugs.php.net/bug.php?id=52971.
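Since the behaviour depends on the PHP version and on how PCRE was built, it can be probed at runtime. The following is only a minimal sketch: `example_s_matches_nbsp()` is a hypothetical helper, not a WordPress function, and it assumes PCRE was built with UTF-8 support so that the `u` modifier is accepted.

```php
<?php
// Hypothetical helper (not part of WordPress): reports whether \s matches
// U+00A0 when the u modifier is used on this PHP/PCRE build.
function example_s_matches_nbsp() {
    $nbsp = "\xC2\xA0"; // U+00A0 encoded as UTF-8

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    // The @ hides the warning emitted if the pattern cannot be compiled.
    return 1 === @preg_match( '/^\s$/u', $nbsp );
}

echo example_s_matches_nbsp()
    ? "\\s matches U+00A0 in UTF mode\n"
    : "\\s does not match U+00A0 in UTF mode\n";
```

The result is deliberately not hard-coded here, because it is exactly what varies between hosts.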
Given that, to fix this particular case we should use `[\s\xA0]` to
match white space. However, to be fully compatible the class should
include all white space chars from `\p{Z}` plus `\r`, `\n`, `\t` and `\f`.
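One possible way to cover `\p{Z}` without depending on Unicode property support is to list the code points explicitly and match in UTF mode. This is a sketch under stated assumptions, not a proposed patch: `example_space_class()` is a hypothetical name, the list reflects the Unicode separator categories (Zs, Zl, Zp) as currently defined, and the `u` modifier still requires PCRE to be built with UTF-8 support.

```php
<?php
// Hypothetical helper (not part of WordPress): a character class listing the
// \p{Z} code points plus HT, LF, FF and CR, so it does not rely on PCRE_UCP.
// U+180E is omitted; it was moved out of category Zs in Unicode 6.3.
function example_space_class() {
    return '[\t\n\f\r \x{00A0}\x{1680}\x{2000}-\x{200A}'
        . '\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}]';
}

$nbsp = "\xC2\xA0"; // U+00A0 encoded as UTF-8
$text = 'He said' . $nbsp . '"hello"';

// Split on runs of white space, including the no-break space.
$parts = preg_split( '/' . example_space_class() . '+/u', $text );
// $parts: array( 'He', 'said', '"hello"' )
```

Matching in UTF mode matters here: a byte-level `\xA0` would also hit the second byte of other UTF-8 characters, such as "à" (`\xC3\xA0`).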
In addition:
- In PHP 5.3.4+, when using preg_* functions in UTF mode and PCRE has been
compiled with Unicode property support, the meaning of `\d`, `\D`, `\s`,
`\S`, `\w`, and `\W` changes. They match by the Unicode property of each
char, which covers many more characters and makes the regex considerably slower.
- Currently this cannot be relied on, as the lowest required PHP version is
5.2.4 and some hosts don't have Unicode property support in PCRE.
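
Whether a given host has Unicode property support can be checked by trying to compile a pattern that uses `\p{...}`. A minimal sketch, assuming that a PCRE build without property support rejects such a pattern at compile time; `example_pcre_has_unicode_properties()` is a hypothetical helper, not an existing WordPress function.

```php
<?php
// Hypothetical helper (not part of WordPress): true when this PCRE build
// accepts \p{...} escapes in UTF mode.
function example_pcre_has_unicode_properties() {
    // If the pattern cannot be compiled, preg_match() returns false and
    // emits a warning, which the @ operator hides.
    return false !== @preg_match( '/\p{Z}/u', ' ' );
}

echo example_pcre_has_unicode_properties()
    ? "Unicode properties are available\n"
    : "Unicode properties are not available\n";
```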
--
Ticket URL: <http://core.trac.wordpress.org/ticket/22692#comment:39>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software