[wp-trac] [WordPress Trac] #22692: Quotes Are Messing Up

WordPress Trac noreply at wordpress.org
Fri Nov 1 20:33:54 UTC 2013


#22692: Quotes Are Messing Up
----------------------------------------+-----------------------------
 Reporter:  miqrogroove                 |       Owner:
     Type:  defect (bug)                |      Status:  new
 Priority:  normal                      |   Milestone:  Future Release
Component:  Formatting                  |     Version:  3.4.2
 Severity:  normal                      |  Resolution:
 Keywords:  has-patch needs-unit-tests  |
----------------------------------------+-----------------------------

Comment (by azaozz):

 > ...my home server is properly referencing the ISO-8859-1 code page (\xA0
 is a space), whereas BlueHost could be using ASCII (\xA0 is invalid).

 Yes, that's the most likely reason, though I still couldn't confirm or
 reproduce it. There are more inconsistencies. From the PCRE documentation:
 {{{
 The \s characters are HT (9), LF (10), FF (12), CR (13), and space (32).
 If "use locale;" is included in a Perl script, \s may match the VT
 character. In PCRE, it never does.
 }}}

 And later on:

 {{{
 By default, in a UTF mode, characters with values greater than 128
 never match \d, \s, or \w, and always match \D, \S, and \W. These
 sequences retain their original meanings from before UTF support was
 available, mainly for efficiency reasons. However, if PCRE is compiled
 with Unicode property support, and the PCRE_UCP option is set, the
 behaviour is changed so that Unicode properties are used to determine
 character types, as follows:

  \d  any character that \p{Nd} matches (decimal digit)
  \s  any character that \p{Z} matches, plus HT, LF, FF, CR
  \w  any character that \p{L} or \p{N} matches, plus underscore
 }}}

 So `\s` will not match U+00A0 if PCRE was compiled without Unicode
 property support. In addition, the PCRE_UCP option is only set as of
 PHP 5.3.4, so in earlier versions the `u` modifier doesn't change what
 `\s` matches: https://bugs.php.net/bug.php?id=52971.
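
 Since the behaviour could not be reproduced, a small diagnostic may help
 compare hosts. This is only a sketch: the array keys are made up for
 illustration, and the results are expected to differ between servers
 depending on the PHP version, the PCRE build and possibly the locale.

```php
<?php
// Diagnostic sketch: report how \s treats a no-break space on this host.
$nbsp_byte = "\xA0";     // U+00A0 as a single ISO-8859-1 byte
$nbsp_utf8 = "\xC2\xA0"; // U+00A0 encoded as UTF-8

$results = array(
    'php_version'       => PHP_VERSION,
    'pcre_version'      => PCRE_VERSION,
    // Byte mode: the outcome depends on the character tables PCRE uses.
    's_matches_byte_a0' => preg_match( '/^\s$/', $nbsp_byte ),
    // UTF-8 mode: false here means the pattern could not be compiled or run.
    's_matches_u00a0_u' => @preg_match( '/^\s$/u', $nbsp_utf8 ),
);

var_dump( $results );
```

 Running this on both the home server and the affected host should show
 whether `\s` is the part that differs.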

 Given that, to fix this particular case we should use `[\s\xA0]` to
 match white space. However, to be fully compatible, the class should
 include all white space characters from `\p{Z}` plus \r, \n, \t and \f.
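
 As a rough sketch of what such a "fully compatible" class could look like,
 the code points in `\p{Z}` can be spelled out by hand so that neither `\s`
 nor `\p{Z}` is needed. The function name is hypothetical, the code point
 list is my reading of the Unicode Z categories (Zs, Zl, Zp) and should be
 double-checked, and the `u` modifier still assumes PCRE has UTF-8 support.

```php
<?php
// Hypothetical helper, not part of WordPress.
// Returns a character class listing white space explicitly:
// \t \n \f \r, plus the code points assumed to make up \p{Z}.
function sketch_space_class() {
    return '[\t\n\f\r \x{00A0}\x{1680}\x{2000}-\x{200A}'
        . '\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}]';
}

// The u modifier makes \x{...} refer to code points, not bytes.
$pattern = '/^' . sketch_space_class() . '$/u';

var_dump( preg_match( $pattern, "\xC2\xA0" ) ); // no-break space, UTF-8
var_dump( preg_match( $pattern, "\xC3\xA0" ) ); // "à", shares the byte A0
```

 Working in UTF-8 mode matters here: a byte-mode `\xA0` would also match
 the trailing byte of unrelated characters such as "à".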

 In addition:
 - In PHP 5.3.4+, when using preg_* functions in UTF mode and PCRE has been
 compiled with Unicode property support, the meaning of `\d`, `\D`, `\s`,
 `\S`, `\w`, and `\W` changes. They are matched by Unicode property, which
 covers many more characters and makes the regex noticeably slower.
 - Currently this cannot be relied on, as the lowest required PHP version is
 5.2.4 and some hosts don't have Unicode property support in PCRE.
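
 One way to cope with hosts that lack Unicode property support would be
 to probe for it at runtime and pick the pattern accordingly. The sketch
 below is an assumption about how that might look, not existing core code;
 the function name is hypothetical, and both patterns still assume that
 the `u` modifier itself works on the host.

```php
<?php
// Hypothetical helper, not a WordPress API.
// Returns true if this PCRE build understands \p{...} in UTF-8 mode.
function sketch_pcre_has_unicode_properties() {
    static $supported = null;
    if ( null === $supported ) {
        // @ hides the compile warning raised on builds without \p support.
        $supported = ( 1 === @preg_match( '/^\p{Zs}$/u', "\xC2\xA0" ) );
    }
    return $supported;
}

// Prefer \p{Z} when available, otherwise name U+00A0 explicitly.
$pattern = sketch_pcre_has_unicode_properties()
    ? '/[\p{Z}\t\n\f\r]+/u'
    : '/[\s\x{00A0}]+/u';

var_dump( preg_match( $pattern, "a\xC2\xA0b" ) );
```

 The result is cached in a static variable so the probe runs once per
 request.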

--
Ticket URL: <http://core.trac.wordpress.org/ticket/22692#comment:39>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

