[wp-trac] [WordPress Trac] #26842: Contenteditable, multiple spaces, &nbsp, and U+00A0

WordPress Trac noreply at wordpress.org
Wed Jan 15 17:03:32 UTC 2014


#26842: Contenteditable, multiple spaces, &nbsp, and U+00A0
-------------------------+-----------------
 Reporter:  azaozz       |      Owner:
     Type:  enhancement  |     Status:  new
 Priority:  normal       |  Milestone:  3.9
Component:  TinyMCE      |    Version:
 Severity:  normal       |   Keywords:
-------------------------+-----------------
 In contenteditable mode, when the user types multiple spaces (ASCII char
 32, U+0020), they are preserved. The browsers insert `&nbsp;` as every
 other character, so the string is `&nbsp; &nbsp; &nbsp; ` etc.

 In WordPress, TinyMCE is configured with:
 {{{
 'entities' => '38,amp,60,lt,62,gt',
 'entity_encoding' => 'raw',
 }}}

 Anything other than the three basic "htmlspecialchars" `&`, `<` and
 `>` is output as raw UTF-8 when serializing the DOM. This outputs the
 (multiple) `&nbsp;` as U+00A0, which in PHP shows as the two bytes `0xC2
 0xA0` ([http://en.wikipedia.org/wiki/Non-breaking_space reference]).
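
 The byte sequence mentioned above can be checked with a short sketch. It
 is written in Python rather than PHP purely for illustration; the helper
 name `utf8_hex` is an assumption made for this example and is not part of
 WordPress or TinyMCE.

```python
import html

NBSP = "\u00a0"  # NO-BREAK SPACE, the character behind &nbsp;


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"0x{b:02X}" for b in text.encode("utf-8"))


# The named entity decodes to the single code point U+00A0 ...
decoded = html.unescape("&nbsp;")

# ... which takes two bytes in UTF-8, while an ordinary space takes one.
nbsp_bytes = utf8_hex(decoded)
space_bytes = utf8_hex(" ")

print(nbsp_bytes)
print(space_bytes)
```

 With `entity_encoding` set to `raw`, it is this two-byte form, not the
 six-character entity, that ends up in the saved post content.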

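 A problem with `0xC2 0xA0` is that in PHP, when a regex runs byte by byte
 (without the `/u` modifier), `\s` matches the lone byte `0xA0` in certain
 cases instead of the whole two-byte "white space" character. That breaks
 the UTF-8 sequence and sometimes leaves an `Â` (the orphaned `0xC2`)
 behind. One example is wptexturize(), see #22692.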
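
 A minimal sketch of that failure mode, again in Python for illustration
 only. It does not call PHP's PCRE; it emulates a byte-oriented `\s` that
 also treats `0xA0` as white space, which is an assumption standing in for
 the locale-dependent PHP behavior. The function names are hypothetical.

```python
import re


def byte_level_replace(data: bytes) -> bytes:
    """Replace white-space bytes with a space, counting 0xA0 as white space.

    This emulates a byte-oriented regex whose \\s class includes 0xA0.
    """
    return re.sub(rb"[\s\xa0]", b" ", data)


def char_level_replace(text: str) -> str:
    """Replace white-space characters with a space, one code point at a time."""
    return re.sub(r"\s", " ", text)


original = "a\u00a0b"
raw = original.encode("utf-8")       # b"a\xc2\xa0b"

broken = byte_level_replace(raw)     # 0xA0 is replaced, 0xC2 is orphaned
shown = broken.decode("latin-1")     # how the orphaned byte tends to display

intact = char_level_replace(original)

print(broken)
print(shown)
print(intact)
```

 The byte-level version leaves `0xC2` stranded, so the result is no longer
 valid UTF-8 and the stray byte appears as `Â` when read as Latin-1. The
 character-level version replaces the whole U+00A0 and leaves nothing
 behind.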

 Another problem is that the user is not aware there are multiple `&nbsp;`
 when looking in the Text editor or the HTML source, as U+00A0 characters
 are "invisible".

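 To illustrate the visibility problem, the sketch below shows one way the
 hidden characters could be surfaced in a source view. It is a hypothetical
 helper written in Python for illustration, not a WordPress or TinyMCE
 function and not the fix proposed in this ticket.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def reveal_nbsp(text: str) -> str:
    """Return text with every U+00A0 spelled out as the entity &nbsp;."""
    return text.replace(NBSP, "&nbsp;")


def count_nbsp(text: str) -> int:
    """Count the U+00A0 characters hidden in text."""
    return text.count(NBSP)


# What a browser might store for "a", several typed spaces, then "b".
stored = "a" + NBSP + " " + NBSP + " b"

print(count_nbsp(stored))
print(reveal_nbsp(stored))
```

 In the raw string the two U+00A0 characters look the same as ordinary
 spaces; after the substitution they are visible as `&nbsp;`.
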
--
Ticket URL: <https://core.trac.wordpress.org/ticket/26842>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

