[wp-trac] [WordPress Trac] #30130: Normalize characters with combining marks to precomposed characters

WordPress Trac noreply at wordpress.org
Mon Aug 22 13:58:00 UTC 2016


#30130: Normalize characters with combining marks to precomposed characters
------------------------------------+-----------------------------
 Reporter:  zodiac1978              |       Owner:
     Type:  enhancement             |      Status:  new
 Priority:  normal                  |   Milestone:  Future Release
Component:  Formatting              |     Version:
 Severity:  normal                  |  Resolution:
 Keywords:  dev-feedback has-patch  |     Focuses:
------------------------------------+-----------------------------

Comment (by gitlost):

 The fork mentioned is now available from the WP repository as
 [https://wordpress.org/plugins/unfc-normalize/ UNFC Nörmalizer] (thanks
 anonymous plugin reviewer!).

 As to how or if to normalize input in core, I'm really not sure. Adding
 all those filters still doesn't seem right.

 Also, normalization may not always be desirable, eg in the case of CJK
 compatibility ideographs, which get mapped to unified ideographs under both
 NFC and NFD normalization. It's hard to get a definite read on this,
 though, as the Unicode Consortium seems to suggest it's not a problem - see
 eg [http://unicode.org/faq/han_cjk.html#8 Isn't it true that some Japanese
 can't write their own names in Unicode?]. I suppose it could be made
 locale-dependent if necessary.
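
 As a rough illustration of the compatibility ideograph point, the sketch
 below normalizes U+F900, whose canonical decomposition is the unified
 ideograph U+8C48. The variable and helper names are made up for this
 example; only `String.prototype.normalize()` is a real API.

```javascript
// U+F900 is a CJK compatibility ideograph. Its canonical decomposition is
// the unified ideograph U+8C48, so both NFC and NFD replace it.
const compatIdeograph = '\uF900';
const unifiedIdeograph = '\u8C48';

// Hypothetical helper: list the code points of a string as U+XXXX labels.
function toCodePointLabels(text) {
  return Array.from(text, function (ch) {
    return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  });
}

const compatAsNFC = compatIdeograph.normalize('NFC');
const compatAsNFD = compatIdeograph.normalize('NFD');

console.log(toCodePointLabels(compatIdeograph), toCodePointLabels(compatAsNFC));
console.log(compatAsNFC === unifiedIdeograph, compatAsNFD === unifiedIdeograph);
```

 Once normalized, the original compatibility code point cannot be recovered
 from the text, which is why blanket normalization may be unwelcome here.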

 Anyway, I'd lean towards a JavaScript-only fix, added only for pasting in
 Chrome and Firefox under Mac OS X (and iOS?). These browsers support the
 `String.prototype.normalize()` method, so no polyfill would be needed, and
 they (presumably) cover the vast bulk of use cases. I can extract and adapt
 the code used in the plugin as a patch if there's interest.
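
 A minimal sketch of what such a paste-time fix could look like follows. It
 is not the plugin's code: `toNFC` and `handlePaste` are hypothetical names,
 and it assumes the paste target is a text `<input>` or `<textarea>` that
 supports `setRangeText()`. The browser/OS check discussed above is left
 out.

```javascript
// Hypothetical helper: return the NFC form of a string, or the string
// unchanged when String.prototype.normalize is not available.
function toNFC(text) {
  return typeof text.normalize === 'function' ? text.normalize('NFC') : text;
}

// Hypothetical paste handler: only intervenes when the pasted text is not
// already in NFC, otherwise the browser's default paste is left alone.
function handlePaste(event) {
  const target = event.target;
  const clipboard = event.clipboardData;
  if (!clipboard || !target || typeof target.setRangeText !== 'function') {
    return; // not a text field we know how to handle
  }
  const pasted = clipboard.getData('text/plain');
  const normalized = toNFC(pasted);
  if (normalized === pasted) {
    return; // already NFC, nothing to do
  }
  event.preventDefault();
  // Replace the current selection with the normalized text and move the
  // caret to the end of the inserted text.
  target.setRangeText(normalized, target.selectionStart, target.selectionEnd, 'end');
}

// Only register the listener when running in a browser.
if (typeof document !== 'undefined') {
  document.addEventListener('paste', handlePaste);
}
```

 Note that `setRangeText()` does not fire an `input` event, so any code
 listening for changes would need to be notified separately, and rich-text
 (contenteditable) editors would need different handling.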

 Another option - use Safari!

 PS In case people have difficulty replicating this bug, note that the
 paste needs to be done in Chrome or Firefox - ironically Safari normalizes
 pastings (and upload filenames, but that's another story) to NFC, while
 the others just take what they're given. The copying from
 [https://core.trac.wordpress.org/attachment/ticket/30130/copy-paste-test.pdf copy-paste-test.pdf]
 can be done from Preview or from Adobe Reader. As noted above (at least
 on the versions that come with Mountain Lion 10.8), Preview (Version 6)
 decomposes all the umlauted characters, while Adobe Reader (Version
 11.0.13) decomposes only the u-umlaut. Think different.
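
 When trying to replicate this, a quick check along the following lines
 (eg in the browser console) can show whether a pasted string arrived
 decomposed. `isNFC` is a hypothetical helper name, and the two sample
 strings are just the precomposed and decomposed forms of u-umlaut.

```javascript
// Precomposed u-umlaut (one code point) versus "u" followed by
// U+0308 COMBINING DIAERESIS (two code points).
const precomposedUmlaut = '\u00FC';
const decomposedUmlaut = 'u\u0308';

// Hypothetical helper: true when the string is already in NFC.
function isNFC(text) {
  return text === text.normalize('NFC');
}

console.log(isNFC(precomposedUmlaut), precomposedUmlaut.length);
console.log(isNFC(decomposedUmlaut), decomposedUmlaut.length);

// The two forms render alike but are not equal until normalized.
console.log(precomposedUmlaut === decomposedUmlaut);
console.log(precomposedUmlaut === decomposedUmlaut.normalize('NFC'));
```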

 PPS I do think the ability to normalize (via the Symfony polyfill) would
 be a worthwhile addition to core (eg for `sanitize_file_name()`,
 `remove_accents()`) and will open a ticket suggesting it.

 PPPS Implementing the plugin threw up a number of issues, eg admin
 JavaScript in many cases does not check and refresh its data based on
 what comes back from the server, and meta keys aren't sanitized - I'll
 (hopefully) open tickets for each of these.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/30130#comment:34>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

