[wp-trac] [WordPress Trac] #30130: Normalize characters with combining marks to precomposed characters
WordPress Trac
noreply at wordpress.org
Mon Aug 22 13:58:00 UTC 2016
------------------------------------+-----------------------------
Reporter: zodiac1978 | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Future Release
Component: Formatting | Version:
Severity: normal | Resolution:
Keywords: dev-feedback has-patch | Focuses:
------------------------------------+-----------------------------
Comment (by gitlost):
The fork mentioned earlier is now available from the WordPress plugin
directory as [https://wordpress.org/plugins/unfc-normalize/ UNFC Nörmalizer]
(thanks, anonymous plugin reviewer!).
As to whether and how to normalize input in core, I'm really not sure.
Adding all those filters still doesn't seem right.
Also, normalization may not always be desirable, e.g. in the case of CJK
compatibility ideographs, which get mapped to unified ideographs under both
NFC and NFD normalization. It's hard to get a definite read on this,
though, as the Unicode Consortium seems to suggest it's not a problem (see
e.g. [http://unicode.org/faq/han_cjk.html#8 Isn't it true that some Japanese
can't write their own names in Unicode?]). I suppose normalization could be
made locale-dependent if necessary.
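To make the CJK point concrete, here is a minimal TypeScript sketch using only the standard `String.prototype.normalize()`. It shows U+F900 CJK COMPATIBILITY IDEOGRAPH-F900, which has a canonical (singleton) decomposition to U+8C48, being replaced under both NFC and NFD, so the original code point can't be recovered after normalization. The `codePoints` helper is hypothetical, just for display.

```typescript
// U+F900 CJK COMPATIBILITY IDEOGRAPH-F900 canonically decomposes to the
// unified ideograph U+8C48, and singletons are never recomposed, so both
// NFC and NFD replace it.
const compat: string = "\uF900";

// Hypothetical display helper: format each code point as U+XXXX.
function codePoints(s: string): string[] {
  return Array.from(s).map(
    (ch) => "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

const nfc = compat.normalize("NFC");
const nfd = compat.normalize("NFD");

console.log(codePoints(compat)); // [ 'U+F900' ]
console.log(codePoints(nfc)); // [ 'U+8C48' ]
console.log(codePoints(nfd)); // [ 'U+8C48' ]
// The glyphs look alike, but the compatibility code point is gone.
console.log(nfc === compat); // false
```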
Anyway, I'd lean towards a JavaScript-only fix, added only for pasting in
Chrome and Firefox under Mac OS X (and iOS?). These browsers support the
`normalize()` method, so no polyfill would be needed, and they (presumably)
account for the vast bulk of use cases. I can extract and adapt the code
used in the plugin as a patch if there's interest.
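A JavaScript-only fix along these lines might look roughly like the TypeScript sketch below. This is not the plugin's code: `normalizePastedText`, `handlePaste`, the `PasteLikeEvent` type and the `insert` callback are all hypothetical, it only looks at the `text/plain` clipboard flavour, and it leaves out the Chrome/Firefox-on-Mac detection discussed above. It steps in only when NFC actually changes the pasted text, otherwise letting the browser paste normally.

```typescript
// Hypothetical helper: return the NFC form of pasted text.
function normalizePastedText(text: string): string {
  // String.prototype.normalize() is ES2015; Chrome and Firefox ship it,
  // but guard anyway in case this runs somewhere older.
  if (typeof text.normalize !== "function") {
    return text;
  }
  return text.normalize("NFC");
}

// Minimal structural stand-in for a DOM ClipboardEvent, so the sketch
// compiles without the DOM type library.
interface PasteLikeEvent {
  clipboardData: { getData(format: string): string } | null;
  preventDefault(): void;
}

// Hypothetical paste handler. `insert` stands in for whatever the editor
// uses to insert text at the caret.
function handlePaste(event: PasteLikeEvent, insert: (text: string) => void): void {
  const data = event.clipboardData;
  if (data === null) {
    return;
  }
  const raw = data.getData("text/plain");
  const normalized = normalizePastedText(raw);
  if (normalized === raw) {
    // Already NFC: let the browser handle the paste as usual.
    return;
  }
  event.preventDefault();
  insert(normalized);
}
```

Only intercepting when the text changes keeps the common case (already-precomposed text) on the browser's native paste path.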
Another option - use Safari!
PS In case people have difficulty replicating this bug, note that the
paste needs to be done in Chrome or Firefox. Ironically, Safari normalizes
pasted text (and upload filenames, but that's another story) to NFC, while
the others just take what they're given. The copying from
[https://core.trac.wordpress.org/attachment/ticket/30130/copy-paste-test.pdf copy-paste-test.pdf]
can be done from Preview or from Adobe Reader. As noted above (at least on
the versions running on Mountain Lion 10.8), Preview (version 6) decomposes
all the umlauted characters, while Adobe Reader (version 11.0.13)
decomposes only the u-umlaut. Think different.
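When replicating, it can help to check whether a paste actually arrived decomposed. A minimal TypeScript sketch, assuming only `String.prototype.normalize()`: the hypothetical helper `isNFC` compares a string with its NFC form. The two sample strings stand in for a precomposed u-umlaut (U+00FC) and the decomposed form a PDF viewer like Preview may put on the clipboard (`u` followed by U+0308 COMBINING DIAERESIS).

```typescript
// Hypothetical helper: true if the string is already in NFC form.
function isNFC(s: string): boolean {
  return s === s.normalize("NFC");
}

// "ü" as a single precomposed code point (U+00FC) ...
const precomposed: string = "\u00FC";
// ... and as "u" plus U+0308 COMBINING DIAERESIS.
const decomposed: string = "u\u0308";

console.log(precomposed.length, decomposed.length); // 1 2
console.log(precomposed === decomposed); // false
console.log(isNFC(precomposed), isNFC(decomposed)); // true false
console.log(decomposed.normalize("NFC") === precomposed); // true
```

The two strings render identically, which is why the bug is easy to miss: only length and equality checks (or searches) give it away.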
PPS I do think the ability to normalize (via the Symfony polyfill, for PHP
installs lacking the intl extension's `Normalizer` class) would be a
worthwhile addition to core (e.g. for `sanitize_file_name()` and
`remove_accents()`), and I will open a ticket suggesting it.
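To show why normalization would matter for something like `remove_accents()`, which (as of this writing) works largely from a table of precomposed characters, here is a TypeScript analogue. It is not WordPress's PHP code: the three-entry `ACCENT_MAP` and the `stripAccents` function are hypothetical. Decomposed input slips past a precomposed table unless it's normalized to NFC first.

```typescript
// Hypothetical, tiny stand-in for a precomposed-character lookup table.
const ACCENT_MAP = new Map<string, string>([
  ["\u00E4", "a"], // ä
  ["\u00F6", "o"], // ö
  ["\u00FC", "u"], // ü
]);

// Replace each code point found in the table; leave everything else alone.
function stripAccents(s: string): string {
  return Array.from(s)
    .map((ch) => ACCENT_MAP.get(ch) ?? ch)
    .join("");
}

const pasted: string = "Mu\u0308ller"; // decomposed: u + COMBINING DIAERESIS

// Without normalization nothing matches, so the combining mark survives.
const raw = stripAccents(pasted);
// Normalizing to NFC first lets the table match the precomposed ü.
const fixed = stripAccents(pasted.normalize("NFC"));

console.log(raw === pasted); // true
console.log(fixed); // Muller
```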
PPPS Implementing the plugin threw up a number of issues, e.g. admin
JavaScript in many cases doesn't check and refresh its data based on what
comes back from the server, and meta keys aren't sanitized. I'll
(hopefully) open tickets for each of these.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/30130#comment:34>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform