[wp-trac] [WordPress Trac] #30130: Normalize characters with combining marks to precomposed characters
WordPress Trac
noreply at wordpress.org
Mon Jan 20 18:25:41 UTC 2020
#30130: Normalize characters with combining marks to precomposed characters
------------------------------------+-----------------------------
Reporter: zodiac1978 | Owner: SergeyBiryukov
Type: enhancement | Status: reviewing
Priority: normal | Milestone: 5.4
Component: Formatting | Version:
Severity: normal | Resolution:
Keywords: dev-feedback has-patch | Focuses:
------------------------------------+-----------------------------
Comment (by a8bit):
I just wanted to offer a contrary view on this ticket.
I just spent a day fighting with this problem in reverse. Renaming a file
to a string stored in a MySQL database that included a precomposed
character (U+0161) caused the OS (macOS) to convert that character to its
decomposed ("compound") form (U+0073 U+030C). WordPress then couldn't find
the file because file_exists() always returned false. I had to change the
string in the db to the compound form to get it to work.
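To illustrate the mismatch without touching a filesystem, here is a minimal Python sketch (not WordPress code). It uses the standard `unicodedata` module; the helper name `same_name` is hypothetical and only shows one way two names could be compared after normalizing both sides to the same form.

```python
import unicodedata

precomposed = "\u0161"   # "š" as a single code point (the form stored in the db)
decomposed = "s\u030c"   # "s" + COMBINING CARON (the form macOS produced)

def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The two strings render identically but are different code point sequences,
# so a plain comparison (or a byte-wise path lookup) treats them as different.
print(precomposed == decomposed)                                  # False
print(unicodedata.normalize("NFD", precomposed) == decomposed)    # True
print(same_name(precomposed, decomposed))                         # True
```

Normalizing both the stored string and the name reported by the OS to one form before comparing would avoid the failed lookup, whichever form is chosen.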
The Unicode Standard says that

  Many compatibility decomposable characters are included in the
  Unicode Standard solely to represent distinctions in other base standards.
  They support transmission and processing of legacy data. Their use is
  discouraged other than for legacy data or other special circumstances.

Apple now enforces that. I could find no way to use U+0161 in my file
name; it was forced to the compound form even if I entered the hex directly.
MSDN also recommends compound characters, saying that

  Pre-composed characters may also be decomposed. For example, an
  application importing a text file containing the pre-composed character
  "ü" may decompose that character into a "u" followed by the non-spacing
  character "¨". This allows easy alphabetical sorting for languages where
  character modifiers do not affect alphabetical order. The Unicode standard
  defines decomposition for all pre-composed characters.

I haven't checked whether Windows forces the decomposition, but Microsoft
clearly thinks you should decompose wherever possible.
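The sorting point in the MSDN quote can be sketched in Python as well. This is only an illustration under stated assumptions: the `sort_key` function and the word list are made up for the example, and dropping combining marks is a crude stand-in for real locale-aware collation.

```python
import unicodedata

def sort_key(word: str) -> str:
    """Hypothetical key: decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# "ü" decomposes into "u" followed by U+0308 COMBINING DIAERESIS.
print([hex(ord(ch)) for ch in unicodedata.normalize("NFD", "ü")])  # ['0x75', '0x308']

words = ["zebra", "über", "apfel", "uhr"]
print(sorted(words))                # ['apfel', 'uhr', 'zebra', 'über']
print(sorted(words, key=sort_key))  # ['apfel', 'über', 'uhr', 'zebra']
```

Without the key, the precomposed "ü" (U+00FC) sorts after "z" by code point; with the decomposed base letter it sorts alongside "u".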
I should also point out that the W3C document linked in the first post of
this ticket has been updated since 2014. The latest version recommends
NFC but admits it's not always appropriate or even available (see
https://www.w3.org/TR/charmod-norm/#normalizationChoice).
--
Ticket URL: <https://core.trac.wordpress.org/ticket/30130#comment:45>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform