[wp-trac] [WordPress Trac] #30130: Normalize characters with combining marks to precomposed characters
WordPress Trac
noreply at wordpress.org
Tue Jan 21 05:09:23 UTC 2020
#30130: Normalize characters with combining marks to precomposed characters
------------------------------------+-----------------------------
Reporter: zodiac1978 | Owner: SergeyBiryukov
Type: enhancement | Status: reviewing
Priority: normal | Milestone: 5.4
Component: Formatting | Version:
Severity: normal | Resolution:
Keywords: dev-feedback has-patch | Focuses:
------------------------------------+-----------------------------
Comment (by a8bit):
Replying to [comment:46 zodiac1978]:
> Replying to [comment:45 a8bit]:
>
> That shows IMHO exactly why everything **should be** normalized to NFC.
> Because then we have a common ground. macOS is using NFD (decomposed
> characters) internally and that's why Safari does normalize files on
> upload. But Chrome/Firefox are not doing this. We could wait for the
> browsers to fix it or we can fix it in WordPress.
>
IMO it shows that everything **should be** normalized, just not
necessarily to NFC. There is no way Apple is going to adopt NFC; NFC is
described by Unicode as being for legacy systems. The future appears to
be NFD.
>
> That's correct, because the filesystem itself (HFS+ and APFS for
> example) are using NFD and not NFC.
>
This means that if all text in WordPress is normalized to NFC, any
comparison with names of files on APFS that contain multi-byte characters
is going to fail.
I solved my problem today by writing a function that checks for the
existence of files using both forms, doubling the file I/O operations in
the process. Not exactly optimal.
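For illustration only, the kind of two-form existence check described above might look like the following Python sketch. It is not the function I actually wrote; the name `exists_in_any_form` and the injectable `exists` parameter are assumptions made for the example, and only the standard `unicodedata.normalize` and `os.path.exists` calls are used.

```python
import os
import unicodedata


def exists_in_any_form(path, exists=os.path.exists):
    """Return True if `path` exists as given, or in its NFC or NFD spelling.

    Hypothetical helper: `exists` is injectable so the lookup can be
    pointed at something other than the local filesystem.
    """
    candidates = []
    for form in (None, "NFC", "NFD"):
        candidate = path if form is None else unicodedata.normalize(form, path)
        # Skip spellings already tried, so pure-ASCII paths cost one lookup.
        if candidate not in candidates:
            candidates.append(candidate)
    return any(exists(candidate) for candidate in candidates)
```

Names that are identical in both forms need only one lookup; only names that actually differ between NFC and NFD pay for the extra checks.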
> Windows doesn't force decomposition and I don't think you should do this
> and I can't find your source on MSDN if I google this text. Can you please
> share the link, so that I can check the source myself?
It was quoted as a source on the Wikipedia page for precomposed characters:
http://msdn.microsoft.com/en-us/library/aa911606.aspx
> Agreed, but what would be the alternative? We could check and warn the
> user, as this is recommended by the document. But as the module with the
> needed function is optional that wouldn't be very reliable:
The alternative would be NFD.
> or we could normalize locale-specific, because the biggest problem seems
> to be that other languages may have a problem with normalization:
That would be great if no one ever read a website outside of their own
country.
> I think there are not many cases where you will really need NFD text.
> The advantages of a working search, working proofreading, etc. are
> outweighing any possible edge cases where the NFD text is needed.
They said that about 4-digit years ;)
I could mention that search and sort become more flexible with NFD,
because you can choose to do those things with or without the combining
marks. I don't see how proofreading is improved with NFC.
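As a rough sketch of the search point, assuming decomposition to NFD as the starting step, a lookup can either keep or drop the combining marks before comparing. The function names `fold` and `search` are made up for this example. Whole words are compared rather than substrings, because in NFD the plain letters of "cafe" are a prefix of the decomposed "café".

```python
import unicodedata


def fold(text, ignore_marks):
    """Decompose to NFD and optionally drop combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    if not ignore_marks:
        return decomposed
    # unicodedata.combining() is non-zero for combining marks such as U+0301.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(words, query, ignore_marks=False):
    """Return the words equal to `query` after folding both sides the same way."""
    key = fold(query, ignore_marks)
    return [word for word in words if fold(word, ignore_marks) == key]
```

With `ignore_marks=True` a query for "cafe" also finds "café" in either normalization form; with the default it only finds the unaccented spelling.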
> I am still recommending to get this patch in and then see what breaks
> (if something breaks).
I hope it all goes well. I don't have any skin in this game; I was merely
flagging one of the edge cases I actually hit today, in case no one had
thought of it. Apple not allowing NFC is going to cause issues for
international macOS users when comparing source and destination data. It
remains to be seen how big an issue that will be, but I accept it's
likely to be quite small.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/30130#comment:47>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform