[wp-trac] [WordPress Trac] #30130: Normalize characters with combining marks to precomposed characters
WordPress Trac
noreply at wordpress.org
Mon Oct 27 22:36:32 UTC 2014
#30130: Normalize characters with combining marks to precomposed characters
-------------------------+-----------------------------
Reporter: zodiac1978 | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version: trunk
Severity: normal | Keywords:
Focuses: |
-------------------------+-----------------------------
I ran into an odd problem that I wanted to solve:
I have a PDF file containing German umlauts (üöäÜÖÄ). When I copy and paste
them into WordPress, I get the base vowel (uoaUOA) followed by a combining
diaeresis (U+0308, http://www.fileformat.info/info/unicode/char/0308/index.htm)
instead of a single precomposed character.
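To make the difference concrete, here is a minimal sketch comparing the two byte sequences. The sample word and the variable names are made up for illustration; the byte values are the UTF-8 encodings of U+00FC and U+0308.

```php
<?php
// "ü" as one precomposed code point (U+00FC) versus
// "u" followed by a combining diaeresis (U+0308).
$precomposed = "\xC3\xBC";  // UTF-8 bytes of U+00FC
$decomposed  = "u\xCC\x88"; // "u" plus UTF-8 bytes of U+0308

// Both render as the same glyph, but they are different byte strings.
var_dump( $precomposed === $decomposed ); // bool(false)
var_dump( strlen( $precomposed ) );       // int(2)
var_dump( strlen( $decomposed ) );        // int(3)

// A byte-wise search for the precomposed spelling misses the pasted text.
$pasted = "Mu\xCC\x88nchen"; // illustrative sample, decomposed form
$needle = "M\xC3\xBCnchen";  // same word, precomposed form
var_dump( strpos( $pasted, $needle ) );   // bool(false)
```

This is why a plain string comparison or search treats the two spellings as different words.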
This results in some problems:
- Search for words with umlauts doesn't work
- Proofreading fails
- W3C validation fails with the warning "Text run is not in Unicode
Normalization Form C." because precomposed characters are preferred (see
http://www.w3.org/International/docs/charmod-norm/#choice-of-normalization-form)
Solution: I made a proof of concept using the "content_save_pre" filter, and
it works. It simply replaces each two-character sequence with the
corresponding precomposed character:
'''$content = str_replace( "a\xCC\x88", "ä", $content );
$content = str_replace( "o\xCC\x88", "ö", $content );
$content = str_replace( "u\xCC\x88", "ü", $content );
$content = str_replace( "A\xCC\x88", "Ä", $content );
$content = str_replace( "O\xCC\x88", "Ö", $content );
$content = str_replace( "U\xCC\x88", "Ü", $content );'''
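For illustration, the replacements above could be wrapped in a filter callback roughly like this. This is only a sketch: the function name nfc_umlauts_poc is hypothetical, the targets are written as escaped UTF-8 bytes so the file encoding doesn't matter, and the add_filter() call is guarded so the snippet also runs outside WordPress.

```php
<?php
/**
 * Replace "vowel + combining diaeresis (U+0308)" with the precomposed umlaut.
 * Hypothetical callback name, used here for illustration only.
 */
function nfc_umlauts_poc( $content ) {
	$map = array(
		"a\xCC\x88" => "\xC3\xA4", // ä (U+00E4)
		"o\xCC\x88" => "\xC3\xB6", // ö (U+00F6)
		"u\xCC\x88" => "\xC3\xBC", // ü (U+00FC)
		"A\xCC\x88" => "\xC3\x84", // Ä (U+00C4)
		"O\xCC\x88" => "\xC3\x96", // Ö (U+00D6)
		"U\xCC\x88" => "\xC3\x9C", // Ü (U+00DC)
	);

	return str_replace( array_keys( $map ), array_values( $map ), $content );
}

// Only register the filter when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'content_save_pre', 'nfc_umlauts_poc' );
}
```

This only covers the six German umlauts; any other base letter plus combining mark passes through unchanged.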
If we could rely on PHP 5.3+ (I know we can't, because WP still supports
PHP 5.2), we could use a built-in function for this:
http://php.net/manual/de/normalizer.normalize.php
The code above (also used in the upcoming patch) would then be a single,
much more general line:
'''$content = normalizer_normalize($content, Normalizer::FORM_C );'''
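Since normalizer_normalize() comes from PHP's intl extension, which may not be loaded on every host, a guarded variant might look like the sketch below. The wrapper name nfc_normalize_if_available is hypothetical; it returns the input untouched when the function is missing or normalization fails.

```php
<?php
/**
 * Normalize to NFC when the intl extension is available.
 * Hypothetical helper name, used here for illustration only.
 */
function nfc_normalize_if_available( $content ) {
	if ( ! function_exists( 'normalizer_normalize' ) ) {
		return $content; // intl not loaded: leave the content as it is
	}

	$normalized = normalizer_normalize( $content, Normalizer::FORM_C );

	// normalizer_normalize() returns false on failure, e.g. for invalid UTF-8.
	return ( false === $normalized ) ? $content : $normalized;
}
```

Unlike the six hard-coded replacements, this would also handle other sequences such as "e" followed by a combining acute accent (U+0301).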
Fun facts:
For me, the problem only occurs on Mac OS X (Lion, 10.7.5); I couldn't
reproduce it on Ubuntu 14.04 or Windows 7.
Maybe this is an edge case and/or plugin territory.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/30130>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform