[wp-trac] [WordPress Trac] #35951: remove_accents() doesn't escape Unicode NFD characters

WordPress Trac noreply at wordpress.org
Thu Feb 25 16:18:55 UTC 2016


#35951: remove_accents() doesn't escape Unicode NFD characters
---------------------------+-----------------------------
 Reporter:  onnimonni      |      Owner:
     Type:  defect (bug)   |     Status:  new
 Priority:  normal         |  Milestone:  Awaiting Review
Component:  Charset        |    Version:  4.4.2
 Severity:  normal         |   Keywords:
  Focuses:  accessibility  |
---------------------------+-----------------------------
 The OS X HFS+ filesystem stores file names in Unicode '''NFD''' (decomposed)
 form instead of '''NFC''' (precomposed). This causes all sorts of problems
 when uploaded files with accented names are moved between environments, or
 when a site is developed on an OS X machine and then pushed to production.
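 To make the NFC/NFD difference concrete, here is a small Python illustration
 (not WordPress code) using the standard `unicodedata` module; the helper name
 `code_points` is hypothetical. The same visible character "ö" is one code
 point in NFC but two in NFD, so the two strings do not compare equal and have
 different UTF-8 bytes.

```python
import unicodedata

def code_points(s: str) -> list[str]:
    """Hypothetical helper: return the code points of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]

nfc = unicodedata.normalize("NFC", "\u00f6")  # precomposed "ö"
nfd = unicodedata.normalize("NFD", "\u00f6")  # "o" + combining diaeresis

print(code_points(nfc))    # ['U+00F6']
print(code_points(nfd))    # ['U+006F', 'U+0308']
print(nfc == nfd)          # False
print(nfc.encode("utf-8")) # b'\xc3\xb6'
print(nfd.encode("utf-8")) # b'o\xcc\x88'
```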

 I'm trying to solve this problem by running the names of all uploaded files
 through the `remove_accents()` function. But on my test machine,
 `remove_accents()` does nothing to NFD characters.
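 A likely explanation, assuming `remove_accents()` works from a lookup table
 keyed on precomposed characters, can be sketched in Python. The table
 `ACCENT_MAP` and the function `table_remove_accents` are hypothetical
 stand-ins, not the WordPress implementation: NFD input contains only plain
 base letters and combining marks, none of which are table keys, so it passes
 through unchanged.

```python
import unicodedata

# Hypothetical, tiny stand-in for a lookup-table approach:
# only precomposed (NFC) characters are keys.
ACCENT_MAP = {"\u00f6": "o", "\u00e4": "a", "\u00e5": "a"}

def table_remove_accents(s: str) -> str:
    """Replace each character found in ACCENT_MAP; leave others untouched."""
    return "".join(ACCENT_MAP.get(c, c) for c in s)

nfc = unicodedata.normalize("NFC", "\u00f6\u00e4\u00e5")  # "öäå", 3 code points
nfd = unicodedata.normalize("NFD", nfc)                  # same text, 6 code points

print(table_remove_accents(nfc))         # oaa
print(table_remove_accents(nfd) == nfd)  # True: NFD input is not changed
```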

 It should use something like `Normalizer::normalize()` to convert the input
 to NFC first. But sadly the Normalizer class, which comes from PHP's intl
 extension, isn't available on all systems.
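 Where no normalizer is available, one possible fallback (an assumption, not
 something proposed in this ticket) is to strip combining diacritical marks
 directly, since NFD text stores most Latin accents as a base letter followed
 by a mark in the U+0300..U+036F block. A minimal Python sketch, with the
 hypothetical name `strip_combining_marks`, that uses only a regex character
 class and no normalization call:

```python
import re

# Combining Diacritical Marks block (U+0300..U+036F). Marks outside this
# block (e.g. U+1DC0..U+1DFF) are deliberately not handled in this sketch.
COMBINING_MARKS = re.compile("[\u0300-\u036f]")

def strip_combining_marks(s: str) -> str:
    """Drop combining diacritics; precomposed characters are left as-is."""
    return COMBINING_MARKS.sub("", s)

nfd = "o\u0308a\u0308a\u030a"  # "öäå" in decomposed (NFD) form
print(strip_combining_marks(nfd))  # oaa
```

 Because precomposed (NFC) characters pass through untouched, this would
 complement an existing precomposed lookup table rather than replace it.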

 I've attached a zip file containing NFD characters. If you open it on a
 Linux machine, you can see a small difference between these characters and
 "normal" (NFC) UTF-8 accented characters such as '''öäå'''.

 Copy the contents, run them through `remove_accents('content')`, and you
 will see that nothing changes.

 If you have Normalizer available, you can verify that `remove_accents()`
 works once the characters have first been normalized, for example:
 `remove_accents(Normalizer::normalize('content'))`
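 The normalize-first approach can be sketched in Python as an analogue of
 `remove_accents(Normalizer::normalize('content'))`. `ACCENT_MAP`,
 `table_remove_accents` and `normalize_then_remove_accents` are hypothetical
 names, and `unicodedata.normalize("NFC", ...)` stands in for PHP's
 Normalizer. Composing to NFC first turns "o" + U+0308 back into U+00F6, which
 the precomposed table can then match, so NFC and NFD input give the same
 result.

```python
import unicodedata

# Hypothetical lookup table keyed on precomposed (NFC) characters only.
ACCENT_MAP = {"\u00f6": "o", "\u00e4": "a", "\u00e5": "a"}

def table_remove_accents(s: str) -> str:
    """Replace each character found in ACCENT_MAP; leave others untouched."""
    return "".join(ACCENT_MAP.get(c, c) for c in s)

def normalize_then_remove_accents(s: str) -> str:
    """Compose to NFC first so decomposed input matches the table's keys."""
    return table_remove_accents(unicodedata.normalize("NFC", s))

nfd = "o\u0308a\u0308a\u030a"  # "öäå" in NFD form
nfc = "\u00f6\u00e4\u00e5"     # "öäå" in NFC form
print(normalize_then_remove_accents(nfd))  # oaa
print(normalize_then_remove_accents(nfc))  # oaa
```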

 I realize this doesn't concern native English-speaking countries, but it's a
 really big annoyance for the rest of us.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/35951>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

