[wp-trac] [WordPress Trac] #44793: remove_accents() doesnt escape all versions of "i"

WordPress Trac noreply at wordpress.org
Thu Mar 21 21:28:01 UTC 2019


#44793: remove_accents() doesnt escape all versions of "i"
-------------------------------------------------+-------------------------
 Reporter:  bagosm                               |       Owner:
                                                 |  SergeyBiryukov
     Type:  defect (bug)                         |      Status:  reviewing
 Priority:  normal                               |   Milestone:  5.3
Component:  Formatting                           |     Version:
 Severity:  normal                               |  Resolution:
 Keywords:  has-patch dev-feedback needs-        |     Focuses:
  testing                                        |
-------------------------------------------------+-------------------------

Comment (by xkon):

 Hey there!

 I just noticed this ticket when it was moved to Future Release (a good
 call imho, as it might need a lot more discussion). I can handle the
 Greek letters, but I need some clarification first because I'm not sure
 where exactly `remove_accents()` is needed and what it is used for. I saw
 that in core it's called within `sanitize_title()` and `sanitize_user()`,
 but it might be used in various other places or in broader scopes that I
 don't know about at the moment.

 Please bear with me while I explain my thinking and the "issues" I see
 before creating a patch. Once it's clear what's actually needed, I'll be
 more than happy to provide a patch for the Greek letters.

 The function is called `remove_accents()`, which literally means removing
 accents. The description in our Handbook, though, says `Converts all
 accent characters to ASCII characters.`, and that is something totally
 different: in many languages, removing an accent and converting to ASCII
 (Latin) are two different operations with very different results.

 For example:
 **Removing accent ( literally so both are Greek letters ):** ί = ι
 **Converting to ASCII ( so altering the locale ):** ί = i
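
 The difference can be sketched outside of WordPress. The snippet below is
 a Python illustration, not core's PHP implementation, and the helper name
 `strip_accents` is hypothetical. It assumes "removing an accent" means
 dropping Unicode combining marks after canonical decomposition:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks but keep each letter in its own script."""
    # NFD splits "ί" (U+03AF) into "ι" (U+03B9) plus a combining acute (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero canonical combining class.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


accented = "\u03af"  # ί, GREEK SMALL LETTER IOTA WITH TONOS
plain = strip_accents(accented)

print(plain == "\u03b9")  # True: the result is Greek iota, ι
print(plain.isascii())    # False: the accent is gone, but this is not ASCII
```

 So stripping the accent alone never yields the ASCII `i`; getting there
 needs a separate, explicit Greek -> Latin mapping.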

 Questions:

 **Where would `remove_accents()` actually be used, and for what purposes?**
 For example, if it's used to create "slugs" or titles and we only add
 the accented characters, people will end up with a mixed slug or title
 containing both Greek and Latin letters. So in this case a "full"
 Greek -> Latin conversion would most likely be needed.


 **What is actually needed here?**
 If it's simply changing an accented Greek letter to its unaccented
 form, that would be OK (I believe). If accents are removed for another
 reason, and the function is used to try to convert everything into Latin,
 maybe we need to split things up into `remove_accents()` and
 `convert_to_latin()`, for example, by introducing something new (unless
 something equivalent already exists that I'm not aware of).
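
 A rough sketch of what such a split could look like, again in Python
 rather than core's PHP. Everything here is an assumption for
 illustration: the two function names only mirror the proposal above, the
 mapping tables are deliberately tiny and hypothetical, and
 `accented_only_to_ascii` models the "only add the accented characters"
 case that produces a mixed slug:

```python
import unicodedata

# Hypothetical, deliberately incomplete tables; a real patch needs full coverage.
ACCENTED_TO_ASCII = {"\u03ac": "a", "\u03ad": "e", "\u03af": "i", "\u03cc": "o"}
GREEK_TO_LATIN = {"α": "a", "ε": "e", "ι": "i", "κ": "k",
                  "ο": "o", "σ": "s", "ς": "s", "φ": "f"}


def remove_accents(text: str) -> str:
    """Strip combining marks only; letters stay in their original script."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def convert_to_latin(text: str) -> str:
    """Strip accents, then map every known Greek letter to a Latin one."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in remove_accents(text))


def accented_only_to_ascii(text: str) -> str:
    """Map only the accented letters to ASCII, leaving the rest untouched."""
    text = unicodedata.normalize("NFC", text)
    return "".join(ACCENTED_TO_ASCII.get(ch, ch) for ch in text)


word = "\u03ba\u03b1\u03c6\u03ad\u03c2"  # καφές

print(remove_accents(word))          # καφες  (all Greek, accent removed)
print(convert_to_latin(word))        # kafes  (all Latin)
print(accented_only_to_ascii(word))  # καφeς  (mixed: Latin "e" among Greek letters)
```

 Keeping the two operations separate lets a caller that builds slugs ask
 for the full conversion, while a caller that only wants unaccented text
 keeps the original script.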

 To me this isn't as simple as it looks, because in various languages
 removing accents (if used to alter actual readable text) ends up
 changing the meaning of the word as well.

 Thanks!

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/44793#comment:19>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform