[wp-trac] [WordPress Trac] #23907: Scandinavian ligatures transcribed wrong in remove_accents()

WordPress Trac noreply at wordpress.org
Wed Oct 9 23:25:57 UTC 2013


#23907: Scandinavian ligatures transcribed wrong in remove_accents()
---------------------------------+-----------------------------
 Reporter:  dnusim               |       Owner:
     Type:  defect (bug)         |      Status:  new
 Priority:  normal               |   Milestone:  Future Release
Component:  I18N                 |     Version:  3.6
 Severity:  minor                |  Resolution:
 Keywords:  has-patch 3.7-early  |
---------------------------------+-----------------------------

Comment (by knutsp):

 Due to communication problems at WCE, we didn't manage to discuss this
 question. :-(

 Currently, in WordPress, `ä` becomes `a`, `æ` becomes `ae`, `ø` and `ö`
 become `o` and `å` becomes `a`.
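
 As a rough illustration only (Python rather than WordPress's PHP, and a
 hypothetical five-entry table rather than the real `remove_accents()`
 map), the behaviour described above could be sketched as:

```python
# Hypothetical, minimal table covering only the lowercase letters discussed
# in this comment; the real remove_accents() map is far larger.
CURRENT_MAP = {
    "ä": "a",
    "æ": "ae",
    "ø": "o",
    "ö": "o",
    "å": "a",
}


def transliterate(text, table=CURRENT_MAP):
    """Replace each mapped character; leave everything else untouched."""
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate("blåbærsyltetøy"))  # blabaersyltetoy
print(transliterate("smörgåsbord"))     # smorgasbord
```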

 `ä` is the Swedish equivalent of Danish/Norwegian `æ`, and Swedish `ö` is
 the equivalent of Danish/Norwegian `ø`. `Ø` is an accented/modified `O`,
 where the accent/modifier is "/"; `å` is an accented `a`, where the accent
 is a small ring above; and `æ` may be seen as having a ligature of `ae` as
 its origin, even though it is an alphabetic character in its own right in
 Danish/Norwegian. Likewise, `å`, `ä` and `ö` are part of the Swedish
 alphabet.

 Changing how `ö` transliterates will affect not only Swedish, but other
 languages, too. If we don't want that changed, then `ø` should not be
 transliterated to anything else either.

 WordPress tradition here is to just remove all accents, as the name of
 the function in question, `remove_accents()`, indicates. This is the
 simplest way to make a normalized `post_name` and URL. Hence, the truly
 accented Swedish characters should (still) be treated that way. The
 corresponding Norwegian/Danish characters should transliterate to the
 same base as their Swedish counterparts, even if "/" and "small ring
 above" are not always regarded as true accents, or modifiers.

 I have tried to find documentation showing that the official standard
 transliterations (ae, oe, aa) have an advantage for SEO, but I haven't
 found anything that indicates this. Correct me if I'm wrong, and please
 point to documents, research or views, if there are any. I know other
 CMSes just replace all non-ASCII characters with either nothing, an
 underscore or a hyphen, and that is bad for both SEO and readability.

 Transliterating with semi-unique character pairs (oe, aa) may be useful
 when a ''reverse'' transliteration has to be made, as is the case when
 this standard is used for things like passenger names on airline tickets,
 or more generally when forcing Scandinavian names and words into (old)
 computer systems/software that only support ASCII. For URLs a correct
 reverse is not that important, I think. And I think readability doesn't
 suffer either.
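
 To make the reversibility point concrete, here is a small sketch in
 Python (not WordPress code). The two tables and both helper functions
 are hypothetical and cover only `æ`, `ø` and `å`; the example words are
 ordinary Norwegian words picked for illustration.

```python
STANDARD = {"æ": "ae", "ø": "oe", "å": "aa"}  # the pair scheme (ae, oe, aa)
STRIPPED = {"æ": "ae", "ø": "o", "å": "a"}    # plain base letters


def forward(text, table):
    """Transliterate text using the given per-character table."""
    return "".join(table.get(ch, ch) for ch in text)


def naive_reverse(text):
    """Undo STANDARD by blind substring replacement (no dictionary lookup)."""
    for letter, pair in STANDARD.items():
        text = text.replace(pair, letter)
    return text


# Distinct words stay distinct under the pair scheme...
print(forward("før", STANDARD), forward("for", STANDARD))  # foer for
# ...but collide once the modifier is simply dropped.
print(forward("før", STRIPPED), forward("for", STRIPPED))  # for for

print(naive_reverse(forward("håndkle", STANDARD)))  # håndkle
# "oe" also occurs naturally, so the reverse is only semi-reliable:
print(naive_reverse("poesi"))  # pøsi
```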

 So my suggestion for WordPress, and accented Latin characters in general,
 is: Just remove any accent or modifier and transliterate to the basic
 character (or characters, in case of a ligature origin).
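
 A minimal sketch of that rule, using Python's standard `unicodedata`
 module rather than anything in WordPress: Unicode canonical
 decomposition separates the ring and the diaeresis from their base
 letters, but `ø` and `æ` have no such decomposition, so they need an
 explicit table. The `EXTRA` table and both function names are
 assumptions made for this example.

```python
import unicodedata

# Letters with no canonical decomposition; this table is an assumption
# for the sketch, not a copy of any WordPress data.
EXTRA = {"ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE"}


def strip_marks(text):
    """Decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_base(text):
    """Strip combining marks, then map the remaining special letters."""
    return "".join(EXTRA.get(ch, ch) for ch in strip_marks(text))


print(strip_marks("åäöøæ"))          # aaoøæ
print(to_base("rødgrød med fløde"))  # rodgrod med flode
print(to_base("Malmö"))              # Malmo
```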

 This means I quite strongly suggest a '''wontfix''' for this ticket,
 letting `æ` still be transliterated to `ae`, and the other ones to their
 base, as they are now, plain and simple.

--
Ticket URL: <http://core.trac.wordpress.org/ticket/23907#comment:11>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

