[wp-trac] [WordPress Trac] #23907: Scandinavian ligatures transcribed wrong in remove_accents()
WordPress Trac
noreply at wordpress.org
Wed Oct 9 23:25:57 UTC 2013
#23907: Scandinavian ligatures transcribed wrong in remove_accents()
---------------------------------+-----------------------------
Reporter: dnusim | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: I18N | Version: 3.6
Severity: minor | Resolution:
Keywords: has-patch 3.7-early |
---------------------------------+-----------------------------
Comment (by knutsp):
Due to communication problems at WCE, we didn't manage to discuss this
question. :-(
Currently, in WordPress, `ä` becomes `a`, `æ` becomes `ae`, `ø` and `ö`
become `o` and `å` becomes `a`.
`ä` is the Swedish equivalent of Danish/Norwegian `æ`, and Swedish `ö` is
the equivalent of Danish/Norwegian `ø`. `Ø` is an accented/modified `O`,
where the accent/modifier is "/", `å` is an accented `a` where the special
accent is a small ring above and `æ` may be seen as having a ligature of
`ae` as its origin, even if it's an alphabetic character in its own right
in Danish/Norwegian. Likewise, `å`, `ä` and `ö` are part of the Swedish
alphabet.
Changing how `ö` transliterates will affect not only Swedish, but other
languages, too. If we don't want that changed, then `ø` should not be
transliterated to anything else either.
WordPress tradition here is to just remove all accents, as the name of the
function in question, `remove_accents()`, indicates. This is the simplest
way to make a normalized `post_name` and URL. Hence, the Swedish truly
accented characters should (still) be treated like that. The corresponding
Norwegian/Danish characters should transliterate to the same base as the
corresponding Swedish ones, even if "/" and "small ring above" are not
always regarded as true accents, or modifiers.
I have tried to find documentation showing that the official standard
transliterations (ae, oe, aa) have an advantage for SEO, but I haven't
found anything that indicates this. Correct me if I'm wrong and please
point to some documents, research or views, if there are any. I know other
CMSes just replace all non-ASCII characters with either nothing, an
underscore or a hyphen, and that is bad, both for SEO and readability.
Transliterating with semi-unique character couples (oe, aa) may be useful
when a ''reverse'' transliteration has to be made, and that is the case
when this standard is used for things like passenger names on airline
tickets, or more generally when forcing Scandinavian names and words into
(old) computer systems/software only supporting ASCII. For URLs a correct
reverse is not that important, I think. And I think readability doesn't
suffer either.
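To illustrate the reversibility point, here is a minimal Python sketch. It
is not WordPress code: the two tables and the helper `transliterate()` are
hypothetical, and they only cover the letters and mappings mentioned in
this comment.

```python
# Hypothetical tables, limited to the letters discussed in this comment.
# BASE mirrors the current behaviour described above; DIGRAPH uses the
# (ae, oe, aa) couples for the Danish/Norwegian letters.
BASE = {"æ": "ae", "ø": "o", "å": "a", "ä": "a", "ö": "o"}
DIGRAPH = {"æ": "ae", "ø": "oe", "å": "aa"}


def transliterate(text, table):
    """Replace each character found in `table`; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


# Two distinct Norwegian words that differ only in the ring above.
print(transliterate("får", BASE), transliterate("far", BASE))        # far far
print(transliterate("får", DIGRAPH), transliterate("far", DIGRAPH))  # faar far
```

With the base-letter table both words end up as `far`, so the original
cannot be recovered; with the couples they stay apart. That matters for a
name on a ticket, much less for a `post_name`.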
So my suggestion for WordPress, and accented Latin characters in general,
is: Just remove any accent or modifier and transliterate to the basic
character (or characters, in case of a ligature origin).
This means I quite strongly suggest a '''wontfix''' for this ticket,
letting `æ` still be transliterated to `ae`, and the other ones to their
base, as they are now, plain and simple.
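The rule suggested above (drop the accent or modifier, keep the base
character or characters) can be sketched in Python as below. This is only
an illustration under stated assumptions, not how `remove_accents()` is
implemented: `strip_accents()` and the table `NO_DECOMPOSITION` are
hypothetical names. The table is needed because `ø` and `æ` have no
canonical Unicode decomposition, unlike `å`, `ä` and `ö`.

```python
import unicodedata

# Letters without a canonical Unicode decomposition need explicit entries.
# Hypothetical table, limited to the letters discussed in this ticket.
NO_DECOMPOSITION = {"æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O"}


def strip_accents(text):
    """Reduce accented Latin letters to their base letter(s)."""
    replaced = "".join(NO_DECOMPOSITION.get(ch, ch) for ch in text)
    # NFD splits e.g. "å" into "a" plus a combining ring above.
    decomposed = unicodedata.normalize("NFD", replaced)
    # Drop the combining marks (category Mn), keep everything else.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("blåbærsyltetøy"))  # blabaersyltetoy
print(strip_accents("Malmö"))           # Malmo
```

The result matches the current behaviour listed at the top of this
comment: `æ` gives `ae`, and the other letters give their base.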
--
Ticket URL: <http://core.trac.wordpress.org/ticket/23907#comment:11>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software