[wp-trac] [WordPress Trac] #37086: Remove Middle Dot (U+00B7) from URL (for Catalan only?)
WordPress Trac
noreply at wordpress.org
Mon Jun 13 11:08:14 UTC 2016
#37086: Remove Middle Dot (U+00B7) from URL (for Catalan only?)
-------------------------------------------------+-------------------------
Reporter: xavivars | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: Formatting | Version:
Severity: normal | Resolution:
Keywords: needs-refresh has-patch needs-unit-tests | Focuses:
-------------------------------------------------+-------------------------
Changes (by ocean90):
* keywords: has-patch => needs-refresh has-patch needs-unit-tests
* milestone: Awaiting Review => Future Release
Old description:
> Currently, [remove_accents
> https://core.trac.wordpress.org/browser/tags/4.5.2/src/wp-includes/formatting.php#L1132]
> converts accented characters to an ASCII equivalent so that the result
> looks "nice" in a URL, with no need to escape characters (which would
> show % as part of the links).
>
> However, the middle dot (U+00B7) is not removed. The middle dot is used
> in Catalan between two Ls (as in l·l).
>
> Quoting from Wikipedia:
> > The flown dot (Catalan: punt volat) is used in Catalan between two Ls
> in cases where each belongs to a separate syllable, for example cel·la,
> "cell". This distinguishes such "geminate Ls" (ela geminada), which are
> pronounced [ɫː], from "double L" (doble ela), which are written without
> the flown dot and are pronounced [ʎ].
>
> Besides being inconsistent (all other Catalan diacritics are removed),
> leaving this character in place has side effects, because some URL
> libraries do not handle it (such as the one Twitter uses; see
> https://twitter.com/VilaWeb/status/738348674137399296).
>
> My proposal is to remove that character when it appears between two Ls.
New description:
Currently, [https://core.trac.wordpress.org/browser/tags/4.5.2/src/wp-includes/formatting.php#L1132 remove_accents()]
converts accented characters to an ASCII equivalent so that the result
looks "nice" in a URL, with no need to escape characters (which would show
% as part of the links).
However, the middle dot (U+00B7) is not removed. The middle dot is used in
Catalan between two Ls (as in l·l).
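To illustrate the escaping issue, here is a minimal Python sketch (not WordPress code) of what happens when a middle dot is left in a slug. The sample word `cel·la` is the Catalan example cited in this ticket; the variable names are chosen only for this illustration.

```python
from urllib.parse import quote

# U+00B7 (middle dot) is not ASCII, so it is percent-encoded as its
# two UTF-8 bytes, 0xC2 0xB7, when the slug is placed in a URL path.
slug = "cel\u00b7la"
encoded = quote(slug)
print(encoded)  # cel%C2%B7la
```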
Quoting from Wikipedia:
> The flown dot (Catalan: punt volat) is used in Catalan between two Ls in
cases where each belongs to a separate syllable, for example cel·la,
"cell". This distinguishes such "geminate Ls" (ela geminada), which are
pronounced [ɫː], from "double L" (doble ela), which are written without
the flown dot and are pronounced [ʎ].
Besides being inconsistent (all other Catalan diacritics are removed),
leaving this character in place has side effects, because some URL
libraries do not handle it (such as the one Twitter uses; see
https://twitter.com/VilaWeb/status/738348674137399296).
My proposal is to remove that character when it appears between two Ls.
--
Comment:
@xavivars Thanks for your patches. The replacement should only be done for
Catalan. Removing the dots could perhaps be handled by
`sanitize_title_with_dashes()`.
Can you make sure that the patches are relative to the root directory? And
there should be a unit test for this change in
`/tests/phpunit/tests/formatting/RemoveAccents.php`.
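A minimal sketch of the behaviour discussed in this ticket, written in Python for illustration rather than as the actual PHP patch: the middle dot is dropped only when it sits between two Ls, and only when the locale is Catalan. The function name `remove_catalan_middle_dot`, its signature, and the locale code `ca` are assumptions made for this example, not WordPress APIs.

```python
import re

# Matches a middle dot (U+00B7) that has an L directly before and after it.
# Lookarounds leave the surrounding letters, and their case, untouched.
_GEMINATE_L = re.compile(r"(?<=[lL])\u00b7(?=[lL])")

def remove_catalan_middle_dot(text: str, locale: str) -> str:
    """Drop the middle dot between two Ls, for the Catalan locale only.

    Hypothetical helper: the name, signature and the "ca" locale code
    are assumptions for illustration.
    """
    if locale != "ca":
        return text
    return _GEMINATE_L.sub("", text)

# The kind of cases a unit test for this change might cover:
assert remove_catalan_middle_dot("cel\u00b7la", "ca") == "cella"
assert remove_catalan_middle_dot("COL\u00b7LEGI", "ca") == "COLLEGI"
# Other locales are left untouched.
assert remove_catalan_middle_dot("cel\u00b7la", "en_US") == "cel\u00b7la"
# A middle dot that is not between two Ls is kept.
assert remove_catalan_middle_dot("a\u00b7b", "ca") == "a\u00b7b"
```

Using lookarounds instead of capturing groups means the replacement is simply the empty string, so upper-case and lower-case Ls both pass through unchanged.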
--
Ticket URL: <https://core.trac.wordpress.org/ticket/37086#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform