[wp-trac] [WordPress Trac] #37086: Remove Middle Dot (U+00B7) from URL (for Catalan only?)

WordPress Trac noreply at wordpress.org
Mon Jun 13 11:08:14 UTC 2016


#37086: Remove Middle Dot (U+00B7) from URL (for Catalan only?)
-------------------------------------------------+-------------------------
 Reporter:  xavivars                             |       Owner:
     Type:  defect (bug)                         |      Status:  new
 Priority:  normal                               |   Milestone:  Future Release
Component:  Formatting                           |     Version:
 Severity:  normal                               |  Resolution:
 Keywords:  needs-refresh has-patch              |     Focuses:
            needs-unit-tests                     |
-------------------------------------------------+-------------------------
Changes (by ocean90):

 * keywords:  has-patch => needs-refresh has-patch needs-unit-tests
 * milestone:  Awaiting Review => Future Release


Old description:

> Currently, [remove_accents
> https://core.trac.wordpress.org/browser/tags/4.5.2/src/wp-
> includes/formatting.php#L1132] converts all characters to an ASCII
> equivalent so it looks "nice" as a URL without the need for escaping
> characters (and, thus, showing % as part of the links).
>
> However, the middle dot (U+00B7) is not removed. The middle dot is used in
> Catalan between two Ls (like this: l·l).
>
> Quoting from Wikipedia:
> > The flown dot (Catalan: punt volat) is used in Catalan between two Ls
> in cases where each belongs to a separate syllable, for example cel·la,
> "cell". This distinguishes such "geminate Ls" (ela geminada), which are
> pronounced [ɫː], from "double L" (doble ela), which are written without
> the flown dot and are pronounced [ʎ].
>
> On top of not being consistent (all other Catalan diacritics are
> removed), not removing this character has some side-effects, because
> there are some URL libraries that don't take it into account (like the
> one Twitter uses: see
> https://twitter.com/VilaWeb/status/738348674137399296).
>
> My proposal is to remove that character when it appears between two Ls.

New description:

 Currently, [https://core.trac.wordpress.org/browser/tags/4.5.2/src/wp-
 includes/formatting.php#L1132 remove_accents()] converts all characters to
 an ASCII equivalent so it looks "nice" as a URL without the need for
 escaping characters (and, thus, showing % as part of the links).

 However, the middle dot (U+00B7) is not removed. The middle dot is used in
 Catalan between two Ls (like this: l·l).

 Quoting from Wikipedia:
 > The flown dot (Catalan: punt volat) is used in Catalan between two Ls in
 cases where each belongs to a separate syllable, for example cel·la,
 "cell". This distinguishes such "geminate Ls" (ela geminada), which are
 pronounced [ɫː], from "double L" (doble ela), which are written without
 the flown dot and are pronounced [ʎ].

 On top of not being consistent (all other Catalan diacritics are removed),
 not removing this character has some side-effects, because there are some
 URL libraries that don't take it into account (like the one Twitter uses:
 see https://twitter.com/VilaWeb/status/738348674137399296).

 My proposal is to remove that character when it appears between two Ls.
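
The proposed rule can be illustrated with a minimal sketch. It is written in Python purely for illustration (WordPress itself is PHP), and the name `strip_geminate_dot` is hypothetical, not part of any WordPress API. It assumes the dot should be dropped only when an "l" or "L" stands on both sides of it:

```python
import re

# U+00B7 MIDDLE DOT, written as an escape so the source stays ASCII-only.
MIDDLE_DOT = "\u00b7"

# Match the dot only when an "l" or "L" stands directly before and after it.
_GEMINATE_L = re.compile(r"(?<=[lL])" + MIDDLE_DOT + r"(?=[lL])")


def strip_geminate_dot(text: str) -> str:
    """Remove U+00B7 where it sits between two Ls; leave other dots alone.

    Hypothetical helper for illustration only.
    """
    return _GEMINATE_L.sub("", text)
```

With this sketch, "cel·la" becomes "cella", while a middle dot that is not flanked by Ls (as in "a·b") is left untouched.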

--

Comment:

 @xavivars Thanks for your patches. The replacement should only be done for
 Catalan. Removing the dots could perhaps be handled by
 `sanitize_title_with_dashes()`.

 Can you make sure that the patches are relative to the root directory? And
 there should be a unit test for this change in
 `/tests/phpunit/tests/formatting/RemoveAccents.php`.
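
A rough sketch of what a Catalan-only replacement and its test cases might look like follows. It is in Python for illustration only; the real change and its unit test would be PHP. The function name `remove_catalan_middle_dot`, the way the locale is passed in, and the case table are all assumptions, and "ca" is assumed to be the locale code for Catalan:

```python
CATALAN_LOCALE = "ca"  # assumed locale code for Catalan


def remove_catalan_middle_dot(text: str, locale: str) -> str:
    """Replace l·l with ll, but only when the locale is Catalan.

    Hypothetical helper for illustration; not a WordPress function.
    """
    if locale != CATALAN_LOCALE:
        return text
    # Handle the lowercase and uppercase pairs separately.
    return text.replace("l\u00b7l", "ll").replace("L\u00b7L", "LL")


# Cases a unit test could cover: (input, locale, expected output).
CASES = [
    ("cel\u00b7la", "ca", "cella"),           # Catalan: dot removed
    ("COL\u00b7LEGI", "ca", "COLLEGI"),       # Catalan, uppercase
    ("cel\u00b7la", "en_US", "cel\u00b7la"),  # other locale: unchanged
    ("a\u00b7b", "ca", "a\u00b7b"),           # dot not between Ls: unchanged
]

for text, locale, expected in CASES:
    assert remove_catalan_middle_dot(text, locale) == expected, (text, locale)
```

The case table separates the two conditions in the comment above: the replacement applies only for Catalan, and only to a dot between two Ls.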

--
Ticket URL: <https://core.trac.wordpress.org/ticket/37086#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

