[wp-trac] [WordPress Trac] #56656: Move accent removal from `sanitize_title_with_dashes()` to `remove_accents()`

WordPress Trac noreply at wordpress.org
Mon Sep 26 12:58:17 UTC 2022


#56656: Move accent removal from `sanitize_title_with_dashes()` to
`remove_accents()`
--------------------------+-----------------------------
 Reporter:  anrghg        |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Formatting    |    Version:
 Severity:  major         |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 When `sanitize_title()` attempts to remove accents, it fails, because
 neither of its two mechanisms works as expected:

 1. It calls `remove_accents()`, but this only converts a handful of symbols
 and a set of precomposed Latin letters to Latin base letters.
 2. One of the filters it applies (registered in `default-filters.php`) calls
 `sanitize_title_with_dashes()`, but this only removes five combining
 accents (plus two spacing acutes). Full list of removed combining
 diacritics: U+0301, U+0341, U+0300, U+0304, U+030C.

 So, when a title contains a combining tilde (U+0303), the tilde ends up in
 the slug. Example:

 Title: Eñe [U+0045 U+006E U+0303 U+0065]
 Slug (displayed): eñe/
 Slug (encoded): en%cc%83e/
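
 The encoded slug above can be reproduced with a minimal sketch. This is not
 the core implementation: `demo_slug_with_listed_marks_removed()` is a
 hypothetical helper that percent-encodes the title and then removes only the
 five combining diacritics listed above, on the assumption that they are
 matched in their lowercase percent-encoded UTF-8 form.
 {{{#!php
 <?php
 /**
  * Hypothetical helper, for illustration only (not a WordPress function).
  * Percent-encodes a title and strips the five listed combining diacritics.
  */
 function demo_slug_with_listed_marks_removed( $title ) {
     // Percent-encode the UTF-8 bytes and lowercase the result.
     $encoded = strtolower( rawurlencode( $title ) );

     // Encoded forms of U+0301, U+0341, U+0300, U+0304, U+030C.
     $listed = array( '%cc%81', '%cd%81', '%cc%80', '%cc%84', '%cc%8c' );

     return str_replace( $listed, '', $encoded );
 }

 // U+0303 (combining tilde) encodes to %cc%83, which is not in the list.
 echo demo_slug_with_listed_marks_removed( "En\u{0303}e" ); // en%cc%83e
 }}}
 A combining acute (U+0301) in the same position would be removed, because
 `%cc%81` is in the list; the tilde survives only because it is not.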

 This problem was mentioned in #56530.

 The proposed solution is to fix `sanitize_title_with_dashes()` as
 suggested in #56531, and to fix `remove_accents()` by adding a step that
 strips every character in the Unicode Mark category (`\p{M}`):
 {{{#!php
 <?php
 $string = preg_replace( '/\p{M}/u', '', $string );
 }}}
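
 As a rough sketch of how that line behaves in isolation, the following
 wraps it in a standalone function. `demo_strip_combining_marks()` is a
 hypothetical name, not part of core, and the fallback for a failed match is
 an assumption added here, since `preg_replace()` returns `null` when the
 input is not valid UTF-8 under the `u` modifier.
 {{{#!php
 <?php
 /**
  * Hypothetical helper, for illustration only (not a WordPress function).
  * Removes all characters in the Unicode Mark category from a string.
  */
 function demo_strip_combining_marks( $string ) {
     $stripped = preg_replace( '/\p{M}/u', '', $string );

     // Assumption: keep the input unchanged if the regex fails,
     // e.g. on invalid UTF-8, rather than returning null.
     return null === $stripped ? $string : $stripped;
 }

 // "E", "n", U+0303 (combining tilde), "e"
 echo demo_strip_combining_marks( "En\u{0303}e" ); // Ene
 }}}
 Note that `\p{M}` is not limited to Latin accents: it also matches
 combining marks in other scripts, so applying it unconditionally may need
 review for non-Latin titles.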

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/56656>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

