[wp-trac] [WordPress Trac] #56656: Move accent removal from `sanitize_title_with_dashes()` to `remove_accents()`
WordPress Trac
noreply at wordpress.org
Mon Sep 26 12:58:17 UTC 2022
#56656: Move accent removal from `sanitize_title_with_dashes()` to
`remove_accents()`
--------------------------+-----------------------------
 Reporter:  anrghg        |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Formatting    |    Version:
 Severity:  major         |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
When `sanitize_title()` attempts to remove accents, it fails, because
neither of the two mechanisms it relies on works as expected:
1. It calls `remove_accents()`, but this only converts a handful of symbols
and a set of precomposed Latin letters to Latin base letters.
2. One of the filters it applies (registered in `default-filters.php`)
calls `sanitize_title_with_dashes()`, but this removes only five combining
accents (plus two spacing acutes). Full list of removed combining
diacritics: U+0301, U+0341, U+0300, U+0304, U+030C.
So when a title contains a combining tilde (U+0303), which is not on that
list, the tilde ends up in the slug.
Example:
Title: Eñe [U+0045 U+006E U+0303 U+0065]
Slug: Display: eñe/
Slug: Encoded: en%cc%83e/
This problem was mentioned in #56530.
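The behaviour described above can be reproduced with a small standalone
sketch. It is not the core code: the `$listed_marks` array and the
`strip_listed_marks()` helper are hypothetical and only model "remove the
five combining diacritics listed above, nothing else".

```php
<?php
// Hypothetical model of the current behaviour: only the five combining
// diacritics named in this ticket are removed. Not WordPress core code.
$listed_marks = array( "\u{0301}", "\u{0341}", "\u{0300}", "\u{0304}", "\u{030C}" );

function strip_listed_marks( $string, $marks ) {
	// str_replace() is byte-safe here because each needle is a full UTF-8 sequence.
	return str_replace( $marks, '', $string );
}

// e + combining acute (U+0301) is on the list, so it is removed.
$acute = strip_listed_marks( "cafe\u{0301}", $listed_marks );

// n + combining tilde (U+0303) is not on the list, so it survives.
$tilde = strip_listed_marks( "En\u{0303}e", $listed_marks );

echo $acute, "\n";
// Lowercased percent-encoding of the surviving bytes, as in the slug above.
echo strtolower( rawurlencode( $tilde ) ), "\n";
```

Under these assumptions the second line printed is `en%cc%83e`, matching
the encoded slug in the example.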
The proposed solution is to fix `sanitize_title_with_dashes()` as
suggested in #56531, and to fix `remove_accents()` by adding:
{{{#!php
<?php
$string = preg_replace( '/\p{M}/u', '', $string );
}}}
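As a rough illustration of what that one-liner does on its own, the
following standalone sketch applies it to the example title. The wrapper
`strip_combining_marks()` is a hypothetical helper, not part of core, and
the null check is an added assumption about how a failure might be handled.

```php
<?php
// Hypothetical wrapper around the proposed line; not WordPress core code.
function strip_combining_marks( $string ) {
	// \p{M} matches any code point in the Unicode Mark category;
	// the /u modifier makes PCRE treat pattern and subject as UTF-8.
	$stripped = preg_replace( '/\p{M}/u', '', $string );

	// preg_replace() returns null on error (for example invalid UTF-8),
	// so fall back to the unmodified input in that case.
	return null === $stripped ? $string : $stripped;
}

// The ticket's example title: E, n, combining tilde (U+0303), e.
$decomposed = strip_combining_marks( "En\u{0303}e" );

// Precomposed n with tilde (U+00F1) is a letter, not a mark, so the
// pattern leaves it alone.
$precomposed = strip_combining_marks( "E\u{00F1}e" );

echo $decomposed, "\n";
```

The pattern only targets combining marks. Precomposed letters such as
U+00F1 still depend on the existing conversion in `remove_accents()`.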
--
Ticket URL: <https://core.trac.wordpress.org/ticket/56656>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform