[wp-trac] [WordPress Trac] #33924: sanitize_html_class valid characters
WordPress Trac
noreply at wordpress.org
Mon Sep 5 17:49:39 UTC 2022
#33924: sanitize_html_class valid characters
-------------------------------------------------+-------------------------
Reporter: m-e-h | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: Formatting | Version: 4.4
Severity: normal | Resolution:
Keywords: has-patch 2nd-opinion has-unit-tests | Focuses:
-------------------------------------------------+-------------------------
Comment (by anrghg):
Thank you for supporting Non-Latin scripts so everybody gets the same
opportunity of using the slug as a class.
As @peterwilsoncc wrote in #56504 — my apologies for opening a duplicate —
today:
> Raising the issue of non-latin alphabets is an excellent point.
> I do agree that the function ought to be more permissive for valid
> characters
Since page slugs are used as class names, all scripts should be equal:
Latin, Greek, Cyrillic, all 160 (number growing) Non-Latin scripts already
supported by Unicode. (Plus non-ASCII Latin, since for slugs, Latin-script
users can choose between simplified Latin (remove accents) and real
Latin.)
Currently, `sanitize_html_class()` provides security at the expense of
usability, equity, internationalization and localization. By deleting all
non-ASCII characters along with the non-alphanumeric ASCII characters
(except hyphen and underscore), WordPress is throwing the baby out with
the bathwater.
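
To make the effect concrete, here is a rough Python model of the two
behaviors discussed in this ticket. It is only a sketch, not WordPress's
PHP source: the function names `strict_class` and `permissive_class` are
hypothetical, and the permissive character range (everything from U+0080
upward) is an assumption about what a more lenient filter might keep.

```python
import re

def strict_class(name: str) -> str:
    """Model of the legacy filter: keep only A-Z, a-z, 0-9, '_' and '-'."""
    # Drop percent-encoded octets first, then everything outside the ASCII set.
    name = re.sub(r"%[0-9A-Fa-f]{2}", "", name)
    return re.sub(r"[^A-Za-z0-9_-]", "", name)

def permissive_class(name: str) -> str:
    """Hypothetical lenient filter: additionally keep all non-ASCII characters."""
    name = re.sub(r"%[0-9A-Fa-f]{2}", "", name)
    # Assumption: any code point from U+0080 upward is allowed through.
    return re.sub(r"[^A-Za-z0-9_\-\u0080-\U0010FFFF]", "", name)

print(strict_class("χαιρε-εν-αμπ"))            # --
print(strict_class("ласкаво-просимо-до-amp"))  # ---amp
print(permissive_class("χαιρε-εν-αμπ"))        # χαιρε-εν-αμπ
```

Note that this permissive sketch would also let non-ASCII whitespace such
as U+00A0 through, so a real patch would likely need a narrower range.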
That behavior surely breaks WordPress’ internationalization and inclusion
policies.
Test example of added body classes based on a page slug `/χαιρε-εν-αμπ/`
(slashes for clarity, Greek transliteration intentional, quoted from
https://anrghg.sunsite.fr/test-amp-compat/64-characters-%e2%96%b6-css-allows-all-non-ascii-%f0%9f%98%91/#1129-id-1164-%cf%87%ce%b1%ce%b9%cf%81%ce%b5-%ce%b5%ce%bd-%ce%b1%ce%bc%cf%80):
* CSS spec conformant or permissive (simplified markup):
{{{
<body class="id-1164 χαιρε-εν-αμπ">
}}}
* Legacy aka strict:
{{{
<body class="id-1164 --">
}}}
These examples are made up to demonstrate levels of usability. In real
life, all three classes are added together, like in the full source at
view-source:https://anrghg.sunsite.fr/test-amp-compat/%cf%87%ce%b1%ce%b9%cf%81%ce%b5-%ce%b5%ce%bd-%ce%b1%ce%bc%cf%80/
{{{
<body class="page-template-default page page-id-1164 logged-in wp-embed-responsive id-1164 _-- χαιρε-εν-αμπ">
}}}
On a side note: the double-hyphen class is invalid CSS, so it has a
(configurable) underscore prepended as a more intuitive alternative to
escaping the second hyphen in CSS: `.-\2D`. The goal is maximum
intuitiveness and usability for users adding Custom CSS, and to avoid
screwing things up.
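
The underscore guard can be sketched as follows. This is an illustration
only: the helper name `guard_leading_hyphens` and its `prefix` parameter
are hypothetical and are not taken from any plugin or from WordPress
core.

```python
def guard_leading_hyphens(class_name: str, prefix: str = "_") -> str:
    """Prepend a configurable prefix when a class starts with two hyphens."""
    # Only classes beginning with "--" are touched; all others pass through.
    if class_name.startswith("--"):
        return prefix + class_name
    return class_name

print(guard_leading_hyphens("--"))       # _--
print(guard_leading_hyphens("id-1164"))  # id-1164
```

With the prefix in place, the class can be targeted in Custom CSS as
`._--` without any escape sequence.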
In Ukrainian, the equivalent CSS spec conformant or permissive class is
(courtesy Google Translate):
{{{
<body class="id-1164 ласкаво-просимо-до-amp">
}}}
The derived legacy aka strict class is not very specific:
{{{
<body class="id-1164 ---amp">
}}}
Using the built-in post ID selector (which is screwed up, since it
requires picking the right prefix among `postid-` and `page-id-`) is
currently still the only option for users of non-Latin scripts. Using the
convenient slug selector is currently still a privilege of Latin-script
users.
Thank you to everyone for striving to lift that limitation.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/33924#comment:18>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform