[wp-trac] [WordPress Trac] #56504: `sanitize_html_class()` is both too restrictive, and too permissive so it may return an invalid class name
WordPress Trac
noreply at wordpress.org
Sun Sep 4 15:51:22 UTC 2022
#56504: `sanitize_html_class()` is both too restrictive, and too permissive so it
may return an invalid class name
--------------------------+------------------------------
Reporter: anrghg | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version:
Severity: normal | Resolution:
Keywords: | Focuses:
--------------------------+------------------------------
Comment (by anrghg):
Replying to [comment:1 joyously]:
> I discovered the leading digit part of this in my testing of my theme,
which adds body classes using page slugs.
> Since my theme has options that are class names, changing the sanitizing
function can cause user input to not match, so any change to this function
needs a prominent Dev Note or some way to be backward compatible.
Backcompat could be ensured by changing the function to:
{{{
function sanitize_html_class(
string $class,
string $fallback = '',
bool $css_compliant = false,
string $prefix = '_',
bool $decode = true
) {
if ( $css_compliant ) {
// Do the right thing.
} else {
// Legacy behavior.
}
return apply_filters( 'sanitize_html_class', $sanitized, $class,
$fallback );
}
}}}
However, with your themes fixing the leading digit issue, users inputting
classes with a leading digit or hyphen-digit only need to know the string
that their classes will be prefixed with.
The proposed function really supports user input. Where unpredictability
comes in awkwardly is when copy-pasting a spec-conformant slug (supporting
all Unicode in the page title) after selecting the slug in the URL bar so
it is plain Unicode.
The legacy `sanitize_html_class()` leaves only bare-bone ASCII, and if
there is no ASCII in the slug e.g. due to being written in a Non-Latin
script, it falls back on an empty string.
Yet another pitfall in the legacy `sanitize_html_class()` is that it may
return **uppercase** ASCII instead of converting it to lowercase while it
is on it.
Uppercase in URL slugs may lead to 404 errors depending on [whether the
server supports it or not](https://stackoverflow.com/questions/7996919
/should-url-be-case-sensitive). Generally it’s so unsafe that A-Z should
be banned from slugs. The Block Editor does so. I came across the issue
while using my plugin and [hinted it in a
comment](https://github.com/ampproject/amp-
wp/issues/7238#issuecomment-1236226182).
Indeed a prominent **Dev Note** should be part of the solution. It should
not only document that `sanitize_html_class()` has been fixed to
*optionally* comply to the CSS spec.
The Dev Note should also promote best practice of adding not only a single
class but **three** classes to the `body` element:
1. `id-1234` to complete the default `postid-1234` OR `page-id-1234` (with
an extra hyphen for readability…);
2. The legacy plain ASCII class but **sanitized** so in non-Latin scripts
there will be no “-1”, “-2”, “-3” class names any longer, either;
3. The CSS conformant plain Unicode class, with only ASCII symbols and
punctuation removed except the hyphen and underscore and the backslash
escaping numeric references plus the space delimiting them! Its really
very tricky.
As a bonus, the legacy not-really-sanitized class could also be added, but
I do not recommend invalid CSS any longer.
It is not up to the developers to complete sanitization after using a
function designed and supposed to do the job.
Nor should CSS be pulled down to bare-bone ASCII, at the expense of Non-
Latin scripts, emoji, and UX.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/56504#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list