[wp-trac] [WordPress Trac] #61009: HTML API: Fix some existing bugs in `kses` comment detection, enable Bits storage. (was: HTML API: Preserve some additional invalid HTML comment syntaxes.)
WordPress Trac
noreply at wordpress.org
Wed May 22 22:17:04 UTC 2024
#61009: HTML API: Fix some existing bugs in `kses` comment detection, enable Bits
storage.
-----------------------------------+------------------------------
Reporter: dmsnell | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: HTML API | Version: trunk
Severity: normal | Resolution:
Keywords: has-patch 2nd-opinion | Focuses:
-----------------------------------+------------------------------
Description changed by dmsnell:
Old description:
> When `wp_kses_split` processes a document it attempts to leave HTML
> comments relatively alone. It makes minor adjustments, but leaves the
> comments in the document in its output.
>
> Unfortunately it only recognizes one kind of HTML comment and rejects
> many other kinds which appear as the result of various invalid HTML
> markup.
>
> This patch makes a minor adjustment to the algorithm in `wp_kses_split`
> to allow two additional kinds of HTML comments:
>
> - HTML comments with the incorrect closer `--!>`.
> - Closing tags with an invalid tag name, e.g. `</%dolly>`.
>
> In an HTML parser these all become comments, and so leaving them in the
> document should be a benign operation, improving the reliability of
> detecting comments in Core. These invalid closing tags, which in a
> browser are interpreted as comments, are one proposal for a placeholder
> mechanism in the HTML API unlocking HTML templating, a new kind of
> shortcode, and more. Having these persist in Core is a requirement for
> exploring and utilizing the new syntax.
New description:
When `wp_kses_split` processes a document it attempts to leave HTML
comments alone. It makes minor adjustments, but leaves the comments in the
document in its output. Unfortunately it only recognizes one kind of HTML
comment and rejects many others.
In HTML there are many kinds of invalid markup which, according to the
specification, are to be interpreted as an HTML comment. These include,
but are not limited to:
- HTML comments with invalid syntax, `<!-->`, `<!-- --!>`, etc…
- HTML closing tags whose tag name is invalid, `</3>`, `</%happy>`, etc…
- Things that look like XML CDATA sections, `<![CDATA[…]]>`
- Things that look like XML Processing Instruction nodes,
`<?include "blarg">`
This patch makes a minor adjustment to the algorithm in `wp_kses_split` to
allow two additional kinds of HTML comments:
- HTML comments with the incorrect closer `--!>`, because this one was a
simple and easy change.
- Closing tags with an invalid tag name, e.g. `</%dolly>`, because these
are required to open up explorations in Gutenberg on Bits, a new iteration
of dynamic tokens for externally-sourced data, or "Shortcodes 2.0".
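
As a rough illustration of how an HTML parser treats the comment-like
markup listed above, here is a minimal Python sketch. It is not the Core
PHP code and not the patch: `classify_markup` is a hypothetical helper
that approximates the first steps of the HTML tokenizer, assuming the
input starts with `<` and sits in ordinary HTML content (not inside SVG or
MathML, where CDATA sections are real).

```python
import re


def classify_markup(token: str) -> str:
    """Approximate how an HTML tokenizer would treat markup starting with '<'.

    Hypothetical helper for illustration; assumes HTML-namespace content.
    """
    if token.startswith("<!--"):
        # Covers `<!-->` and comments closed with `--!>` as well.
        return "comment"
    if token[:9].upper() == "<!DOCTYPE":
        return "doctype"
    if token.startswith("<!") or token.startswith("<?"):
        # `<![CDATA[…]]>` and `<?include …>` both land here in HTML content.
        return "bogus comment"
    if token.startswith("</"):
        rest = token[2:]
        if re.match(r"[A-Za-z]", rest):
            return "closing tag"
        if rest.startswith(">"):
            return "ignored"  # `</>` is dropped by the parser
        if rest == "":
            return "text"
        # Any other character after `</` starts a bogus comment.
        return "bogus comment"
    if re.match(r"<[A-Za-z]", token):
        return "opening tag"
    return "text"


for sample in ["<!-- --!>", "</3>", "</%dolly>", "<![CDATA[x]]>", "</p>"]:
    print(sample, "->", classify_markup(sample))
```

In a browser, a "bogus comment" ends up as a comment node in the DOM,
which is why leaving such markup in place is expected to be benign.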
These invalid closing tags, which in a browser are interpreted as
comments, are one proposal for a placeholder mechanism in the HTML API
unlocking HTML templating, a new kind of shortcode, and more. Having these
persist in Core is a requirement for exploring and utilizing the new
syntax because as long as Core removes them, there's no way to load
content from the database and experiment on the full life cycle of
potential Bits systems.
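
To make the round-trip concern concrete, the following Python sketch
finds the spans that would need to survive sanitization under this
proposal. The pattern and the `find_preserved_spans` name are assumptions
made for illustration; they are not the regular expression used by
`wp_kses_split` or by the patch.

```python
import re

# Hypothetical pattern: comments closed by `-->` or `--!>`, plus closing
# tags whose name does not start with an ASCII letter (excluding `</>`).
PRESERVED = re.compile(r"<!--.*?--!?>|</[^A-Za-z>][^>]*>", re.DOTALL)


def find_preserved_spans(html: str) -> list[str]:
    """Return the comment-like spans in document order."""
    return PRESERVED.findall(html)


print(find_preserved_spans("a<!-- x --!>b</%dolly>c</p>"))
# ['<!-- x --!>', '</%dolly>']
```

A normal closing tag such as `</p>` is not matched, so it would still go
through the usual tag handling.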
On its own, however, this represents a kind of bug fix for Core, making
the implementation of `wp_kses_split()` more closely align with its stated
goal of leaving HTML comments as comments. It doesn't attempt to fully fix
the mis-parsed comments (because that is a much deeper issue and involves
many more questions about existing expectations) but it does propose a
couple of fixes that are expected to be minor and that hopefully won't
break any existing code or projects.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/61009#comment:10>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform