[wp-trac] [WordPress Trac] #62036: HTML API: Introduce normalization methods.
WordPress Trac
noreply at wordpress.org
Fri Sep 20 22:30:20 UTC 2024
#62036: HTML API: Introduce normalization methods.
--------------------------------------+----------------------
Reporter: dmsnell | Owner: dmsnell
Type: enhancement | Status: closed
Priority: normal | Milestone: 6.7
Component: HTML API | Version: trunk
Severity: normal | Resolution: fixed
Keywords: has-patch has-unit-tests | Focuses:
--------------------------------------+----------------------
Changes (by dmsnell):
* owner: (none) => dmsnell
* status: new => closed
* resolution: => fixed
Comment:
In [changeset:"59076" 59076]:
{{{
#!CommitTicketReference repository="" revision="59076"
HTML API: Add `normalize()` to give us the HTML we always wanted.
HTML often appears in ways that are unexpected. It may be missing implicit
tags, may have unquoted, single-quoted, or double-quoted attributes, may
contain duplicate attributes, may contain unescaped text content, or any
number of other possible invalid constructions. The HTML API understands
all of these inputs, but downstream parsers may not, and HTML snippets which
are safe on their own may introduce problems when joined with other HTML
snippets.
This patch introduces the `serialize()` method on the HTML Processor,
which returns fully normative HTML output, eliminating invalid markup
along the way. The resulting string contains every missing tag, has
every attribute value double-quoted, and has no duplicate attributes.
A static `normalize()` method on the HTML Processor provides a
convenient wrapper for constructing a fragment parser and immediately
serializing it.
Subclasses relying on the `serialize_token()` method may perform
structural HTML modifications with as much security as the upcoming
`\Dom\HTMLDocument()` parser will offer, though such modifications
cannot provide the full safety that will eventually arrive with
`set_inner_html()`.
Further work may explore serializing to XML (which involves a number of
other important transformations) and adding constraints to serialization
(such as only allowing inline/flow/formatting elements and text).
Developed in https://github.com/wordpress/wordpress-develop/pull/7331
Discussed in https://core.trac.wordpress.org/ticket/62036
Props dmsnell, jonsurrell, westonruter.
Fixes #62036.
}}}
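The commit message describes two serializer rules only in prose:

- attribute values come out double-quoted, with no duplicates;
- text content is escaped.

As a rough illustration of just those rules, here is a minimal Python sketch built on the standard library's `html.parser`. It is not WordPress code. `StartTagNormalizer` and `normalize_tags` are hypothetical names. Unlike the HTML Processor's `normalize()`, the sketch does not insert missing implicit tags or repair element nesting. It only re-emits tokens one at a time.

```python
from html import escape
from html.parser import HTMLParser


class StartTagNormalizer(HTMLParser):
    """Hypothetical toy, not WordPress code: re-emit tags and text with
    double-quoted, de-duplicated attributes and escaped text.

    It does NOT add implicit tags, fix nesting, or treat raw-text
    elements such as <script> specially; comments and doctypes are dropped.
    """

    def __init__(self):
        # convert_charrefs=True decodes entities like &amp; in text first.
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names and decodes
        # character references inside attribute values.
        seen = set()
        parts = [tag]
        for name, value in attrs:
            if name in seen:
                continue  # HTML keeps the first of duplicated attributes
            seen.add(name)
            if value is None:
                parts.append(name)  # boolean attribute, e.g. "disabled"
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_startendtag(self, tag, attrs):
        # Serialize "<br/>" as "<br>" instead of the default "<br></br>".
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so "<" or "&" cannot be misread by a later parser.
        self.out.append(escape(data, quote=False))


def normalize_tags(html: str) -> str:
    parser = StartTagNormalizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `<p class=one CLASS='two' title="a&amp;b">` comes out as `<p class="one" title="a&amp;b">`:

- the unquoted value gains double quotes;
- the second `class` is dropped, because HTML keeps the first occurrence of a duplicated attribute.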
--
Ticket URL: <https://core.trac.wordpress.org/ticket/62036#comment:7>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform