[wp-trac] [WordPress Trac] #62036: HTML API: Introduce normalization methods.

WordPress Trac noreply at wordpress.org
Fri Sep 20 22:30:20 UTC 2024


#62036: HTML API: Introduce normalization methods.
--------------------------------------+----------------------
 Reporter:  dmsnell                   |       Owner:  dmsnell
     Type:  enhancement               |      Status:  closed
 Priority:  normal                    |   Milestone:  6.7
Component:  HTML API                  |     Version:  trunk
 Severity:  normal                    |  Resolution:  fixed
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+----------------------
Changes (by dmsnell):

 * owner:  (none) => dmsnell
 * status:  new => closed
 * resolution:   => fixed


Comment:

 In [changeset:"59076" 59076]:
 {{{
 #!CommitTicketReference repository="" revision="59076"
 HTML API: Add `normalize()` to give us the HTML we always wanted.

 HTML often appears in ways that are unexpected. It may be missing implicit
 tags, may have unquoted, single-quoted, or double-quoted attributes, may
 contain duplicate attributes, may contain unescaped text content, or any
 number of other possible invalid constructions. The HTML API understands
 all of these inputs, but downstream parsers may not, and HTML snippets which
 are safe on their own may introduce problems when joined with other HTML
 snippets.

 This patch introduces the `serialize()` method on the HTML Processor,
 which prints fully-normative HTML output, eliminating invalid markup
 along the way. It produces a string which contains every missing tag,
 double-quoted attributes, and no duplicate attributes. A `normalize()`
 static method on the HTML Processor provides a convenient wrapper for
 constructing a fragment parser and immediately serializing.

 Subclasses relying on the `serialize_token()` method may perform
 structural HTML modifications with as much security as the upcoming
 `\Dom\HTMLDocument()` parser will offer, though such modifications
 cannot provide the full safety that will eventually arrive with
 `set_inner_html()`.

 Further work may explore serializing to XML (which involves a number of
 other important transformations) and adding constraints to serialization
 (such as only allowing inline/flow/formatting elements and text).

 Developed in https://github.com/wordpress/wordpress-develop/pull/7331
 Discussed in https://core.trac.wordpress.org/ticket/62036

 Props dmsnell, jonsurrell, westonruter.
 Fixes #62036.
 }}}
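
 As a rough illustration of the two entry points described in the commit
 message, the sketch below calls `WP_HTML_Processor::normalize()` and the
 longer `create_fragment()` plus `serialize()` path. The helper names
 `example_normalize_html()` and `example_serialize_html()` and the sample
 input are hypothetical and not part of the changeset. The sketch assumes
 WordPress 6.7 or later is loaded, and the helpers return null otherwise.

```php
<?php
/**
 * Hypothetical helper: normalize an HTML fragment with the one-step wrapper.
 * Returns null when the HTML API is unavailable or the input is unsupported.
 */
function example_normalize_html( string $html ): ?string {
	// Assumption: outside WordPress 6.7+ the method does not exist, so bail out.
	if ( ! method_exists( 'WP_HTML_Processor', 'normalize' ) ) {
		return null;
	}

	return WP_HTML_Processor::normalize( $html );
}

/**
 * Hypothetical helper: the same result via the two explicit steps,
 * constructing a fragment parser and then serializing it.
 */
function example_serialize_html( string $html ): ?string {
	if ( ! method_exists( 'WP_HTML_Processor', 'serialize' ) ) {
		return null;
	}

	// create_fragment() may return null if a processor cannot be created.
	$processor = WP_HTML_Processor::create_fragment( $html );
	if ( null === $processor ) {
		return null;
	}

	// serialize() may also return null when it meets unsupported markup.
	return $processor->serialize();
}

// Sample input with unquoted and duplicate attributes and an unclosed comment.
$input      = '<a href=#anchor v=5 href="/" enabled>One</a another v=5><!--';
$normalized = example_normalize_html( $input );

echo null === $normalized ? "Could not normalize.\n" : $normalized . "\n";
```

 Both paths can return null, so callers would likely want to decide up
 front whether to fall back to the original markup or to reject the input.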

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/62036#comment:7>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

