[wp-trac] [WordPress Trac] #58517: HTML API: Introduce HTML Processor, a higher-level partner to the Tag Processor

WordPress Trac noreply at wordpress.org
Fri Jul 21 03:24:38 UTC 2023


#58517: HTML API: Introduce HTML Processor, a higher-level partner to the Tag
Processor
--------------------------------------+------------------------------
 Reporter:  dmsnell                   |       Owner:  Bernhard Reiter
     Type:  enhancement               |      Status:  closed
 Priority:  normal                    |   Milestone:  6.4
Component:  HTML API                  |     Version:
 Severity:  normal                    |  Resolution:  fixed
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+------------------------------

Comment (by dmsnell):

 @flixos90 I've tried to explain the difference in the class docblock, but
 it's something we can continue to improve.

 The short answer is that the Tag Processor lexes individual tokens within
 an HTML document, but it is completely unaware of the HTML structure and
 of the semantic parsing rules, such as the fact that a new `<p>`
 implicitly closes any unclosed `<p>` found before it.
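
 To make the limitation concrete, here is a minimal Python sketch of what
 a flat, token-level view of a document looks like. The `lex_tags` helper
 and its regular expression are hypothetical simplifications written for
 this illustration; they are not the Tag Processor's actual algorithm or
 API, and they ignore comments, quoted attributes containing ">", and
 similar cases.

```python
import re

# Hypothetical, heavily simplified tag lexer (an assumption for illustration,
# not the Tag Processor's real implementation).
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def lex_tags(html):
    """Return a flat list of (tag_name, is_closer) pairs in document order."""
    return [(m.group(2).upper(), m.group(1) == "/") for m in TAG.finditer(html)]


tokens = lex_tags("<div><p>one<p>two</div>")

# Two P openers appear back to back. Nothing in the flat list records that
# the second <p> implicitly closed the first, or that </div> closed a P.
print(tokens)
```

 A consumer of this list can find every `P` token, but it cannot tell
 whether the two paragraphs are siblings or nested without applying the
 parsing rules itself.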

 The HTML Processor thus aims to eventually provide a reliable DOM-like
 interface that operates in a streaming manner, just like the Tag Processor,
 but with knowledge of the entire document structure, including nesting,
 overlapping tags, missing closers, and other kinds of HTML errors.

 At this stage we're only introducing a very minimal amount of support,
 and I want to gradually bring in further support as we test and vet this
 code in the Gutenberg plugin. A large part of why this is so minimal is to
 make review and testing easier. The code contains the basic structure and
 skeleton necessary for fully implementing the HTML5 specification, but in
 almost all of the places requiring more complicated logic the processor
 aborts and refuses to engage with HTML it may not understand.

 At the moment this patch can be viewed as an extension to the Tag
 Processor that allows for more advanced queries. Whereas the Tag Processor
 can search by tag name, the HTML Processor can also search by
 "breadcrumbs", which are equivalent to a chain of tag names in a CSS
 selector separated by the child combinator (>). This is challenging to
 implement with the Tag Processor, but the HTML Processor is building the
 HTML state machine to make these kinds of queries and operations easy and
 reliable.
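
 As a rough illustration of the breadcrumbs idea, the Python sketch below
 keeps a stack of open elements while scanning tags and reports an opening
 tag when the tail of that stack equals the requested chain. Everything
 here is an assumption made for the example: `find_by_breadcrumbs`, the
 regular expression, and the single implicit-closing rule for `<p>` are
 hypothetical and far simpler than the HTML5 algorithm; this is not the
 HTML Processor's code or API.

```python
import re

# Hypothetical simplified lexer and rules, for illustration only.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
VOID = {"BR", "HR", "IMG", "INPUT"}  # partial list of elements with no closer


def find_by_breadcrumbs(html, breadcrumbs):
    """Return the open-element path of each opener whose nearest ancestors
    (ending with the tag itself) match the non-empty `breadcrumbs` chain."""
    want = [name.upper() for name in breadcrumbs]
    open_elements = []
    matches = []
    for m in TAG.finditer(html):
        is_closer, name = m.group(1) == "/", m.group(2).upper()
        if is_closer:
            # Pop back to the matching opener, which also closes anything
            # left open inside it (a "missing closer"). Stray closers are ignored.
            if name in open_elements:
                while open_elements.pop() != name:
                    pass
            continue
        if name == "P" and "P" in open_elements:
            # Simplified form of the rule that a new <p> closes an open <p>.
            while open_elements.pop() != "P":
                pass
        open_elements.append(name)
        if open_elements[-len(want):] == want:
            matches.append(tuple(open_elements))
        if name in VOID:
            open_elements.pop()  # void elements never stay open
    return matches


html = "<div><p>one<p>two <em>x</em></div><em>y</em>"
matches = find_by_breadcrumbs(html, ("DIV", "P", "EM"))
print(matches)
```

 Without the implicit-closing rule the first `<em>` would appear to sit at
 DIV > P > P > EM and the query DIV > P > EM would miss it, which is the
 kind of structural knowledge a token-level scan lacks.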

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/58517#comment:12>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
