[wp-trac] [WordPress Trac] #58517: HTML API: Introduce HTML Processor, a higher-level partner to the Tag Processor
WordPress Trac
noreply at wordpress.org
Fri Jul 21 03:24:38 UTC 2023
#58517: HTML API: Introduce HTML Processor, a higher-level partner to the Tag
Processor
--------------------------------------+------------------------------
 Reporter:  dmsnell                   |       Owner:  Bernhard Reiter
     Type:  enhancement               |      Status:  closed
 Priority:  normal                    |   Milestone:  6.4
Component:  HTML API                  |     Version:
 Severity:  normal                    |  Resolution:  fixed
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+------------------------------
Comment (by dmsnell):
@flixos90 I've tried to explain the difference in the class docblock, but
it's something we can continue to improve.
The short answer is that the Tag Processor lexes individual tokens within
an HTML document but is completely unaware of the document's structure and
of the semantic parsing rules, such as the fact that a new `<p>` opener
implicitly closes any still-open `<p>` found before it.
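To make the distinction concrete, here is a rough Python sketch. It is not
the WordPress implementation: `lex_tags`, `pop_through`, and `opener_paths`
are hypothetical names, the regex ignores comments, void elements, and
attributes containing `>`, and the only parsing rule applied is a
simplified version of "a new `<p>` closes a still-open `<p>`". The flat
scan reports what tags appear; the second function tracks where each
opener sits in the document.

```python
import re

# Matches an opening or closing tag. Deliberately naive: no comments,
# no void elements, no attribute values containing ">".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def lex_tags(html):
    """Flat scan: return (TAG_NAME, is_closer) pairs with no notion of nesting."""
    return [(m.group(2).upper(), m.group(1) == "/") for m in TAG.finditer(html)]


def pop_through(stack, name):
    """Pop open elements until the named one has been removed."""
    while stack and stack.pop() != name:
        pass


def opener_paths(html):
    """Return the open-element path at each opening tag.

    Applies one simplified rule (an assumption for illustration):
    an opening <p> first closes a still-open <p>.
    """
    stack, paths = [], []
    for name, is_closer in lex_tags(html):
        if is_closer:
            if name in stack:
                pop_through(stack, name)
            continue
        if name == "P" and "P" in stack:
            pop_through(stack, "P")
        stack.append(name)
        paths.append(list(stack))
    return paths


html = "<div><p>one<p>two</div>"
print(lex_tags(html))      # two P openers, no P closer
print(opener_paths(html))  # the second P is a sibling of the first, not a child
```

With the flat scan alone, nothing says whether the second `<p>` is nested
inside the first; the stack-based pass places both under `DIV`.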
The HTML Processor aspires to eventually provide a reliable DOM-like
interface that operates in a streaming manner, just like the Tag Processor,
but with knowledge of the entire document structure, including nesting,
overlapping tags, missing closers, and other kinds of HTML errors.
At this stage we're introducing only a minimal amount of support, and I
want to gradually bring in more as we test and vet this code in the
Gutenberg plugin. A large part of why this is so minimal is to make review
and testing easier. The code contains the basic structure and skeleton
necessary for fully implementing the HTML5 specification, but in almost
all of the places requiring more complicated coding the processor aborts
and refuses to engage with HTML it may not understand.
At the moment this patch can be viewed as an extension to the Tag
Processor that allows for a more advanced query. Whereas the Tag Processor
can search by tag name, the HTML Processor can also search by
"breadcrumbs", which are equivalent to a chain of tag names in a CSS
selector separated by the child combinator (`>`). This is challenging to
implement with the Tag Processor, but the HTML Processor is building the
HTML state machine to make these kinds of queries and operations easy and
reliable.
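As an illustration of the breadcrumb idea, here is a minimal Python sketch,
assuming the processor keeps a stack of currently open elements (outermost
first). `matches_breadcrumbs` is a hypothetical helper written for this
example, not a function from the patch: it treats a query as matching when
the breadcrumbs equal the innermost end of that stack, which is what a
chain of child combinators requires.

```python
def matches_breadcrumbs(open_elements, breadcrumbs):
    """Return True if the innermost open elements equal the breadcrumbs.

    open_elements: tag names from outermost to the current element.
    breadcrumbs:   tag names as in "FIGURE > IMG", outermost first.
    """
    if not breadcrumbs or len(breadcrumbs) > len(open_elements):
        return False
    wanted = [name.upper() for name in breadcrumbs]
    tail = [name.upper() for name in open_elements[-len(wanted):]]
    return tail == wanted


# IMG is a direct child of FIGURE: matches "FIGURE > IMG".
print(matches_breadcrumbs(["HTML", "BODY", "FIGURE", "IMG"], ["FIGURE", "IMG"]))

# IMG sits inside FIGCAPTION, so FIGURE is not its parent: no match.
print(matches_breadcrumbs(
    ["HTML", "BODY", "FIGURE", "FIGCAPTION", "IMG"], ["FIGURE", "IMG"]
))
```

The comparison itself is trivial; the hard part is maintaining a correct
stack of open elements in the first place, which a tag-by-tag scanner
without the HTML parsing rules cannot do.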
--
Ticket URL: <https://core.trac.wordpress.org/ticket/58517#comment:12>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform