[wp-trac] [WordPress Trac] #58517: HTML API: Introduce HTML Processor, a higher-level partner to the Tag Processor

WordPress Trac noreply at wordpress.org
Mon Jun 12 13:58:37 UTC 2023


#58517: HTML API: Introduce HTML Processor, a higher-level partner to the Tag
Processor
-------------------------+--------------------------------------
 Reporter:  dmsnell      |      Owner:  (none)
     Type:  enhancement  |     Status:  new
 Priority:  normal       |  Milestone:  Awaiting Review
Component:  HTML API     |    Version:
 Severity:  normal       |   Keywords:  has-patch has-unit-tests
  Focuses:               |
-------------------------+--------------------------------------
 This patch introduces the //first// of //many// iterations in the
 evolution of the HTML API: the HTML Processor, which is built to
 understand HTML structure, including nesting, misnesting, and complicated
 semantic rules.

 In the first iteration, the HTML Processor is arbitrarily limited to a
 minimal subset of functionality so that we can review it, ship it, test
 it, and collect feedback before moving forward. This means that this patch
 is more or less an extension to the Tag Processor query language,
 providing the ability not only to scan for a tag of a given name, but also
 to find an HTML element in a specific nesting path.
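
 To illustrate what finding an element "in a specific nesting path" means,
 here is a minimal sketch in Python. It is not the patch's PHP code: the
 function name `find_by_breadcrumbs`, the regex tokenizer, and the
 assumption of well-formed input are all hypothetical simplifications. It
 keeps a stack of open elements and reports a match when the tail of the
 current path equals the requested path.

```python
import re

# Hypothetical, simplified tokenizer: it only recognizes opening and closing
# tags and would be confused by a ">" inside an attribute value.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"IMG"}


def find_by_breadcrumbs(html, breadcrumbs):
    """Return offsets of opening tags whose nesting path ends with `breadcrumbs`."""
    wanted = [name.upper() for name in breadcrumbs]
    open_elements = []  # stack of currently open element names
    matches = []

    for m in TAG_RE.finditer(html):
        is_closer = m.group(1) == "/"
        name = m.group(2).upper()

        if is_closer:
            # Assumption: input is well nested, so a closer matches the
            # innermost open element. Misnesting is ignored in this sketch.
            if open_elements and open_elements[-1] == name:
                open_elements.pop()
            continue

        # The path to this tag is the open-element stack plus the tag itself.
        path = open_elements + [name]
        if path[-len(wanted):] == wanted:
            matches.append(m.start())

        if name not in VOID_ELEMENTS:
            open_elements.append(name)

    return matches


html = "<div><img></div><figure><img><figcaption><em>x</em></figcaption></figure>"
matches = find_by_breadcrumbs(html, ["FIGURE", "IMG"])
```

 Only the `IMG` directly inside the `FIGURE` matches; the one inside the
 `DIV` has a different path and is skipped, which a scan by tag name alone
 cannot distinguish.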

 The HTML Processor also aborts any time it encounters:
  - a tag that isn't a `P`, `DIV`, `FIGURE`, `FIGCAPTION`, `IMG`, `STRONG`,
 `B`, `EM`, `I`, `A`, `BIG`, `CODE`, `FONT`, `SMALL`, `STRIKE`, `TT`, or
 `U` tag. This limit exists because many HTML elements require specific
 rules and we are trying to limit the number of rules introduced at once.
 The supported set targets existing work in places like the image block.
  - certain misnesting constructs that evoke complicated resolution inside
 the HTML spec. Where possible and where simple to do reliably, certain
 parse errors are handled. In most cases the HTML Processor aborts.
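
 The first rule can be pictured as an allow-list check. The Python sketch
 below is illustrative only and is not the patch's PHP code: the helper
 `first_unsupported_tag` and its regex tokenizer are hypothetical, and only
 the tag list is taken from this ticket. A caller would stop processing as
 soon as the helper returns a tag name.

```python
import re

# Tag names listed in this ticket as supported by the first iteration.
SUPPORTED_TAGS = {
    "P", "DIV", "FIGURE", "FIGCAPTION", "IMG", "STRONG", "B", "EM", "I",
    "A", "BIG", "CODE", "FONT", "SMALL", "STRIKE", "TT", "U",
}

# Hypothetical, simplified tokenizer for opening and closing tags.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def first_unsupported_tag(html):
    """Return the first tag name outside the supported set, or None."""
    for m in TAG_RE.finditer(html):
        name = m.group(1).upper()
        if name not in SUPPORTED_TAGS:
            return name  # a real processor would abort at this point
    return None


supported = first_unsupported_tag("<figure><img><figcaption>Hi</figcaption></figure>")
unsupported = first_unsupported_tag("<div><table><tr></tr></table></div>")
```

 Here `supported` is `None` because every tag is in the list, while
 `unsupported` is `"TABLE"`, the point at which this sketch would give up.
 The misnesting rule is not modeled here.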

 The structure of the HTML Processor is established in this patch. Further
 spec-compliance comes through filling out _more of the same_ kind and
 nature of code as is found in this patch. Certain critical HTML algorithms
 are partially supported, and where support requires more than is present,
 the HTML Processor acknowledges this and refuses to operate.

 In this patch are explorations for how to verify that new HTML support is
 fully added (instead of allowing for partial updates that leave some code
 paths non-compliant). Performance is hard to measure while support is so
 limited, but it should generally track the performance of the Tag
 Processor fairly closely, since the overhead is kept as small as
 practical.

 A Make post will follow, and the PR on GitHub will see updates. The target
 is to merge into 6.4.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/58517>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

