[wp-trac] [WordPress Trac] #61560: HTML API: Fragment parser should not add context node to stack of open elements.

Wed Jul 3 16:35:42 UTC 2024

#61560: HTML API: Fragment parser should not add context node to stack of open
elements.
--------------------------+--------------------
 Reporter:  dmsnell       |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  6.7
Component:  HTML API      |    Version:  trunk
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+--------------------
 The HTML Processor currently supports only a fragment parser whose context
 node is `<body>`. As it starts supporting more context nodes, it's
 important that the fragment interface works as expected. Currently it
 doesn't: it adds the context node to the stack of open elements when it
 shouldn't.

 The visible impact of this is that breadcrumbs currently report `HTML >
 BODY > …`. It's //likely// that instead it should skip `BODY`, and perhaps
 `HTML` depending on the use-case.

 This becomes more important with other context nodes, such as a `P`
 element. When setting inner HTML on a `P` element and encountering `</p>`,
 the fragment parser //should// create an empty `P` element as a child of
 the context `P` (even though this produces an invalid DOM). With the
 current behavior, the parser would instead find the context `P` on the
 stack of open elements and close it, never creating the expected empty,
 invalid `P`.
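
 The difference can be illustrated with a toy model. This is only a sketch,
 not the HTML Processor's code: the function names are hypothetical, and
 "button scope" is simplified to "anywhere on the stack". It models how a
 `</p>` end tag is handled when the context `P` is, or is not, on the stack
 of open elements.

```python
# Toy model of the "</p>" end-tag rule from the HTML "in body" insertion mode.
# Assumption: "button scope" is approximated as "anywhere on the stack".
# All names here are illustrative; none of this is WordPress code.

def handle_p_end_tag(stack, events):
    """Apply the end-tag rule for P to a stack of open elements (in place)."""
    if "P" not in stack:
        # No open P in scope: parse error, then act as if "<p>" had been seen.
        stack.append("P")
        events.append("insert empty P")
    # Close the P: pop elements until a P has been popped.
    while stack.pop() != "P":
        pass
    events.append("close P")


def parse_close_p(include_context_node):
    """Parse the fragment "</p>" with a P context node."""
    # Fragment parsing starts with only the root HTML element on the stack.
    stack = ["HTML"]
    if include_context_node:
        # The behavior this ticket reports as a bug.
        stack.append("P")
    events = []
    handle_p_end_tag(stack, events)
    return stack, events


if __name__ == "__main__":
    print(parse_close_p(include_context_node=False))
    print(parse_close_p(include_context_node=True))
```

 In this model the empty `P` is only inserted when the context node is kept
 off the stack. With the context node on the stack, the `</p>` closes the
 context element itself.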

 **A fix** should involve removing the context node from the stack of open
 elements, and //it may// adjust how breadcrumbs are reported. But should
 breadcrumbs show the root node (`HTML`) or only show the nodes inside the
 context? There's no direct analogue in JS since access is only available
 once the parse is complete.

 Based on how the fragment parser operates, it seems like the breadcrumbs
 for a `<body>` context should //start// at the top-level nodes within that
 `BODY` element, //not// at `HTML`. This brings two drawbacks:

  - This would be a breaking change in the HTML Processor (though it's
 unlikely much or any code in practice relies on this behavior, since
 `matches_breadcrumbs()` matches partial breadcrumbs).
  - We lose the easy opportunity to verify the full breadcrumb path in
 `matches_breadcrumbs()` by starting at `HTML`. It would then be necessary
 to also confirm the length of the breadcrumbs, or to add a new placeholder
 to `matches_breadcrumbs()` that marks the top of the path, e.g.
 `matches_breadcrumbs( array( 'root', 'P' ) )` to match a `P` element at
 the top level.
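
 To make the `'root'` placeholder idea concrete, here is a minimal sketch of
 partial breadcrumb matching with an optional anchor. It is a hypothetical
 model, not the existing `matches_breadcrumbs()` implementation: the `ROOT`
 constant and the list-based signature are assumptions made for
 illustration, and `'root'` itself is only a proposal in this ticket.

```python
# Hypothetical placeholder proposed in the ticket; not an existing API value.
ROOT = "root"


def matches_breadcrumbs(breadcrumbs, pattern):
    """Match `pattern` against the tail of `breadcrumbs` (a partial match).

    "*" matches any single element. A leading ROOT entry in `pattern`
    additionally requires that nothing precedes the matched elements.
    """
    anchored = bool(pattern) and pattern[0] == ROOT
    names = pattern[1:] if anchored else pattern

    if len(names) > len(breadcrumbs):
        return False
    if anchored and len(names) != len(breadcrumbs):
        # Anchored patterns must account for the whole breadcrumb path.
        return False

    # Compare the pattern with the last len(names) breadcrumbs.
    tail = breadcrumbs[len(breadcrumbs) - len(names):]
    return all(p == "*" or p.upper() == b.upper() for p, b in zip(names, tail))


if __name__ == "__main__":
    # Breadcrumbs that start inside the context node, as the ticket suggests.
    print(matches_breadcrumbs(["P"], ["P"]))
    print(matches_breadcrumbs(["DIV", "P"], ["P"]))
    print(matches_breadcrumbs(["DIV", "P"], [ROOT, "P"]))
```

 Without the anchor, `P` matches at any depth, which is why partial matching
 alone cannot confirm that an element sits at the top level once `HTML` and
 `BODY` are no longer part of the path.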

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/61560>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform