[wp-trac] [WordPress Trac] #62270: Unable to set bookmark on </body> in WP_HTML_Processor

WordPress Trac noreply at wordpress.org
Wed Oct 23 19:25:36 UTC 2024


#62270: Unable to set bookmark on </body> in WP_HTML_Processor
----------------------------+------------------------------
 Reporter:  westonruter     |       Owner:  (none)
     Type:  defect (bug)    |      Status:  new
 Priority:  normal          |   Milestone:  Awaiting Review
Component:  HTML API        |     Version:  6.4
 Severity:  normal          |  Resolution:
 Keywords:  has-unit-tests  |     Focuses:
----------------------------+------------------------------

Comment (by jonsurrell):

 > If there is a comment preceding the SCRIPT tag then what? The comment
 would be left after </body> but then the SCRIPT tag and any other
 remaining content would be moved up to the end of the </body>?

 That's exactly right, except that if we're assuming the original HTML
 ended with `</body></html>` then the comment would be outside the HTML
 element.

 -----

 To elaborate with a few examples:

 Let's assume the original HTML ends as expected: `</body></html>`. This
 would put the parser into the wonderfully named
 [https://html.spec.whatwg.org/multipage/parsing.html#the-after-after-body-insertion-mode "after after body" insertion mode].

 - A comment token is inserted as the last child of the `Document` object
 (not of the `HTML` element).
 - Whitespace text is inserted as a child of `BODY` (but does NOT change
 the insertion mode).
 - Anything else (that isn't ignored) switches the insertion mode to "in
 body" and is reprocessed.
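
 The three rules above can be sketched as a tiny routing function. This is
 a simplified model in Python, not the HTML API's implementation: the
 `(kind, value)` token shape and the name `place_tokens` are hypothetical,
 and the sketch ignores end tags, DOCTYPE tokens, the special-cased `html`
 start tag, and element nesting.

```python
from typing import List, Tuple

# Hypothetical token shape for this sketch: (kind, value), where kind is
# "comment", "text", or "start_tag".
Token = Tuple[str, str]

HTML_WHITESPACE = " \t\n\f\r"


def place_tokens(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (location, kind, value) for tokens seen after </body></html>.

    location is "Document" (outside HTML) or "BODY" (somewhere inside the
    BODY subtree; nesting is not tracked in this simplified model).
    """
    mode = "after after body"
    placed = []
    for kind, value in tokens:
        if mode == "after after body":
            if kind == "comment":
                # Comments attach to the Document, outside HTML.
                placed.append(("Document", kind, value))
                continue
            if kind == "text" and value.strip(HTML_WHITESPACE) == "":
                # Whitespace lands in BODY without changing the mode.
                placed.append(("BODY", kind, value))
                continue
            # Anything else switches to "in body" and is reprocessed there.
            mode = "in body"
        placed.append(("BODY", kind, value))
    return placed
```

 Once the mode has switched to "in body" it stays there, so every later
 token, comments included, is reported under BODY.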

 We can look at a few cases.

 If the HTML (with appended HTML) looks like this:

 {{{
 </body></html>
         <!-- A comment -->
 Text
 }}}

 [https://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%2Fbody%3E%3C%2Fhtml%3E%0A%09%3C!--%20A%20comment%20--%3E%0AText
 We get this:]

 {{{
 #document
 ├── HTML
 │   ├── HEAD
 │   └── BODY
 │       ├── (… whatever was originally here …)
 │       └── #text: \n\t\nText\n
 └── #comment: A comment
 }}}

 The BODY element ends with `\n\t\nText\n`. The `Document` (outside of
 `HTML`) ends with the comment. Most cases that go wrong look something
 like this: a comment does not become a child of `BODY` ''if'' nothing
 before it has triggered the switch back to the "in body" insertion mode.

 However, if the HTML looks something like this:

 {{{
 </body></html>
 <div>We want to append this to <code>BODY</code></div>
         <!-- A comment -->
 Text
 }}}

 [https://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%2Fbody%3E%3C%2Fhtml%3E%0A%3Cdiv%3EWe%20want%20to%20append%20this%20to%20%3Ccode%3EBODY%3C%2Fcode%3E%3C%2Fdiv%3E%0A%09%3C!--%20A%20comment%20--%3E%0AText
 This is the result:]

 {{{
 HTML
 ├── HEAD
 └── BODY
     ├── (… whatever was originally here …)
     ├── #text: \n
     ├── DIV
     │   ├─ #text: We want to append this to
     │   └─ CODE
     │      └─ #text: BODY
     ├── #text: \n\t
     ├── #comment: A comment
     └── #text: Text\n
 }}}

 Then the DOM is exactly how we wanted it. Everything is under BODY and in
 the same order. This is because the DIV switched the insertion mode to
 "in body" before the comment could appear out of place. As long as there's
 any non-whitespace text or another element before any comments, this
 should always hold. And I believe comments are the only thing that can
 experience this problem.
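
 A rough way to check that condition for a given chunk of appended HTML is
 sketched below. It uses Python's standard `html.parser`, which is not a
 spec-compliant tokenizer, so treat it as an approximation. The names
 `_EscapeCheck` and `comment_escapes_body` are hypothetical, and the input
 is assumed to be only the content that follows `</body></html>`.

```python
from html.parser import HTMLParser

HTML_WHITESPACE = " \t\n\f\r"


class _EscapeCheck(HTMLParser):
    """Flags a comment seen before anything that switches to "in body"."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.escaped = False

    def handle_starttag(self, tag, attrs):
        # An <html> start tag is handled without leaving "after after body",
        # so only other start tags count as a switch.
        if tag != "html":
            self.in_body = True

    def handle_data(self, data):
        # Only non-whitespace text switches the insertion mode.
        if data.strip(HTML_WHITESPACE):
            self.in_body = True

    def handle_comment(self, data):
        if not self.in_body:
            self.escaped = True


def comment_escapes_body(appended_html: str) -> bool:
    """True if some comment would land outside BODY (approximation)."""
    checker = _EscapeCheck()
    checker.feed(appended_html)
    checker.close()  # flush any buffered trailing text
    return checker.escaped
```

 For the two examples above, this reports that the comment escapes BODY in
 the first case and stays inside it in the second.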

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/62270#comment:8>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

