<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[58713] trunk: HTML API: Simplify breadcrumb accounting.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/58713">58713</a></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-07-12 22:18:16 +0000 (Fri, 12 Jul 2024)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Simplify breadcrumb accounting.

Since the HTML Processor started visiting all nodes in a document, both
real and virtual, the breadcrumb accounting has become complicated, and
it is not clear that it is fully reliable.

This patch builds the breadcrumbs separately from the stack of open
elements, which eliminates the problems caused by stateful stack
interactions and the post-hoc event queue.

As a result, the breadcrumbs are greatly simplified and easier to
verify as correct.

Developed in https://github.com/WordPress/wordpress-develop/pull/6981
Discussed in https://core.trac.wordpress.org/ticket/61576

Follow-up to <a href="https://core.trac.wordpress.org/changeset/58590">[58590]</a>.

Props bernhard-reiter, dmsnell.
See <a href="https://core.trac.wordpress.org/ticket/61576">#61576</a>.</pre>
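<p>The approach described above can be illustrated with a minimal sketch: the breadcrumbs are kept as a plain array of node names that grows on each push event and shrinks on each pop event, independent of the parser's stack of open elements. The class name <code>Sketch_Breadcrumb_Tracker</code>, its methods, and the <code>'push'</code>/<code>'pop'</code> strings are hypothetical stand-ins for illustration; they are not part of the HTML API.</p>
<pre>
```php
// Illustrative sketch only: a hypothetical tracker, not WordPress code.
final class Sketch_Breadcrumb_Tracker {
	/**
	 * Node names from the root down to the current node.
	 *
	 * @var string[]
	 */
	private $breadcrumbs = array();

	/**
	 * Applies one stack event to the breadcrumbs.
	 *
	 * @param string $operation Either 'push' or 'pop' (assumed event names).
	 * @param string $node_name Upper-case name of the node in the event.
	 */
	public function apply( string $operation, string $node_name ): void {
		if ( 'pop' === $operation ) {
			// A closed node is removed from the end of the path.
			array_pop( $this->breadcrumbs );
		} else {
			// An opened node is appended to the end of the path.
			$this->breadcrumbs[] = $node_name;
		}
	}

	/**
	 * Returns the current path without consulting any parser state.
	 *
	 * @return string[]
	 */
	public function get_breadcrumbs(): array {
		return $this->breadcrumbs;
	}
}

// Replay a short, made-up sequence of events.
$tracker = new Sketch_Breadcrumb_Tracker();
$events  = array(
	array( 'push', 'HTML' ),
	array( 'push', 'BODY' ),
	array( 'push', 'DIV' ),
	array( 'push', 'SPAN' ),
	array( 'pop', 'SPAN' ),
);

foreach ( $events as list( $operation, $node_name ) ) {
	$tracker->apply( $operation, $node_name );
}

$breadcrumbs = $tracker->get_breadcrumbs();
```
</pre>
<p>Because the path is updated only as events are consumed, reading it back needs no reconciliation against queued or virtual nodes.</p>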

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmlprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlProcessorSemanticRulesphp">trunk/tests/phpunit/tests/html-api/wpHtmlProcessorSemanticRules.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmlprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-processor.php        2024-07-12 21:58:20 UTC (rev 58712)
+++ trunk/src/wp-includes/html-api/class-wp-html-processor.php  2024-07-12 22:18:16 UTC (rev 58713)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -212,6 +212,15 @@
</span><span class="cx" style="display: block; padding: 0 10px">        private $element_queue = array();
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * Stores the current breadcrumbs.
+        *
+        * @since 6.7.0
+        *
+        * @var string[]
+        */
+       private $breadcrumbs = array();
+
+       /**
</ins><span class="cx" style="display: block; padding: 0 10px">          * Current stack event, if set, representing a matched token.
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><span class="cx" style="display: block; padding: 0 10px">         * Because the parser may internally point to a place further along in a document
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -310,8 +319,8 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        false
</span><span class="cx" style="display: block; padding: 0 10px">                );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $processor->state->stack_of_open_elements->push( $context_node );
</del><span class="cx" style="display: block; padding: 0 10px">                 $processor->context_node = $context_node;
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                $processor->breadcrumbs  = array( 'HTML', $context_node->node_name );
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                return $processor;
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -523,44 +532,46 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        return false;
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                if ( 'done' !== $this->has_seen_context_node && 0 === count( $this->element_queue ) && ! $this->step() ) {
-                       while ( 'context-node' !== $this->state->stack_of_open_elements->current_node()->bookmark_name && $this->state->stack_of_open_elements->pop() ) {
-                               continue;
-                       }
-                       $this->has_seen_context_node = 'done';
-                       return $this->next_token();
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         /*
+                * Prime the events if there are none.
+                *
+                * @todo In some cases, probably related to the adoption agency
+                *       algorithm, this call to step() doesn't create any new
+                *       events. Calling it again creates them. Figure out why
+                *       this is and if it's inherent or if it's a bug. Looping
+                *       until there are events or until there are no more
+                *       tokens works in the meantime and isn't obviously wrong.
+                */
+               while ( empty( $this->element_queue ) && $this->step() ) {
+                       continue;
</ins><span class="cx" style="display: block; padding: 0 10px">                 }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                // Process the next event on the queue.
</ins><span class="cx" style="display: block; padding: 0 10px">                 $this->current_element = array_shift( $this->element_queue );
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                while ( isset( $this->context_node ) && ! $this->has_seen_context_node ) {
-                       if ( isset( $this->current_element ) ) {
-                               if ( $this->context_node === $this->current_element->token && WP_HTML_Stack_Event::PUSH === $this->current_element->operation ) {
-                                       $this->has_seen_context_node = true;
-                                       return $this->next_token();
-                               }
-                       }
-                       $this->current_element = array_shift( $this->element_queue );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         if ( ! isset( $this->current_element ) ) {
+                       return false;
</ins><span class="cx" style="display: block; padding: 0 10px">                 }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                if ( ! isset( $this->current_element ) ) {
-                       if ( 'done' === $this->has_seen_context_node ) {
-                               return false;
-                       } else {
-                               return $this->next_token();
-                       }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $is_pop = WP_HTML_Stack_Event::POP === $this->current_element->operation;
+
+               /*
+                * The root node only exists in the fragment parser, and closing it
+                * indicates that the parse is complete. Stop before popping it from
+                * the breadcrumbs.
+                */
+               if ( 'root-node' === $this->current_element->token->bookmark_name ) {
+                       return ! $is_pop && $this->next_token();
</ins><span class="cx" style="display: block; padding: 0 10px">                 }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                if ( isset( $this->context_node ) && WP_HTML_Stack_Event::POP === $this->current_element->operation && $this->context_node === $this->current_element->token ) {
-                       $this->element_queue   = array();
-                       $this->current_element = null;
-                       return false;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         // Adjust the breadcrumbs for this event.
+               if ( $is_pop ) {
+                       array_pop( $this->breadcrumbs );
+               } else {
+                       $this->breadcrumbs[] = $this->current_element->token->node_name;
</ins><span class="cx" style="display: block; padding: 0 10px">                 }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                // Avoid sending close events for elements which don't expect a closing.
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                if (
-                       WP_HTML_Stack_Event::POP === $this->current_element->operation &&
-                       ! static::expects_closer( $this->current_element->token )
-               ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         if ( $is_pop && ! static::expects_closer( $this->current_element->token ) ) {
</ins><span class="cx" style="display: block; padding: 0 10px">                         return $this->next_token();
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -643,10 +654,11 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        return false;
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         for ( $i = count( $this->breadcrumbs ) - 1; $i >= 0; $i-- ) {
+                       $node  = $this->breadcrumbs[ $i ];
</ins><span class="cx" style="display: block; padding: 0 10px">                         $crumb = strtoupper( current( $breadcrumbs ) );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        if ( '*' !== $crumb && $node->node_name !== $crumb ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 if ( '*' !== $crumb && $node !== $crumb ) {
</ins><span class="cx" style="display: block; padding: 0 10px">                                 return false;
</span><span class="cx" style="display: block; padding: 0 10px">                        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -862,46 +874,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @return string[]|null Array of tag names representing path to matched node, if matched, otherwise NULL.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        public function get_breadcrumbs() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $breadcrumbs = array();
-
-               foreach ( $this->state->stack_of_open_elements->walk_down() as $stack_item ) {
-                       $breadcrumbs[] = $stack_item->node_name;
-               }
-
-               if ( ! $this->is_virtual() ) {
-                       return $breadcrumbs;
-               }
-
-               foreach ( $this->element_queue as $queue_item ) {
-                       if ( $this->current_element->token->bookmark_name === $queue_item->token->bookmark_name ) {
-                               break;
-                       }
-
-                       if ( 'context-node' === $queue_item->token->bookmark_name ) {
-                               break;
-                       }
-
-                       if ( 'real' === $queue_item->provenance ) {
-                               break;
-                       }
-
-                       if ( WP_HTML_Stack_Event::PUSH === $queue_item->operation ) {
-                               $breadcrumbs[] = $queue_item->token->node_name;
-                       } else {
-                               array_pop( $breadcrumbs );
-                       }
-               }
-
-               if ( null !== parent::get_token_name() && ! parent::is_tag_closer() ) {
-                       array_pop( $breadcrumbs );
-               }
-
-               // Add the virtual node we're at.
-               if ( WP_HTML_Stack_Event::PUSH === $this->current_element->operation ) {
-                       $breadcrumbs[] = $this->current_element->token->node_name;
-               }
-
-               return $breadcrumbs;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         return $this->breadcrumbs;
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -930,9 +903,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @return int Nesting-depth of current location in the document.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        public function get_current_depth() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                return $this->is_virtual()
-                       ? count( $this->get_breadcrumbs() )
-                       : $this->state->stack_of_open_elements->count();
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         return count( $this->breadcrumbs );
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2552,7 +2523,6 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        ? $this->bookmarks[ $this->state->current_token->bookmark_name ]->start
</span><span class="cx" style="display: block; padding: 0 10px">                        : 0;
</span><span class="cx" style="display: block; padding: 0 10px">                $bookmark_starts_at   = $this->bookmarks[ $actual_bookmark_name ]->start;
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $bookmark_length      = $this->bookmarks[ $actual_bookmark_name ]->length;
</del><span class="cx" style="display: block; padding: 0 10px">                 $direction            = $bookmark_starts_at > $processor_started_at ? 'forward' : 'backward';
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                /*
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2610,6 +2580,12 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        $this->state->frameset_ok    = true;
</span><span class="cx" style="display: block; padding: 0 10px">                        $this->element_queue         = array();
</span><span class="cx" style="display: block; padding: 0 10px">                        $this->current_element       = null;
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+                       if ( isset( $this->context_node ) ) {
+                               $this->breadcrumbs = array_slice( $this->breadcrumbs, 0, 2 );
+                       } else {
+                               $this->breadcrumbs = array();
+                       }
</ins><span class="cx" style="display: block; padding: 0 10px">                 }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                // When moving forwards, reparse the document until reaching the same location as the original bookmark.
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlProcessorSemanticRulesphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlProcessorSemanticRules.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlProcessorSemanticRules.php       2024-07-12 21:58:20 UTC (rev 58712)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlProcessorSemanticRules.php 2024-07-12 22:18:16 UTC (rev 58713)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -387,7 +387,16 @@
</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertSame( 'CODE', $processor->get_tag(), "Expected to start test on CODE element but found {$processor->get_tag()} instead." );
</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertSame( array( 'HTML', 'BODY', 'DIV', 'SPAN', 'CODE' ), $processor->get_breadcrumbs(), 'Failed to produce expected DOM nesting.' );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->assertTrue( $processor->next_token(), 'Failed to advance past CODE tag to expected SPAN closer.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $this->assertTrue(
+                       $processor->next_tag(
+                               array(
+                                       'tag_name'    => 'SPAN',
+                                       'tag_closers' => 'visit',
+                               )
+                       ),
+                       'Failed to advance past CODE tag to expected SPAN closer.'
+               );
+               $this->assertSame( 'SPAN', $processor->get_tag() );
</ins><span class="cx" style="display: block; padding: 0 10px">                 $this->assertTrue( $processor->is_tag_closer(), 'Expected to find closing SPAN, but found opener instead.' );
</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertSame( array( 'HTML', 'BODY', 'DIV' ), $processor->get_breadcrumbs(), 'Failed to advance past CODE tag to expected DIV opener.' );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span></span></pre>
</div>
</div>

</body>
</html>