<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[57805] trunk: HTML API: Defer applying attribute updates until necessary.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/57805">57805</a></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-03-11 23:53:07 +0000 (Mon, 11 Mar 2024)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Defer applying attribute updates until necessary.

When making repeated updates to a document, the Tag Processor will end
up copying the entire document once for every update. This can lead to
catastrophic behavior in the worst case.

However, when batch-applying updates it can copy chunks of the
document in a single pass, so the entire document is copied only once
for the entire batch.

Previously the Tag Processor applied updates eagerly; with this patch
it defers applying them for as long as possible.

Developed in https://github.com/WordPress/wordpress-develop/pull/6120
Discussed in https://core.trac.wordpress.org/ticket/60697

Props: dmsnell, bernhard-reiter, jonsurrell, westonruter.
Fixes <a href="https://core.trac.wordpress.org/ticket/60697">#60697</a>.
Follow-up to <a href="https://core.trac.wordpress.org/changeset/55706">[55706]</a>, <a href="https://core.trac.wordpress.org/changeset/56941">[56941]</a>, <a href="https://core.trac.wordpress.org/changeset/57348">[57348]</a>.</pre>
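<p>As a rough illustration of the batching idea described in the message above (not the Tag Processor's actual code), the sketch below applies a batch of replacements in one left-to-right pass, so the document is copied once rather than once per update. The function name apply_updates_in_one_pass() and the [ start, length, text ] update shape are assumptions made for this example only.</p>
<pre>&lt;?php
/**
 * Hypothetical helper, not part of WordPress: applies a batch of
 * non-overlapping replacements in a single pass over the document.
 *
 * Each update is array( start, length, text ), with byte offsets
 * measured against the original, unmodified document.
 */
function apply_updates_in_one_pass( string $html, array $updates ): string {
	// Sort by start offset so the document can be copied left to right.
	usort(
		$updates,
		static function ( $a, $b ) {
			return $a[0] - $b[0];
		}
	);

	$output = '';
	$copied = 0;
	foreach ( $updates as [ $start, $length, $text ] ) {
		// Copy the untouched chunk before this update, then the new text.
		$output .= substr( $html, $copied, $start - $copied );
		$output .= $text;
		$copied  = $start + $length;
	}

	// Copy whatever remains after the last update.
	return $output . substr( $html, $copied );
}

$html    = '&lt;div&gt;First&lt;/div&gt;&lt;div&gt;Second&lt;/div&gt;';
$updates = array(
	array( 20, 0, ' id="two"' ), // Insert before the closing bracket of the second DIV.
	array( 4, 0, ' id="one"' ),  // Insert before the closing bracket of the first DIV.
);

// Prints: &lt;div id="one"&gt;First&lt;/div&gt;&lt;div id="two"&gt;Second&lt;/div&gt;
echo apply_updates_in_one_pass( $html, $updates );
</pre>
<p>Applying each update eagerly would instead rebuild the whole string once per update, which is the cost the deferred approach is meant to avoid.</p>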

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlTagProcessorbookmarkphp">trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor-bookmark.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp">trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php    2024-03-11 23:14:26 UTC (rev 57804)
+++ trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php      2024-03-11 23:53:07 UTC (rev 57805)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -837,8 +837,27 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @return bool Whether a token was parsed.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        public function next_token() {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                return $this->base_class_next_token();
+       }
+
+       /**
+        * Internal method which finds the next token in the HTML document.
+        *
+        * This method is a private internal function which implements the logic for
+        * finding the next token in a document. It exists so that the parser can update
+        * its state without affecting the location of the cursor in the document and
+        * without triggering subclass methods for things like `next_token()`, e.g. when
+        * applying patches before searching for the next token.
+        *
+        * @since 6.5.0
+        *
+        * @access private
+        *
+        * @return bool Whether a token was parsed.
+        */
+       private function base_class_next_token() {
</ins><span class="cx" style="display: block; padding: 0 10px">                 $was_at = $this->bytes_already_parsed;
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->get_updated_html();
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $this->after_tag();
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                // Don't proceed if there's nothing more to scan.
</span><span class="cx" style="display: block; padding: 0 10px">                if (
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2041,6 +2060,45 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @since 6.2.0
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        private function after_tag() {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                /*
+                * There could be lexical updates enqueued for an attribute that
+                * also exists on the next tag. In order to avoid conflating the
+                * attributes across the two tags, lexical updates with names
+                * need to be flushed to raw lexical updates.
+                */
+               $this->class_name_updates_to_attributes_updates();
+
+               /*
+                * Purge updates if there are too many. The actual count isn't
+                * scientific, but a few values from 100 to a few thousand were
+                * tested to find a practically useful limit.
+                *
+                * If the update queue grows too big, then the Tag Processor
+                * will spend more time iterating through them and lose the
+                * efficiency gains of deferring applying them.
+                */
+               if ( 1000 < count( $this->lexical_updates ) ) {
+                       $this->get_updated_html();
+               }
+
+               foreach ( $this->lexical_updates as $name => $update ) {
+                       /*
+                        * Any updates appearing after the cursor should be applied
+                        * before proceeding, otherwise they may be overlooked.
+                        */
+                       if ( $update->start >= $this->bytes_already_parsed ) {
+                               $this->get_updated_html();
+                               break;
+                       }
+
+                       if ( is_int( $name ) ) {
+                               continue;
+                       }
+
+                       $this->lexical_updates[] = $update;
+                       unset( $this->lexical_updates[ $name ] );
+               }
+
</ins><span class="cx" style="display: block; padding: 0 10px">                 $this->token_starts_at      = null;
</span><span class="cx" style="display: block; padding: 0 10px">                $this->token_length         = null;
</span><span class="cx" style="display: block; padding: 0 10px">                $this->tag_name_starts_at   = null;
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2230,7 +2288,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        $shift = strlen( $diff->text ) - $diff->length;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                        // Adjust the cursor position by however much an update affects it.
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        if ( $diff->start <= $this->bytes_already_parsed ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 if ( $diff->start < $this->bytes_already_parsed ) {
</ins><span class="cx" style="display: block; padding: 0 10px">                                 $this->bytes_already_parsed += $shift;
</span><span class="cx" style="display: block; padding: 0 10px">                        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -3164,16 +3222,8 @@
</span><span class="cx" style="display: block; padding: 0 10px">                 *                 └←─┘ back up by strlen("em") + 1 ==> 3
</span><span class="cx" style="display: block; padding: 0 10px">                 */
</span><span class="cx" style="display: block; padding: 0 10px">                $this->bytes_already_parsed = $before_current_tag;
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->parse_next_tag();
-               // Reparse the attributes.
-               while ( $this->parse_next_attribute() ) {
-                       continue;
-               }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $this->base_class_next_token();
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $tag_ends_at                = strpos( $this->html, '>', $this->bytes_already_parsed );
-               $this->token_length         = $tag_ends_at - $this->token_starts_at;
-               $this->bytes_already_parsed = $tag_ends_at;
-
</del><span class="cx" style="display: block; padding: 0 10px">                 return $this->html;
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlTagProcessorbookmarkphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor-bookmark.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor-bookmark.php        2024-03-11 23:14:26 UTC (rev 57804)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor-bookmark.php  2024-03-11 23:53:07 UTC (rev 57805)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -293,6 +293,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><span class="cx" style="display: block; padding: 0 10px">         * @ticket 56299
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * @ticket 60697
</ins><span class="cx" style="display: block; padding: 0 10px">          *
</span><span class="cx" style="display: block; padding: 0 10px">         * @covers WP_HTML_Tag_Processor::seek
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -299,8 +300,10 @@
</span><span class="cx" style="display: block; padding: 0 10px">        public function test_updates_bookmark_for_additions_after_both_sides() {
</span><span class="cx" style="display: block; padding: 0 10px">                $processor = new WP_HTML_Tag_Processor( '<div>First</div><div>Second</div>' );
</span><span class="cx" style="display: block; padding: 0 10px">                $processor->next_tag();
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                $processor->set_attribute( 'id', 'one' );
</ins><span class="cx" style="display: block; padding: 0 10px">                 $processor->set_bookmark( 'first' );
</span><span class="cx" style="display: block; padding: 0 10px">                $processor->next_tag();
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                $processor->set_attribute( 'id', 'two' );
</ins><span class="cx" style="display: block; padding: 0 10px">                 $processor->add_class( 'second' );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                $processor->seek( 'first' );
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -307,7 +310,13 @@
</span><span class="cx" style="display: block; padding: 0 10px">                $processor->add_class( 'first' );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertSame(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        '<div class="first">First</div><div class="second">Second</div>',
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 'one',
+                       $processor->get_attribute( 'id' ),
+                       'Should have remembered attribute change from before the seek.'
+               );
+
+               $this->assertSame(
+                       '<div class="first" id="one">First</div><div class="second" id="two">Second</div>',
</ins><span class="cx" style="display: block; padding: 0 10px">                         $processor->get_updated_html(),
</span><span class="cx" style="display: block; padding: 0 10px">                        'The bookmark was updated incorrectly in response to HTML markup updates'
</span><span class="cx" style="display: block; padding: 0 10px">                );
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php 2024-03-11 23:14:26 UTC (rev 57804)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php   2024-03-11 23:53:07 UTC (rev 57805)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2727,4 +2727,49 @@
</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertSame( '#text', $processor->get_token_type(), 'Did not find text node.' );
</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertSame( 'test< /A>', $processor->get_modifiable_text(), 'Did not find complete text node.' );
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+       /**
+        * Ensures that updates which are enqueued in front of the cursor
+        * are applied before moving forward in the document.
+        *
+        * @ticket 60697
+        */
+       public function test_applies_updates_before_proceeding() {
+               $html = '<div><img></div><div><img></div>';
+
+               $subclass = new class( $html ) extends WP_HTML_Tag_Processor {
+                       /**
+                        * Inserts raw text after the current token.
+                        *
+                        * @param string $new_html Raw text to insert.
+                        */
+                       public function insert_after( $new_html ) {
+                               $this->set_bookmark( 'here' );
+                               $this->lexical_updates[] = new WP_HTML_Text_Replacement(
+                                       $this->bookmarks['here']->start + $this->bookmarks['here']->length + 1,
+                                       0,
+                                       $new_html
+                               );
+                       }
+               };
+
+               $subclass->next_tag( 'img' );
+               $subclass->insert_after( '<p>snow-capped</p>' );
+
+               $subclass->next_tag();
+               $this->assertSame(
+                       'P',
+                       $subclass->get_tag(),
+                       'Should have matched inserted HTML as next tag.'
+               );
+
+               $subclass->next_tag( 'img' );
+               $subclass->set_attribute( 'alt', 'mountain' );
+
+               $this->assertSame(
+                       '<div><img><p>snow-capped</p></div><div><img alt="mountain"></div>',
+                       $subclass->get_updated_html(),
+                       'Should have properly applied the update from in front of the cursor.'
+               );
+       }
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span></span></pre>
</div>
</div>

</body>
</html>