<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[58233] trunk: HTML API: Fix token length bug in Tag Processor.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/58233">58233</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/58233","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-05-29 11:40:16 +0000 (Wed, 29 May 2024)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Fix token length bug in Tag Processor.

The Tag Processor stores byte offsets into its HTML document marking
where the current token starts and ends, and does the same for every
bookmark. For tags, the end offset has in some cases been off by one.

In this patch the offset is fixed so that a bookmark always properly
refers to the full span of the token it's bookmarking. Also the current
token byte offsets are properly recorded.
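
As a rough illustration of the corrected arithmetic (a standalone sketch,
not code from this patch: `tag_span()` is a hypothetical helper and the
sample input is made up), a tag's length is measured up to the first byte
after its closing ">", so that start + length points just past the token:

```php
<?php
/**
 * Hypothetical helper, not part of WP_HTML_Tag_Processor. It records a
 * tag's span so that the length includes the closing ">".
 */
function tag_span( string $html, int $token_starts_at ): array {
	// Offset of the ">" that closes the tag.
	$tag_ends_at = strpos( $html, '>', $token_starts_at );
	if ( false === $tag_ends_at ) {
		throw new InvalidArgumentException( 'No closing ">" found.' );
	}

	// First byte after the tag.
	$bytes_already_parsed = $tag_ends_at + 1;

	return array(
		'start'  => $token_starts_at,
		'length' => $bytes_already_parsed - $token_starts_at,
	);
}

$html = '<div>text';
$span = tag_span( $html, 0 );

// The span covers the whole token, so callers need no "+ 1" correction.
$raw_token   = substr( $html, $span['start'], $span['length'] ); // '<div>'
$after_token = $span['start'] + $span['length'];                 // 5
```

Computed the old way, as the offset of ">" minus the start, the length of
the same tag would have been 4, which captures only "<div". That is why
callers previously added 1 by hand.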

While this is a defect in the Tag Processor, it hasn't been exposed
through the public interface and has not affected how the processor
itself works. Only subclasses which rely on the length of a bookmark
could have been affected, and such subclasses are not a supported
environment in the ongoing work.

This fix is important for future work and for ensuring that subclasses
performing custom behaviors remain as reliable as the public interface.

Developed in https://github.com/WordPress/wordpress-develop/pull/6625
Discussed in https://core.trac.wordpress.org/ticket/61301

Props dmsnell, gziolo, jonsurrell, westonruter.
Fixes <a href="https://core.trac.wordpress.org/ticket/61301">#61301</a>.</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</a></li>
<li><a href="#trunksrcwpincludesinteractivityapiclasswpinteractivityapidirectivesprocessorphp">trunk/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp">trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php    2024-05-29 10:48:48 UTC (rev 58232)
+++ trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php      2024-05-29 11:40:16 UTC (rev 58233)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -926,8 +926,8 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        return false;
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px">                $this->parser_state         = self::STATE_MATCHED_TAG;
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->token_length         = $tag_ends_at - $this->token_starts_at;
</del><span class="cx" style="display: block; padding: 0 10px">                 $this->bytes_already_parsed = $tag_ends_at + 1;
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                $this->token_length         = $this->bytes_already_parsed - $this->token_starts_at;
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                /*
</span><span class="cx" style="display: block; padding: 0 10px">                 * For non-DATA sections which might contain text that looks like HTML tags but
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1013,7 +1013,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">                 */
</span><span class="cx" style="display: block; padding: 0 10px">                $this->token_starts_at      = $was_at;
</span><span class="cx" style="display: block; padding: 0 10px">                $this->token_length         = $this->bytes_already_parsed - $this->token_starts_at;
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->text_starts_at       = $tag_ends_at + 1;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                $this->text_starts_at       = $tag_ends_at;
</ins><span class="cx" style="display: block; padding: 0 10px">                 $this->text_length          = $this->tag_name_starts_at - $this->text_starts_at;
</span><span class="cx" style="display: block; padding: 0 10px">                $this->tag_name_starts_at   = $tag_name_starts_at;
</span><span class="cx" style="display: block; padding: 0 10px">                $this->tag_name_length      = $tag_name_length;
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2687,7 +2687,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">                 *     <figure />
</span><span class="cx" style="display: block; padding: 0 10px">                 *             ^ this appears one character before the end of the closing ">".
</span><span class="cx" style="display: block; padding: 0 10px">                 */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                return '/' === $this->html[ $this->token_starts_at + $this->token_length - 1 ];
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                return '/' === $this->html[ $this->token_starts_at + $this->token_length - 2 ];
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span></span></pre></div>
<a id="trunksrcwpincludesinteractivityapiclasswpinteractivityapidirectivesprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php       2024-05-29 10:48:48 UTC (rev 58232)
+++ trunk/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php 2024-05-29 11:40:16 UTC (rev 58233)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -107,7 +107,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                $bookmark = 'append_content_after_template_tag_closer';
</span><span class="cx" style="display: block; padding: 0 10px">                $this->set_bookmark( $bookmark );
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $after_closing_tag = $this->bookmarks[ $bookmark ]->start + $this->bookmarks[ $bookmark ]->length + 1;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                $after_closing_tag = $this->bookmarks[ $bookmark ]->start + $this->bookmarks[ $bookmark ]->length;
</ins><span class="cx" style="display: block; padding: 0 10px">                 $this->release_bookmark( $bookmark );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                // Appends the new content.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -140,7 +140,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px">                list( $opener_tag, $closer_tag ) = $bookmarks;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $after_opener_tag  = $this->bookmarks[ $opener_tag ]->start + $this->bookmarks[ $opener_tag ]->length + 1;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                $after_opener_tag  = $this->bookmarks[ $opener_tag ]->start + $this->bookmarks[ $opener_tag ]->length;
</ins><span class="cx" style="display: block; padding: 0 10px">                 $before_closer_tag = $this->bookmarks[ $closer_tag ]->start;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                if ( $rewind ) {
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php 2024-05-29 10:48:48 UTC (rev 58232)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php   2024-05-29 11:40:16 UTC (rev 58233)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -477,6 +477,109 @@
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * Ensures that bookmarks start and length correctly describe a given token in HTML.
+        *
+        * @ticket 61301
+        *
+        * @dataProvider data_html_nth_token_substring
+        *
+        * @param string $html            Input HTML.
+        * @param int    $match_nth_token Which token to inspect from input HTML.
+        * @param string $expected_match  Expected full raw token bookmark should capture.
+        */
+       public function test_token_bookmark_span( string $html, int $match_nth_token, string $expected_match ) {
+               $processor = new class( $html ) extends WP_HTML_Tag_Processor {
+                       /**
+                        * Returns the raw span of HTML for the currently-matched
+                        * token, or null if not paused on any token.
+                        *
+                        * @return string|null Raw HTML content of currently-matched token,
+                        *                     otherwise `null` if not matched.
+                        */
+                       public function get_raw_token() {
+                               if (
+                                       WP_HTML_Tag_Processor::STATE_READY === $this->parser_state ||
+                                       WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT === $this->parser_state ||
+                                       WP_HTML_Tag_Processor::STATE_COMPLETE === $this->parser_state
+                               ) {
+                                       return null;
+                               }
+
+                               $this->set_bookmark( 'mark' );
+                               $mark = $this->bookmarks['mark'];
+
+                               return substr( $this->html, $mark->start, $mark->length );
+                       }
+               };
+
+               for ( $i = 0; $i < $match_nth_token; $i++ ) {
+                       $processor->next_token();
+               }
+
+               $raw_token = $processor->get_raw_token();
+               $this->assertIsString(
+                       $raw_token,
+                       "Failed to find raw token at position {$match_nth_token}: check test data provider."
+               );
+
+               $this->assertSame(
+                       $expected_match,
+                       $raw_token,
+                       'Bookmarked wrong span of text for full matched token.'
+               );
+       }
+
+       /**
+        * Data provider.
+        *
+        * @return array
+        */
+       public static function data_html_nth_token_substring() {
+               return array(
+                       // Tags.
+                       'DIV start tag'                 => array( '<div>', 1, '<div>' ),
+                       'DIV start tag with attributes' => array( '<div class="x" disabled>', 1, '<div class="x" disabled>' ),
+                       'DIV end tag'                   => array( '</div>', 1, '</div>' ),
+                       'DIV end tag with attributes'   => array( '</div class="x" disabled>', 1, '</div class="x" disabled>' ),
+                       'Nested DIV'                    => array( '<div><div b>', 2, '<div b>' ),
+                       'Sibling DIV'                   => array( '<div></div><div b>', 3, '<div b>' ),
+                       'DIV after text'                => array( 'text <div>', 2, '<div>' ),
+                       'DIV before text'               => array( '<div> text', 1, '<div>' ),
+                       'DIV after comment'             => array( '<!-- comment --><div>', 2, '<div>' ),
+                       'DIV before comment'            => array( '<div><!-- c --> ', 1, '<div>' ),
+                       'Start "self-closing" tag'      => array( '<div />', 1, '<div />' ),
+                       'Void tag'                      => array( '<img src="img.png">', 1, '<img src="img.png">' ),
+                       'Void tag w/self-closing flag'  => array( '<img src="img.png" />', 1, '<img src="img.png" />' ),
+                       'Void tag inside DIV'           => array( '<div><img src="img.png"></div>', 2, '<img src="img.png">' ),
+
+                       // Special atomic tags.
+                       'SCRIPT tag'                    => array( '<script>inside text</script>', 1, '<script>inside text</script>' ),
+                       'SCRIPT double-escape'          => array( '<script><!-- <script> echo "</script>"; </script><div>', 1, '<script><!-- <script> echo "</script>"; </script>' ),
+
+                       // Text.
+                       'Text'                          => array( 'Just text', 1, 'Just text' ),
+                       'Text in DIV'                   => array( '<div>Text<div>', 2, 'Text' ),
+                       'Text before DIV'               => array( 'Text<div>', 1, 'Text' ),
+                       'Text after DIV'                => array( '<div></div>Text', 3, 'Text' ),
+                       'Text after comment'            => array( '<!-- comment -->Text', 2, 'Text' ),
+                       'Text before comment'           => array( 'Text<!-- c --> ', 1, 'Text' ),
+
+                       // Comments.
+                       'Comment'                       => array( '<!-- comment -->', 1, '<!-- comment -->' ),
+                       'Comment in DIV'                => array( '<div><!-- comment --><div>', 2, '<!-- comment -->' ),
+                       'Comment before DIV'            => array( '<!-- comment --><div>', 1, '<!-- comment -->' ),
+                       'Comment after DIV'             => array( '<div></div><!-- comment -->', 3, '<!-- comment -->' ),
+                       'Comment after comment'         => array( '<!-- comment --><!-- comment -->', 2, '<!-- comment -->' ),
+                       'Comment before comment'        => array( '<!-- comment --><!-- c --> ', 1, '<!-- comment -->' ),
+                       'Abruptly closed comment'       => array( '<!-->', 1, '<!-->' ),
+                       'Empty comment'                 => array( '<!---->', 1, '<!---->' ),
+                       'Funky comment'                 => array( '</_ funk >', 1, '</_ funk >' ),
+                       'PI lookalike comment'          => array( '<?processing instruction?>', 1, '<?processing instruction?>' ),
+                       'CDATA lookalike comment'       => array( '<![CDATA[ see? data ]]>', 1, '<![CDATA[ see? data ]]>' ),
+               );
+       }
+
+       /**
</ins><span class="cx" style="display: block; padding: 0 10px">          * @ticket 56299
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><span class="cx" style="display: block; padding: 0 10px">         * @covers WP_HTML_Tag_Processor::next_tag
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2746,7 +2849,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        public function insert_after( $new_html ) {
</span><span class="cx" style="display: block; padding: 0 10px">                                $this->set_bookmark( 'here' );
</span><span class="cx" style="display: block; padding: 0 10px">                                $this->lexical_updates[] = new WP_HTML_Text_Replacement(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                                        $this->bookmarks['here']->start + $this->bookmarks['here']->length + 1,
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                                        $this->bookmarks['here']->start + $this->bookmarks['here']->length,
</ins><span class="cx" style="display: block; padding: 0 10px">                                         0,
</span><span class="cx" style="display: block; padding: 0 10px">                                        $new_html
</span><span class="cx" style="display: block; padding: 0 10px">                                );
</span></span></pre>
</div>
</div>

</body>
</html>