<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[57489] trunk: HTML API: Fix splitting single text node.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/57489">57489</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/57489","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-01-30 22:07:42 +0000 (Tue, 30 Jan 2024)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Fix splitting single text node.

The introduction of `next_token()` brought a subtle bug. On encountering a `<` in the HTML stream that did not begin a tag, comment, or other token, the Tag Processor treated the text up to that point as one text node and the following span as another text node.

The entire span should be one text node.

In this patch the Tag Processor properly detects this scenario and combines the spans into one text node.

Follow-up to <a href="https://core.trac.wordpress.org/changeset/57348">[57348]</a>

Props jonsurrell
Fixes <a href="https://core.trac.wordpress.org/ticket/60385">#60385</a></pre>
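
<p>The scanning rule described in the log message can be sketched in isolation. The following is a minimal illustration only, not the code from this changeset: <code>find_text_node_end()</code> is a hypothetical helper name, and checking the ASCII ranges A-Z and a-z separately is an assumption of the sketch. It returns the byte offset at which a text node ends, keeping any <code>&lt;</code> that cannot start another node inside the text.</p>
<pre>
```php
<?php
/**
 * Hypothetical helper, not part of WordPress: returns the byte offset at which
 * a text node starting at $start ends, or strlen( $html ) if the text runs to
 * the end of the document.
 */
function find_text_node_end( string $html, int $start ): int {
	$length = strlen( $html );
	$at     = $start;

	while ( $at < $length ) {
		$at = strpos( $html, '<', $at );
		if ( false === $at ) {
			// No further "<": the text runs to the end of the document.
			return $length;
		}

		// A trailing "<" cannot be classified; this sketch ends the text before it.
		if ( $at + 1 >= $length ) {
			return $at;
		}

		$next        = $html[ $at + 1 ];
		$starts_node =
			'!' === $next ||
			'/' === $next ||
			'?' === $next ||
			( 'A' <= $next && $next <= 'Z' ) ||
			( 'a' <= $next && $next <= 'z' );

		if ( $starts_node ) {
			return $at;
		}

		// A plain "<": keep it in the text node and continue scanning.
		++$at;
	}

	return $length;
}

$html = 'test< /A>';
$end  = find_text_node_end( $html, 0 );
echo substr( $html, 0, $end ), "\n"; // prints "test< /A>"
```
</pre>
<p>With the input from the new unit test, the <code>&lt;</code> is followed by a space, so the sketch continues past it and reports the whole string as a single text span.</p>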

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp">trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php    2024-01-30 21:25:28 UTC (rev 57488)
+++ trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php      2024-01-30 22:07:42 UTC (rev 57489)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1512,16 +1512,6 @@
</span><span class="cx" style="display: block; padding: 0 10px">                while ( false !== $at && $at < $doc_length ) {
</span><span class="cx" style="display: block; padding: 0 10px">                        $at = strpos( $html, '<', $at );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        if ( $at > $was_at ) {
-                               $this->parser_state         = self::STATE_TEXT_NODE;
-                               $this->token_starts_at      = $was_at;
-                               $this->token_length         = $at - $was_at;
-                               $this->text_starts_at       = $was_at;
-                               $this->text_length          = $this->token_length;
-                               $this->bytes_already_parsed = $at;
-                               return true;
-                       }
-
</del><span class="cx" style="display: block; padding: 0 10px">                         /*
</span><span class="cx" style="display: block; padding: 0 10px">                         * This does not imply an incomplete parse; it indicates that there
</span><span class="cx" style="display: block; padding: 0 10px">                         * can be nothing left in the document other than a #text node.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1536,6 +1526,37 @@
</span><span class="cx" style="display: block; padding: 0 10px">                                return true;
</span><span class="cx" style="display: block; padding: 0 10px">                        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        if ( $at > $was_at ) {
+                               /*
+                                * A "<" has been found in the document. That may be the start of another node, or
+                                * it may be an "invalid-first-character-of-tag-name" error. If this is not the start
+                                * of another node the "<" should be included in this text node and another
+                                * termination point should be found for the text node.
+                                *
+                                * @see https://html.spec.whatwg.org/#tag-open-state
+                                */
+                               if ( strlen( $html ) > $at + 1 ) {
+                                       $next_character  = $html[ $at + 1 ];
+                                       $at_another_node =
+                                               '!' === $next_character ||
+                                               '/' === $next_character ||
+                                               '?' === $next_character ||
+                                               ( 'A' <= $next_character && $next_character <= 'z' );
+                                       if ( ! $at_another_node ) {
+                                               ++$at;
+                                               continue;
+                                       }
+                               }
+
+                               $this->parser_state         = self::STATE_TEXT_NODE;
+                               $this->token_starts_at      = $was_at;
+                               $this->token_length         = $at - $was_at;
+                               $this->text_starts_at       = $was_at;
+                               $this->text_length          = $this->token_length;
+                               $this->bytes_already_parsed = $at;
+                               return true;
+                       }
+
</ins><span class="cx" style="display: block; padding: 0 10px">                         $this->token_starts_at = $at;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                        if ( $at + 1 < $doc_length && '/' === $this->html[ $at + 1 ] ) {
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php 2024-01-30 21:25:28 UTC (rev 57488)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php   2024-01-30 22:07:42 UTC (rev 57489)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2715,4 +2715,16 @@
</span><span class="cx" style="display: block; padding: 0 10px">                $result = $p->next_tag();
</span><span class="cx" style="display: block; padding: 0 10px">                $this->assertFalse( $result, 'Did not handle "</ " html properly.' );
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+       /**
+        * Ensures that non-tag syntax starting with `<` is consumed inside a text node.
+        *
+        * @ticket 60385
+        */
+       public function test_single_text_node_with_taglike_text() {
+               $p = new WP_HTML_Tag_Processor( 'test< /A>' );
+               $p->next_token();
+               $this->assertSame( '#text', $p->get_token_type(), 'Did not find text node.' );
+               $this->assertSame( 'test< /A>', $p->get_modifiable_text(), 'Did not find complete text node.' );
+       }
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span></span></pre>
</div>
</div>

</body>
</html>