<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[58977] trunk/src/wp-includes/html-api: HTML API: Ensure that NULL and whitespace-only CDATA sections don't forbid FRAMESET.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/58977">58977</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/58977","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-09-03 19:48:57 +0000 (Tue, 03 Sep 2024)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Ensure that NULL and whitespace-only CDATA sections don't forbid FRAMESET.

When a CDATA section consists entirely of NULL bytes and whitespace characters, it should not clear the "frameset ok" flag. CDATA sections can only occur inside SVG and MathML content. Previously, every CDATA section cleared this flag. This patch treats CDATA sections like text nodes and detects these NULL-and-whitespace-only sequences, so such sections leave the flag untouched.
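
The check that this patch adds to WP_HTML_Processor can be sketched in Python as follows. This is a hypothetical illustration, not WordPress code. strspn() is a hand-written stand-in for the PHP function, and cdata_clears_frameset_ok() and the constants are invented names. Like the patch, the sketch assumes the token spans a complete section. That assumption is what lets it subtract the 9-byte opener and the 3-byte `]]>` closer to locate the content.

```python
# Hypothetical Python sketch of the CDATA check this patch adds to
# WP_HTML_Processor; names are illustrative, not WordPress APIs.

CDATA_OPENER_LEN = 9  # bytes in the CDATA opener, including its less-than sign
CDATA_CLOSER_LEN = 3  # bytes in the "]]>" closer
IGNORABLE = "\0 \t\n\f\r"  # the byte set the patch passes to strspn()


def strspn(subject, accept, offset, length):
    """Stand-in for PHP's strspn(): count leading chars of the slice that are in accept."""
    count = 0
    for ch in subject[offset:offset + length]:
        if ch not in accept:
            break
        count += 1
    return count


def cdata_clears_frameset_ok(html, token_start, token_length):
    """Return True if the CDATA token spanning html[token_start:token_start + token_length]
    should clear the frameset-ok flag, i.e. its content is not only NULL bytes and whitespace.

    Assumes a complete section (opener and "]]>" closer both present), as the patch does.
    """
    content_start = token_start + CDATA_OPENER_LEN
    content_length = token_length - CDATA_OPENER_LEN - CDATA_CLOSER_LEN
    return strspn(html, IGNORABLE, content_start, content_length) != content_length
```

An empty section has no content, so strspn() returns 0, the comparison is false, and the flag is preserved.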

Developed in https://github.com/WordPress/wordpress-develop/pull/7230
Discussed in https://core.trac.wordpress.org/ticket/61576

Follow-up to <a href="https://core.trac.wordpress.org/changeset/58867">[58867]</a>.

Props dmsnell, jonsurrell.
See <a href="https://core.trac.wordpress.org/ticket/61576">#61576</a>.</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmlprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-processor.php</a></li>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmlprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-processor.php        2024-09-03 18:49:16 UTC (rev 58976)
+++ trunk/src/wp-includes/html-api/class-wp-html-processor.php  2024-09-03 19:48:57 UTC (rev 58977)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -843,10 +843,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                if ( self::PROCESS_NEXT_NODE === $node_to_process ) {
</span><span class="cx" style="display: block; padding: 0 10px">                        parent::next_token();
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        if (
-                               WP_HTML_Tag_Processor::STATE_TEXT_NODE === $this->parser_state ||
-                               WP_HTML_Tag_Processor::STATE_CDATA_NODE === $this->parser_state
-                       ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 if ( WP_HTML_Tag_Processor::STATE_TEXT_NODE === $this->parser_state ) {
</ins><span class="cx" style="display: block; padding: 0 10px">                                 parent::subdivide_text_appropriately();
</span><span class="cx" style="display: block; padding: 0 10px">                        }
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -4375,7 +4372,6 @@
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                switch ( $op ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        case '#cdata-section':
</del><span class="cx" style="display: block; padding: 0 10px">                         case '#text':
</span><span class="cx" style="display: block; padding: 0 10px">                                /*
</span><span class="cx" style="display: block; padding: 0 10px">                                 * > A character token that is U+0000 NULL
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -4396,6 +4392,24 @@
</span><span class="cx" style="display: block; padding: 0 10px">                                return true;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                        /*
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                         * CDATA sections are alternate wrappers for text content and therefore
+                        * ought to follow the same rules as text nodes.
+                        */
+                       case '#cdata-section':
+                               /*
+                                * NULL bytes and whitespace do not change the frameset-ok flag.
+                                */
+                               $current_token        = $this->bookmarks[ $this->state->current_token->bookmark_name ];
+                               $cdata_content_start  = $current_token->start + 9;
+                               $cdata_content_length = $current_token->length - 12;
+                               if ( strspn( $this->html, "\0 \t\n\f\r", $cdata_content_start, $cdata_content_length ) !== $cdata_content_length ) {
+                                       $this->state->frameset_ok = false;
+                               }
+
+                               $this->insert_foreign_element( $this->state->current_token, false );
+                               return true;
+
+                       /*
</ins><span class="cx" style="display: block; padding: 0 10px">                          * > A comment token
</span><span class="cx" style="display: block; padding: 0 10px">                         */
</span><span class="cx" style="display: block; padding: 0 10px">                        case '#comment':
</span></span></pre></div>
<a id="trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php    2024-09-03 18:49:16 UTC (rev 58976)
+++ trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php      2024-09-03 19:48:57 UTC (rev 58977)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -3337,8 +3337,8 @@
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-         * Subdivides a matched text node or CDATA text node, splitting NULL byte sequences
-        * and decoded whitespace as distinct prefixes.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+  * Subdivides a matched text node, splitting NULL byte sequences and decoded whitespace as
+        * distinct node prefixes.
</ins><span class="cx" style="display: block; padding: 0 10px">          *
</span><span class="cx" style="display: block; padding: 0 10px">         * Note that once anything that's neither a NULL byte nor decoded whitespace is
</span><span class="cx" style="display: block; padding: 0 10px">         * encountered, then the remainder of the text node is left intact as generic text.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -3368,70 +3368,55 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @return bool Whether the text node was subdivided.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        public function subdivide_text_appropriately(): bool {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                if ( self::STATE_TEXT_NODE !== $this->parser_state ) {
+                       return false;
+               }
+
</ins><span class="cx" style="display: block; padding: 0 10px">                 $this->text_node_classification = self::TEXT_IS_GENERIC;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                if ( self::STATE_TEXT_NODE === $this->parser_state ) {
-                       /*
-                        * NULL bytes are treated categorically different than numeric character
-                        * references whose number is zero. `&amp;#x00;` is not the same as `"\x00"`.
-                        */
-                       $leading_nulls = strspn( $this->html, "\x00", $this->text_starts_at, $this->text_length );
-                       if ( $leading_nulls > 0 ) {
-                               $this->token_length             = $leading_nulls;
-                               $this->text_length              = $leading_nulls;
-                               $this->bytes_already_parsed     = $this->token_starts_at + $leading_nulls;
-                               $this->text_node_classification = self::TEXT_IS_NULL_SEQUENCE;
-                               return true;
-                       }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         /*
+                * NULL bytes are treated categorically different than numeric character
+                * references whose number is zero. `&amp;#x00;` is not the same as `"\x00"`.
+                */
+               $leading_nulls = strspn( $this->html, "\x00", $this->text_starts_at, $this->text_length );
+               if ( $leading_nulls > 0 ) {
+                       $this->token_length             = $leading_nulls;
+                       $this->text_length              = $leading_nulls;
+                       $this->bytes_already_parsed     = $this->token_starts_at + $leading_nulls;
+                       $this->text_node_classification = self::TEXT_IS_NULL_SEQUENCE;
+                       return true;
+               }
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        /*
-                        * Start a decoding loop to determine the point at which the
-                        * text subdivides. This entails raw whitespace bytes and any
-                        * character reference that decodes to the same.
-                        */
-                       $at  = $this->text_starts_at;
-                       $end = $this->text_starts_at + $this->text_length;
-                       while ( $at < $end ) {
-                               $skipped = strspn( $this->html, " \t\f\r\n", $at, $end - $at );
-                               $at     += $skipped;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         /*
+                * Start a decoding loop to determine the point at which the
+                * text subdivides. This entails raw whitespace bytes and any
+                * character reference that decodes to the same.
+                */
+               $at  = $this->text_starts_at;
+               $end = $this->text_starts_at + $this->text_length;
+               while ( $at < $end ) {
+                       $skipped = strspn( $this->html, " \t\f\r\n", $at, $end - $at );
+                       $at     += $skipped;
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                                if ( $at < $end && '&' === $this->html[ $at ] ) {
-                                       $matched_byte_length = null;
-                                       $replacement         = WP_HTML_Decoder::read_character_reference( 'data', $this->html, $at, $matched_byte_length );
-                                       if ( isset( $replacement ) && 1 === strspn( $replacement, " \t\f\r\n" ) ) {
-                                               $at += $matched_byte_length;
-                                               continue;
-                                       }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 if ( $at < $end && '&' === $this->html[ $at ] ) {
+                               $matched_byte_length = null;
+                               $replacement         = WP_HTML_Decoder::read_character_reference( 'data', $this->html, $at, $matched_byte_length );
+                               if ( isset( $replacement ) && 1 === strspn( $replacement, " \t\f\r\n" ) ) {
+                                       $at += $matched_byte_length;
+                                       continue;
</ins><span class="cx" style="display: block; padding: 0 10px">                                 }
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-
-                               break;
</del><span class="cx" style="display: block; padding: 0 10px">                         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        if ( $at > $this->text_starts_at ) {
-                               $new_length                     = $at - $this->text_starts_at;
-                               $this->text_length              = $new_length;
-                               $this->token_length             = $new_length;
-                               $this->bytes_already_parsed     = $at;
-                               $this->text_node_classification = self::TEXT_IS_WHITESPACE;
-                               return true;
-                       }
-
-                       return false;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 break;
</ins><span class="cx" style="display: block; padding: 0 10px">                 }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                // Unlike text nodes, there are no character references within CDATA sections.
-               if ( self::STATE_CDATA_NODE === $this->parser_state ) {
-                       $leading_nulls = strspn( $this->html, "\x00", $this->text_starts_at, $this->text_length );
-                       if ( $leading_nulls === $this->text_length ) {
-                               $this->text_node_classification = self::TEXT_IS_NULL_SEQUENCE;
-                               return true;
-                       }
-
-                       $leading_ws = strspn( $this->html, " \t\f\r\n", $this->text_starts_at, $this->text_length );
-                       if ( $leading_ws === $this->text_length ) {
-                               $this->text_node_classification = self::TEXT_IS_WHITESPACE;
-                               return true;
-                       }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         if ( $at > $this->text_starts_at ) {
+                       $new_length                     = $at - $this->text_starts_at;
+                       $this->text_length              = $new_length;
+                       $this->token_length             = $new_length;
+                       $this->bytes_already_parsed     = $at;
+                       $this->text_node_classification = self::TEXT_IS_WHITESPACE;
+                       return true;
</ins><span class="cx" style="display: block; padding: 0 10px">                 }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                return false;
</span></span></pre>
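<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>The prefix rules in the revised subdivide_text_appropriately() can be illustrated with the Python sketch below. It is a hypothetical simplification, not WordPress code. subdivide_text(), leading_span(), and the classification strings are invented names. Unlike the PHP method, the sketch does not decode character references through WP_HTML_Decoder::read_character_reference(), so an encoded whitespace character counts as generic text here.

```python
# Hypothetical Python sketch of the prefix-splitting rules in
# subdivide_text_appropriately(); not WordPress code. Character references
# are deliberately NOT decoded here, unlike the PHP method.

TEXT_IS_GENERIC = "generic"
TEXT_IS_NULL_SEQUENCE = "null-sequence"
TEXT_IS_WHITESPACE = "whitespace"
WHITESPACE = " \t\f\r\n"  # the raw whitespace bytes the method skips


def leading_span(text, accept):
    """Count how many leading characters of text are in accept (like PHP strspn())."""
    count = 0
    for ch in text:
        if ch not in accept:
            break
        count += 1
    return count


def subdivide_text(text):
    """Return (classification, prefix_length) for a text node's raw content.

    prefix_length is how much of the node becomes its own token; 0 means
    the node is left intact as generic text.
    """
    # Raw NULL bytes are checked first and form their own prefix token.
    nulls = leading_span(text, "\0")
    if nulls:
        return TEXT_IS_NULL_SEQUENCE, nulls

    # Otherwise a run of raw whitespace forms the prefix token.
    spaces = leading_span(text, WHITESPACE)
    if spaces:
        return TEXT_IS_WHITESPACE, spaces

    return TEXT_IS_GENERIC, 0
```

Each call splits off at most one prefix: either a run of NULL bytes or a run of whitespace. The call does not inspect whatever follows that prefix.
</pre>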
</div>
</div>

</body>
</html>