<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[58836] trunk: HTML API: Introduce full parsing mode in HTML Processor.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/58836">58836</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/58836","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-07-31 16:54:23 +0000 (Wed, 31 Jul 2024)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Introduce full parsing mode in HTML Processor.

Until now, the HTML Processor has supported only one parsing mode,
_the fragment parsing mode_, in which it behaves the same way that
`node.innerHTML = html` does in the DOM. This mode assumes a context
node and doesn't support parsing an entire document.

As part of work to add more spec support to the HTML API, this patch
introduces a full parsing mode, which can parse a full HTML document
from start to end, including the doctype declaration and head tags.
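
For illustration only (not part of this changeset): the new insertion-mode
handlers dispatch on a short key built from the token type and name, and
they skip text tokens that contain nothing but HTML whitespace. The sketch
below mirrors those two checks in standalone PHP. The function names
`sketch_op_key` and `sketch_is_html_whitespace_only` are hypothetical, and
the `#doctype` token type mentioned in the comments is an assumption about
what the Tag Processor reports for a DOCTYPE token.

```php
/**
 * Hypothetical helper, not part of WordPress.
 *
 * Builds the dispatch key used in a `switch`: tags get a '+' (opener)
 * or '-' (closer) prefix, every other token keeps its bare name.
 */
function sketch_op_key( string $token_type, string $token_name, bool $is_closer ): string {
    $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';

    return "{$op_sigil}{$token_name}";
}

/**
 * Hypothetical helper, not part of WordPress.
 *
 * Reports whether $text holds only tab, line feed, form feed,
 * carriage return, or space bytes.
 */
function sketch_is_html_whitespace_only( string $text ): bool {
    // strspn() returns the length of the leading run made of the listed
    // bytes; if that run covers the whole string, every byte is whitespace.
    return strlen( $text ) === strspn( $text, " \t\n\f\r" );
}

// Assumed token type '#doctype': a non-tag token keeps its bare name.
echo sketch_op_key( '#doctype', 'html', false ), "\n";
echo sketch_op_key( '#tag', 'HTML', false ), "\n";
echo sketch_op_key( '#tag', 'HEAD', true ), "\n";
```

Under these assumptions a DOCTYPE token keeps the bare lowercase key
`html`, while an opening HTML tag becomes `+HTML` and a closing HEAD tag
becomes `-HEAD`, so a single `switch` can tell them apart. Note that an
empty string also counts as whitespace-only, since both lengths are zero.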

Developed in https://github.com/wordpress/wordpress-develop/pull/6977
Discussed in https://core.trac.wordpress.org/ticket/61576

Props: dmsnell, jonsurrell.
See <a href="https://core.trac.wordpress.org/ticket/61576">#61576</a>.</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmlprocessorstatephp">trunk/src/wp-includes/html-api/class-wp-html-processor-state.php</a></li>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmlprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlProcessorBreadcrumbsphp">trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmlprocessorstatephp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-processor-state.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-processor-state.php  2024-07-31 14:03:24 UTC (rev 58835)
+++ trunk/src/wp-includes/html-api/class-wp-html-processor-state.php    2024-07-31 16:54:23 UTC (rev 58836)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -429,6 +429,38 @@
</span><span class="cx" style="display: block; padding: 0 10px">        public $context_node = null;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * The recognized encoding of the input byte stream.
+        *
+        * > The stream of code points that comprises the input to the tokenization
+        * > stage will be initially seen by the user agent as a stream of bytes
+        * > (typically coming over the network or from the local file system).
+        * > The bytes encode the actual characters according to a particular character
+        * > encoding, which the user agent uses to decode the bytes into characters.
+        *
+        * @since 6.7.0
+        *
+        * @var string|null
+        */
+       public $encoding = null;
+
+       /**
+        * The parser's confidence in the input encoding.
+        *
+        * > When the HTML parser is decoding an input byte stream, it uses a character
+        * > encoding and a confidence. The confidence is either tentative, certain, or
+        * > irrelevant. The encoding used, and whether the confidence in that encoding
+        * > is tentative or certain, is used during the parsing to determine whether to
+        * > change the encoding. If no encoding is necessary, e.g. because the parser is
+        * > operating on a Unicode stream and doesn't have to use a character encoding
+        * > at all, then the confidence is irrelevant.
+        *
+        * @since 6.7.0
+        *
+        * @var string
+        */
+       public $encoding_confidence = 'tentative';
+
+       /**
</ins><span class="cx" style="display: block; padding: 0 10px">          * HEAD element pointer.
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><span class="cx" style="display: block; padding: 0 10px">         * @since 6.7.0
</span></span></pre></div>
<a id="trunksrcwpincludeshtmlapiclasswphtmlprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-processor.php        2024-07-31 14:03:24 UTC (rev 58835)
+++ trunk/src/wp-includes/html-api/class-wp-html-processor.php  2024-07-31 16:54:23 UTC (rev 58836)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -256,21 +256,6 @@
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        private $context_node = null;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        /**
-        * Whether the parser has yet processed the context node,
-        * if created as a fragment parser.
-        *
-        * The context node will be initially pushed onto the stack of open elements,
-        * but when created as a fragment parser, this context element (and the implicit
-        * HTML document node above it) should not be exposed as a matched token or node.
-        *
-        * This boolean indicates whether the processor should skip over the current
-        * node in its initial search for the first node created from the input HTML.
-        *
-        * @var bool
-        */
-       private $has_seen_context_node = false;
-
</del><span class="cx" style="display: block; padding: 0 10px">         /*
</span><span class="cx" style="display: block; padding: 0 10px">         * Public Interface Functions
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -312,9 +297,11 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        return null;
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $processor                        = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
-               $processor->state->context_node   = array( 'BODY', array() );
-               $processor->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $processor                             = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
+               $processor->state->context_node        = array( 'BODY', array() );
+               $processor->state->insertion_mode      = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
+               $processor->state->encoding            = $encoding;
+               $processor->state->encoding_confidence = 'certain';
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                // @todo Create "fake" bookmarks for non-existent but implied nodes.
</span><span class="cx" style="display: block; padding: 0 10px">                $processor->bookmarks['root-node']    = new WP_HTML_Span( 0, 0 );
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -341,6 +328,34 @@
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * Creates an HTML processor in the full parsing mode.
+        *
+        * It's likely that a fragment parser is more appropriate, unless sending an
+        * entire HTML document from start to finish. Consider a fragment parser with
+        * a context node of `<body>`.
+        *
+        * Since UTF-8 is the only currently-accepted charset, if working with a
+        * document that isn't UTF-8, it's important to convert the document before
+        * creating the processor: pass in the converted HTML.
+        *
+        * @param string      $html                    Input HTML document to process.
+        * @param string|null $known_definite_encoding Optional. If provided, specifies the charset used
+        *                                             in the input byte stream. Currently must be UTF-8.
+        * @return static|null The created processor if successful, otherwise null.
+        */
+       public static function create_full_parser( $html, $known_definite_encoding = 'UTF-8' ) {
+               if ( 'UTF-8' !== $known_definite_encoding ) {
+                       return null;
+               }
+
+               $processor                             = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
+               $processor->state->encoding            = $known_definite_encoding;
+               $processor->state->encoding_confidence = 'certain';
+
+               return $processor;
+       }
+
+       /**
</ins><span class="cx" style="display: block; padding: 0 10px">          * Constructor.
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><span class="cx" style="display: block; padding: 0 10px">         * Do not use this method. Use the static creator methods instead.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -993,7 +1008,62 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        private function step_initial(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_INITIAL . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $token_name = $this->get_token_name();
+               $token_type = $this->get_token_type();
+               $op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
+               $op         = "{$op_sigil}{$token_name}";
+
+               switch ( $op ) {
+                       /*
+                        * > A character token that is one of U+0009 CHARACTER TABULATION,
+                        * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+                        * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+                        *
+                        * Parse error: ignore the token.
+                        */
+                       case '#text':
+                               $text = $this->get_modifiable_text();
+                               if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+                                       return $this->step();
+                               }
+                               goto initial_anything_else;
+                               break;
+
+                       /*
+                        * > A comment token
+                        */
+                       case '#comment':
+                       case '#funky-comment':
+                       case '#presumptuous-tag':
+                               $this->insert_html_element( $this->state->current_token );
+                               return true;
+
+                       /*
+                        * > A DOCTYPE token
+                        */
+                       case 'html':
+                               $contents = $this->get_modifiable_text();
+                               if ( ' html' !== $contents ) {
+                                       /*
+                                        * @todo When the HTML Tag Processor fully parses the DOCTYPE declaration,
+                                        *       this code should examine the contents to set the compatibility mode.
+                                        */
+                                       $this->bail( 'Cannot process any DOCTYPE other than a normative HTML5 doctype.' );
+                               }
+
+                               /*
+                                * > Then, switch the insertion mode to "before html".
+                                */
+                               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
+                               return true;
+               }
+
+               /*
+                * > Anything else
+                */
+               initial_anything_else:
+               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
+               return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1002,7 +1072,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * This internal function performs the 'before html' insertion mode
</span><span class="cx" style="display: block; padding: 0 10px">         * logic for the generalized WP_HTML_Processor::step() function.
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-         * @since 6.7.0 Stub implementation.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+  * @since 6.7.0
</ins><span class="cx" style="display: block; padding: 0 10px">          *
</span><span class="cx" style="display: block; padding: 0 10px">         * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1012,7 +1082,86 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        private function step_before_html(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $token_name = $this->get_token_name();
+               $token_type = $this->get_token_type();
+               $is_closer  = parent::is_tag_closer();
+               $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
+               $op         = "{$op_sigil}{$token_name}";
+
+               switch ( $op ) {
+                       /*
+                        * > A DOCTYPE token
+                        */
+                       case 'html':
+                               // Parse error: ignore the token.
+                               return $this->step();
+
+                       /*
+                        * > A comment token
+                        */
+                       case '#comment':
+                       case '#funky-comment':
+                       case '#presumptuous-tag':
+                               $this->insert_html_element( $this->state->current_token );
+                               return true;
+
+                       /*
+                        * > A character token that is one of U+0009 CHARACTER TABULATION,
+                        * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+                        * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+                        *
+                        * Parse error: ignore the token.
+                        */
+                       case '#text':
+                               $text = $this->get_modifiable_text();
+                               if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+                                       return $this->step();
+                               }
+                               goto before_html_anything_else;
+                               break;
+
+                       /*
+                        * > A start tag whose tag name is "html"
+                        */
+                       case '+HTML':
+                               $this->insert_html_element( $this->state->current_token );
+                               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
+                               return true;
+
+                       /*
+                        * > An end tag whose tag name is one of: "head", "body", "html", "br"
+                        *
+                        * Closing BR tags are always reported by the Tag Processor as opening tags.
+                        */
+                       case '-HEAD':
+                       case '-BODY':
+                       case '-HTML':
+                               /*
+                                * > Act as described in the "anything else" entry below.
+                                */
+                               goto before_html_anything_else;
+                               break;
+               }
+
+               /*
+                * > Any other end tag
+                */
+               if ( $is_closer ) {
+                       // Parse error: ignore the token.
+                       return $this->step();
+               }
+
+               /*
+                * > Anything else.
+                *
+                * > Create an html element whose node document is the Document object.
+                * > Append it to the Document object. Put this element in the stack of open elements.
+                * > Switch the insertion mode to "before head", then reprocess the token.
+                */
+               before_html_anything_else:
+               $this->insert_virtual_node( 'HTML' );
+               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
+               return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1031,7 +1180,86 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        private function step_before_head(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $token_name = $this->get_token_name();
+               $token_type = $this->get_token_type();
+               $is_closer  = parent::is_tag_closer();
+               $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
+               $op         = "{$op_sigil}{$token_name}";
+
+               switch ( $op ) {
+                       /*
+                        * > A character token that is one of U+0009 CHARACTER TABULATION,
+                        * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+                        * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+                        *
+                        * Parse error: ignore the token.
+                        */
+                       case '#text':
+                               $text = $this->get_modifiable_text();
+                               if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+                                       return $this->step();
+                               }
+                               goto before_head_anything_else;
+                               break;
+
+                       /*
+                        * > A comment token
+                        */
+                       case '#comment':
+                       case '#funky-comment':
+                       case '#presumptuous-tag':
+                               $this->insert_html_element( $this->state->current_token );
+                               return true;
+
+                       /*
+                        * > A DOCTYPE token
+                        */
+                       case 'html':
+                               // Parse error: ignore the token.
+                               return $this->step();
+
+                       /*
+                        * > A start tag whose tag name is "html"
+                        */
+                       case '+HTML':
+                               return $this->step_in_body();
+
+                       /*
+                        * > A start tag whose tag name is "head"
+                        */
+                       case '+HEAD':
+                               $this->insert_html_element( $this->state->current_token );
+                               $this->state->head_element   = $this->state->current_token;
+                               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
+                               return true;
+
+                       /*
+                        * > An end tag whose tag name is one of: "head", "body", "html", "br"
+                        * > Act as described in the "anything else" entry below.
+                        *
+                        * Closing BR tags are always reported by the Tag Processor as opening tags.
+                        */
+                       case '-HEAD':
+                       case '-BODY':
+                       case '-HTML':
+                               goto before_head_anything_else;
+                               break;
+               }
+
+               if ( $is_closer ) {
+                       // Parse error: ignore the token.
+                       return $this->step();
+               }
+
+               /*
+                * > Anything else
+                *
+                * > Insert an HTML element for a "head" start tag token with no attributes.
+                */
+               before_head_anything_else:
+               $this->state->head_element   = $this->insert_virtual_node( 'HEAD' );
+               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
+               return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1056,29 +1284,31 @@
</span><span class="cx" style="display: block; padding: 0 10px">                $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
</span><span class="cx" style="display: block; padding: 0 10px">                $op         = "{$op_sigil}{$token_name}";
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                /*
-                * > A character token that is one of U+0009 CHARACTER TABULATION,
-                * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
-                * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
-                */
-               if ( '#text' === $op ) {
-                       $text = $this->get_modifiable_text();
-                       if ( '' === $text ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         switch ( $op ) {
+                       case '#text':
</ins><span class="cx" style="display: block; padding: 0 10px">                                 /*
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                                 * If the text is empty after processing HTML entities and stripping
-                                * U+0000 NULL bytes then ignore the token.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                          * > A character token that is one of U+0009 CHARACTER TABULATION,
+                                * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+                                * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
</ins><span class="cx" style="display: block; padding: 0 10px">                                  */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                                return $this->step();
-                       }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                         $text = $this->get_modifiable_text();
+                               if ( '' === $text ) {
+                                       /*
+                                        * If the text is empty after processing HTML entities and stripping
+                                        * U+0000 NULL bytes then ignore the token.
+                                        */
+                                       return $this->step();
+                               }
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
-                               // Insert the character.
-                               $this->insert_html_element( $this->state->current_token );
-                               return true;
-                       }
-               }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                         if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+                                       // Insert the character.
+                                       $this->insert_html_element( $this->state->current_token );
+                                       return true;
+                               }
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                switch ( $op ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                         goto in_head_anything_else;
+                               break;
+
</ins><span class="cx" style="display: block; padding: 0 10px">                         /*
</span><span class="cx" style="display: block; padding: 0 10px">                         * > A comment token
</span><span class="cx" style="display: block; padding: 0 10px">                         */
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1124,7 +1354,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">                                 * >     tentative, then change the encoding to the resulting encoding.
</span><span class="cx" style="display: block; padding: 0 10px">                                 */
</span><span class="cx" style="display: block; padding: 0 10px">                                $charset = $this->get_attribute( 'charset' );
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                                if ( is_string( $charset ) ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                         if ( is_string( $charset ) && 'tentative' === $this->state->encoding_confidence ) {
</ins><span class="cx" style="display: block; padding: 0 10px">                                         $this->bail( 'Cannot yet process META tags with charset to determine encoding.' );
</span><span class="cx" style="display: block; padding: 0 10px">                                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1141,7 +1371,8 @@
</span><span class="cx" style="display: block; padding: 0 10px">                                if (
</span><span class="cx" style="display: block; padding: 0 10px">                                        is_string( $http_equiv ) &&
</span><span class="cx" style="display: block; padding: 0 10px">                                        is_string( $content ) &&
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                                        0 === strcasecmp( $http_equiv, 'Content-Type' )
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                                 0 === strcasecmp( $http_equiv, 'Content-Type' ) &&
+                                       'tentative' === $this->state->encoding_confidence
</ins><span class="cx" style="display: block; padding: 0 10px">                                 ) {
</span><span class="cx" style="display: block; padding: 0 10px">                                        $this->bail( 'Cannot yet process META tags with http-equiv Content-Type to determine encoding.' );
</span><span class="cx" style="display: block; padding: 0 10px">                                }
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1193,10 +1424,11 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                        /*
</span><span class="cx" style="display: block; padding: 0 10px">                         * > An end tag whose tag name is one of: "body", "html", "br"
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                         *
+                        * Closing BR tags are always reported by the Tag Processor as opening tags.
</ins><span class="cx" style="display: block; padding: 0 10px">                          */
</span><span class="cx" style="display: block; padding: 0 10px">                        case '-BODY':
</span><span class="cx" style="display: block; padding: 0 10px">                        case '-HTML':
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        case '-BR':
</del><span class="cx" style="display: block; padding: 0 10px">                                 /*
</span><span class="cx" style="display: block; padding: 0 10px">                                 * > Act as described in the "anything else" entry below.
</span><span class="cx" style="display: block; padding: 0 10px">                                 */
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1273,7 +1505,92 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        private function step_in_head_noscript(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $token_name = $this->get_token_name();
+               $token_type = $this->get_token_type();
+               $is_closer  = parent::is_tag_closer();
+               $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
+               $op         = "{$op_sigil}{$token_name}";
+
+               switch ( $op ) {
+                       /*
+                        * > A character token that is one of U+0009 CHARACTER TABULATION,
+                        * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+                        * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+                        *
+                        * Process the token using the rules for the "in head" insertion mode.
+                        */
+                       case '#text':
+                               $text = $this->get_modifiable_text();
+                               if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+                                       return $this->step_in_head();
+                               }
+
+                               goto in_head_noscript_anything_else;
+                               break;
+
+                       /*
+                        * > A DOCTYPE token
+                        */
+                       case 'html':
+                               // Parse error: ignore the token.
+                               return $this->step();
+
+                       /*
+                        * > A start tag whose tag name is "html"
+                        */
+                       case '+HTML':
+                               return $this->step_in_body();
+
+                       /*
+                        * > An end tag whose tag name is "noscript"
+                        */
+                       case '-NOSCRIPT':
+                               $this->state->stack_of_open_elements->pop();
+                               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
+                               return true;
+
+                       /*
+                        * > A comment token
+                        * >
+                        * > A start tag whose tag name is one of: "basefont", "bgsound",
+                        * > "link", "meta", "noframes", "style"
+                        */
+                       case '#comment':
+                       case '#funky-comment':
+                       case '#presumptuous-tag':
+                       case '+BASEFONT':
+                       case '+BGSOUND':
+                       case '+LINK':
+                       case '+META':
+                       case '+NOFRAMES':
+                       case '+STYLE':
+                               return $this->step_in_head();
+
+                       /*
+                        * > An end tag whose tag name is "br"
+                        *
+                        * This should never happen, as the Tag Processor prevents showing a BR closing tag.
+                        */
+               }
+
+               /*
+                * > A start tag whose tag name is one of: "head", "noscript"
+                * > Any other end tag
+                */
+               if ( '+HEAD' === $op || '+NOSCRIPT' === $op || $is_closer ) {
+                       // Parse error: ignore the token.
+                       return $this->step();
+               }
+
+               /*
+                * > Anything else
+                *
+                * Anything here is a parse error.
+                */
+               in_head_noscript_anything_else:
+               $this->state->stack_of_open_elements->pop();
+               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
+               return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1292,7 +1609,133 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        private function step_after_head(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $token_name = $this->get_token_name();
+               $token_type = $this->get_token_type();
+               $is_closer  = parent::is_tag_closer();
+               $op_sigil   = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
+               $op         = "{$op_sigil}{$token_name}";
+
+               switch ( $op ) {
+                       /*
+                        * > A character token that is one of U+0009 CHARACTER TABULATION,
+                        * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+                        * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+                        */
+                       case '#text':
+                               $text = $this->get_modifiable_text();
+                               if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+                                       // Insert the character.
+                                       $this->insert_html_element( $this->state->current_token );
+                                       return true;
+                               }
+                               goto after_head_anything_else;
+                               break;
+
+                       /*
+                        * > A comment token
+                        */
+                       case '#comment':
+                       case '#funky-comment':
+                       case '#presumptuous-tag':
+                               $this->insert_html_element( $this->state->current_token );
+                               return true;
+
+                       /*
+                        * > A DOCTYPE token
+                        */
+                       case 'html':
+                               // Parse error: ignore the token.
+                               return $this->step();
+
+                       /*
+                        * > A start tag whose tag name is "html"
+                        */
+                       case '+HTML':
+                               return $this->step_in_body();
+
+                       /*
+                        * > A start tag whose tag name is "body"
+                        */
+                       case '+BODY':
+                               $this->insert_html_element( $this->state->current_token );
+                               $this->state->frameset_ok    = false;
+                               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
+                               return true;
+
+                       /*
+                        * > A start tag whose tag name is "frameset"
+                        */
+                       case '+FRAMESET':
+                               $this->insert_html_element( $this->state->current_token );
+                               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET;
+                               return true;
+
+                       /*
+                        * > A start tag whose tag name is one of: "base", "basefont", "bgsound",
+                        * > "link", "meta", "noframes", "script", "style", "template", "title"
+                        *
+                        * Anything here is a parse error.
+                        */
+                       case '+BASE':
+                       case '+BASEFONT':
+                       case '+BGSOUND':
+                       case '+LINK':
+                       case '+META':
+                       case '+NOFRAMES':
+                       case '+SCRIPT':
+                       case '+STYLE':
+                       case '+TEMPLATE':
+                       case '+TITLE':
+                               /*
+                                * > Push the node pointed to by the head element pointer onto the stack of open elements.
+                                * > Process the token using the rules for the "in head" insertion mode.
+                                * > Remove the node pointed to by the head element pointer from the stack of open elements. (It might not be the current node at this point.)
+                                */
+                               $this->bail( 'Cannot process elements after HEAD which reopen the HEAD element.' );
+                               /*
+                                * Do not leave this break in when adding support; it's here to prevent
+                                * WPCS from getting confused at the switch structure without a return,
+                                * because it doesn't know that `bail()` always throws.
+                                */
+                               break;
+
+                       /*
+                        * > An end tag whose tag name is "template"
+                        */
+                       case '-TEMPLATE':
+                               return $this->step_in_head();
+
+                       /*
+                        * > An end tag whose tag name is one of: "body", "html", "br"
+                        *
+                        * Closing BR tags are always reported by the Tag Processor as opening tags.
+                        */
+                       case '-BODY':
+                       case '-HTML':
+                               /*
+                                * > Act as described in the "anything else" entry below.
+                                */
+                               goto after_head_anything_else;
+                               break;
+               }
+
+               /*
+                * > A start tag whose tag name is "head"
+                * > Any other end tag
+                */
+               if ( '+HEAD' === $op || $is_closer ) {
+                       // Parse error: ignore the token.
+                       return $this->step();
+               }
+
+               /*
+                * > Anything else
+                * > Insert an HTML element for a "body" start tag token with no attributes.
+                */
+               after_head_anything_else:
+               $this->insert_virtual_node( 'BODY' );
+               $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
+               return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -4469,14 +4912,17 @@
</span><span class="cx" style="display: block; padding: 0 10px">         * @param string      $token_name    Name of token to create and insert into the stack of open elements.
</span><span class="cx" style="display: block; padding: 0 10px">         * @param string|null $bookmark_name Optional. Name to give bookmark for created virtual node.
</span><span class="cx" style="display: block; padding: 0 10px">         *                                   Defaults to auto-creating a bookmark name.
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * @return WP_HTML_Token Newly-created virtual token.
</ins><span class="cx" style="display: block; padding: 0 10px">          */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        private function insert_virtual_node( $token_name, $bookmark_name = null ): void {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ private function insert_virtual_node( $token_name, $bookmark_name = null ): WP_HTML_Token {
</ins><span class="cx" style="display: block; padding: 0 10px">                 $here = $this->bookmarks[ $this->state->current_token->bookmark_name ];
</span><span class="cx" style="display: block; padding: 0 10px">                $name = $bookmark_name ?? $this->bookmark_token();
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                $this->bookmarks[ $name ] = new WP_HTML_Span( $here->start, 0 );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->insert_html_element( new WP_HTML_Token( $name, $token_name, false ) );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $token = new WP_HTML_Token( $name, $token_name, false );
+               $this->insert_html_element( $token );
+               return $token;
</ins><span class="cx" style="display: block; padding: 0 10px">         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /*
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -4633,6 +5079,53 @@
</span><span class="cx" style="display: block; padding: 0 10px">                );
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+        /**
+        * Gets an encoding from a given string.
+        *
+        * This is an algorithm defined in the WHATWG Encoding specification.
+        *
+        * Example:
+        *
+        *     'UTF-8' === self::get_encoding( 'utf8' );
+        *     'UTF-8' === self::get_encoding( "  \tUTF-8 " );
+        *     null    === self::get_encoding( 'UTF-7' );
+        *     null    === self::get_encoding( 'utf8; charset=' );
+        *
+        * @see https://encoding.spec.whatwg.org/#concept-encoding-get
+        *
+        * @todo As this parser only supports UTF-8, only the UTF-8
+        *       encodings are detected. Add more as desired, but the
+        *       parser will bail on non-UTF-8 encodings.
+        *
+        * @since 6.7.0
+        *
+        * @param string $label A string which may specify a known encoding.
+        * @return string|null Known encoding if matched, otherwise null.
+        */
+       protected static function get_encoding( string $label ): ?string {
+               /*
+                * > Remove any leading and trailing ASCII whitespace from label.
+                */
+               $label = trim( $label, " \t\f\r\n" );
+
+               /*
+                * > If label is an ASCII case-insensitive match for any of the labels listed in the
+                * > table below, then return the corresponding encoding; otherwise return failure.
+                */
+               switch ( strtolower( $label ) ) {
+                       case 'unicode-1-1-utf-8':
+                       case 'unicode11utf8':
+                       case 'unicode20utf8':
+                       case 'utf-8':
+                       case 'utf8':
+                       case 'x-unicode20utf8':
+                               return 'UTF-8';
+
+                       default:
+                               return null;
+               }
+       }
+
</ins><span class="cx" style="display: block; padding: 0 10px">         /*
</span><span class="cx" style="display: block; padding: 0 10px">         * Constants that would pollute the top of the class if they were found there.
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlProcessorBreadcrumbsphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php 2024-07-31 14:03:24 UTC (rev 58835)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php   2024-07-31 16:54:23 UTC (rev 58836)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -25,7 +25,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">        public function test_navigates_into_normative_html_for_supported_elements( $html, $tag_name ) {
</span><span class="cx" style="display: block; padding: 0 10px">                $processor = WP_HTML_Processor::create_fragment( $html );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $this->assertTrue( $processor->step(), "Failed to step into supported {$tag_name} element." );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         $this->assertTrue( $processor->next_token(), "Failed to step into supported {$tag_name} element." );
</ins><span class="cx" style="display: block; padding: 0 10px">                 $this->assertSame( $tag_name, $processor->get_tag(), "Misread {$tag_name} as a {$processor->get_tag()} element." );
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -90,6 +90,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        'IMG',
</span><span class="cx" style="display: block; padding: 0 10px">                        'INS',
</span><span class="cx" style="display: block; padding: 0 10px">                        'LI',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        'LINK',
</ins><span class="cx" style="display: block; padding: 0 10px">                         'ISINDEX', // Deprecated.
</span><span class="cx" style="display: block; padding: 0 10px">                        'KBD',
</span><span class="cx" style="display: block; padding: 0 10px">                        'KEYGEN', // Deprecated.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -108,6 +109,8 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        'NAV',
</span><span class="cx" style="display: block; padding: 0 10px">                        'NEXTID', // Deprecated.
</span><span class="cx" style="display: block; padding: 0 10px">                        'NOBR', // Neutralized.
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        'NOEMBED', // Neutralized.
+                       'NOFRAMES', // Neutralized.
</ins><span class="cx" style="display: block; padding: 0 10px">                         'NOSCRIPT',
</span><span class="cx" style="display: block; padding: 0 10px">                        'OBJECT',
</span><span class="cx" style="display: block; padding: 0 10px">                        'OL',
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -122,6 +125,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        'RTC', // Neutralized.
</span><span class="cx" style="display: block; padding: 0 10px">                        'RUBY',
</span><span class="cx" style="display: block; padding: 0 10px">                        'SAMP',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        'SCRIPT',
</ins><span class="cx" style="display: block; padding: 0 10px">                         'SEARCH',
</span><span class="cx" style="display: block; padding: 0 10px">                        'SECTION',
</span><span class="cx" style="display: block; padding: 0 10px">                        'SLOT',
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -130,21 +134,29 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        'SPAN',
</span><span class="cx" style="display: block; padding: 0 10px">                        'STRIKE',
</span><span class="cx" style="display: block; padding: 0 10px">                        'STRONG',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        'STYLE',
</ins><span class="cx" style="display: block; padding: 0 10px">                         'SUB',
</span><span class="cx" style="display: block; padding: 0 10px">                        'SUMMARY',
</span><span class="cx" style="display: block; padding: 0 10px">                        'SUP',
</span><span class="cx" style="display: block; padding: 0 10px">                        'TABLE',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        'TEXTAREA',
</ins><span class="cx" style="display: block; padding: 0 10px">                         'TIME',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        'TITLE',
</ins><span class="cx" style="display: block; padding: 0 10px">                         'TT',
</span><span class="cx" style="display: block; padding: 0 10px">                        'U',
</span><span class="cx" style="display: block; padding: 0 10px">                        'UL',
</span><span class="cx" style="display: block; padding: 0 10px">                        'VAR',
</span><span class="cx" style="display: block; padding: 0 10px">                        'VIDEO',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        'XMP', // Deprecated, use PRE instead.
</ins><span class="cx" style="display: block; padding: 0 10px">                 );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                $data = array();
</span><span class="cx" style="display: block; padding: 0 10px">                foreach ( $supported_elements as $tag_name ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        $data[ $tag_name ] = array( "<{$tag_name}>", $tag_name );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 $closer = in_array( $tag_name, array( 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true )
+                               ? "</{$tag_name}>"
+                               : '';
+
+                       $data[ $tag_name ] = array( "<{$tag_name}>{$closer}", $tag_name );
</ins><span class="cx" style="display: block; padding: 0 10px">                 }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                $data['IMAGE (treated as an IMG)'] = array( '<image>', 'IMG' );
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -182,22 +194,9 @@
</span><span class="cx" style="display: block; padding: 0 10px">         */
</span><span class="cx" style="display: block; padding: 0 10px">        public static function data_unsupported_elements() {
</span><span class="cx" style="display: block; padding: 0 10px">                $unsupported_elements = array(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        'BODY',
-                       'FRAME',
-                       'FRAMESET',
-                       'HEAD',
-                       'HTML',
-                       'IFRAME',
</del><span class="cx" style="display: block; padding: 0 10px">                         'MATH',
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        'NOEMBED', // Neutralized.
-                       'NOFRAMES', // Neutralized.
</del><span class="cx" style="display: block; padding: 0 10px">                         'PLAINTEXT', // Neutralized.
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        'SCRIPT',
-                       'STYLE',
</del><span class="cx" style="display: block; padding: 0 10px">                         'SVG',
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                        'TEXTAREA',
-                       'TITLE',
-                       'XMP', // Deprecated, use PRE instead.
</del><span class="cx" style="display: block; padding: 0 10px">                 );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                $data = array();
</span></span></pre>
</div>
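The insertion-mode handlers in the diff above (step_in_head_noscript and step_after_head) dispatch on an "op" string built from the token type, the token name, and whether the tag is a closer, and they test text nodes for whitespace-only content with strspn. The following is a minimal standalone sketch of those two idioms, not the committed code: the function names sketch_op and sketch_is_html_whitespace are hypothetical, and the token names passed in ('#text', '#comment', 'html' for a DOCTYPE) are assumed from the case labels visible in the diff.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical helper: builds the dispatch key used by the switch statements.
 * Tags get a '+' (opener) or '-' (closer) sigil; other token types get none.
 */
function sketch_op( string $token_type, string $token_name, bool $is_closer ): string {
	$op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
	return "{$op_sigil}{$token_name}";
}

/**
 * Hypothetical helper: true when the text holds only tab, line feed,
 * form feed, carriage return, or space. strspn() counts the leading run
 * of allowed bytes, so equality with strlen() means no other byte occurs.
 * Note that the empty string also returns true (0 === 0).
 */
function sketch_is_html_whitespace( string $text ): bool {
	return strlen( $text ) === strspn( $text, " \t\n\f\r" );
}

echo sketch_op( '#tag', 'NOSCRIPT', true ), "\n";   // Closing tag key.
echo sketch_op( '#tag', 'BODY', false ), "\n";      // Opening tag key.
echo sketch_op( '#comment', '#comment', false ), "\n"; // Non-tag tokens keep their name.
var_dump( sketch_is_html_whitespace( " \t\n" ) );
var_dump( sketch_is_html_whitespace( " a " ) );
```

Because the empty string passes the whitespace test, the in-head handler in the diff checks for an empty text node first and skips it before reaching the strspn comparison.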
</div>
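The get_encoding() method added in this changeset is protected, so it cannot be called directly from outside the class. As an illustration of the label lookup it performs, here is a standalone sketch under the assumption that only the UTF-8 labels listed in the diff are recognized. The function name sketch_get_encoding is hypothetical, and the switch from the diff is replaced with an in_array() lookup purely for brevity.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical standalone version of the label lookup shown in the diff.
 * Returns 'UTF-8' for a known UTF-8 label, otherwise null.
 */
function sketch_get_encoding( string $label ): ?string {
	// Remove leading and trailing ASCII whitespace from the label.
	$label = trim( $label, " \t\f\r\n" );

	// Labels copied from the case list in the diff; matching is
	// ASCII case-insensitive, so the label is lowercased first.
	$utf8_labels = array(
		'unicode-1-1-utf-8',
		'unicode11utf8',
		'unicode20utf8',
		'utf-8',
		'utf8',
		'x-unicode20utf8',
	);

	return in_array( strtolower( $label ), $utf8_labels, true ) ? 'UTF-8' : null;
}

var_dump( sketch_get_encoding( 'utf8' ) );
var_dump( sketch_get_encoding( "  \tUTF-8 " ) );
var_dump( sketch_get_encoding( 'UTF-7' ) );
var_dump( sketch_get_encoding( 'utf8; charset=' ) );
```

The four calls mirror the examples in the method's docblock: the first two should yield 'UTF-8' and the last two null, since whitespace is only trimmed from the ends and any other trailing content prevents a match.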

</body>
</html>