<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[58836] trunk: HTML API: Introduce full parsing mode in HTML Processor.</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/58836">58836</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/58836","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-07-31 16:54:23 +0000 (Wed, 31 Jul 2024)</dd>
</dl>
<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Introduce full parsing mode in HTML Processor.

The HTML Processor has only supported a specific kind of parsing mode
called _the fragment parsing mode_, where it behaves in the same way
that `node.innerHTML = html` does in the DOM. This mode assumes a
context node and doesn't support parsing an entire document.

As part of work to add more spec support to the HTML API, this patch
introduces a full parsing mode, which can parse a full HTML document
from start to end, including the doctype declaration and head tags.

Developed in https://github.com/wordpress/wordpress-develop/pull/6977
Discussed in https://core.trac.wordpress.org/ticket/61576

Props: dmsnell, jonsurrell.
See <a href="https://core.trac.wordpress.org/ticket/61576">#61576</a>.</pre>
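<p>The step functions in the diff below dispatch on an "op" string built from the token type, the token name, and whether the tag is a closer, and they treat text nodes made up solely of HTML whitespace as a special case. A minimal standalone sketch of those two idioms follows. The function names <code>token_op</code> and <code>is_whitespace_only</code> are hypothetical helpers chosen for illustration; the commit computes both inline and does not add these functions to the HTML API.</p>
<pre>
```php
/**
 * Builds the dispatch key that the step functions switch on.
 * Hypothetical helper: the commit builds this string inline.
 */
function token_op( string $token_type, string $token_name, bool $is_closer ): string {
	// Only tags get a sigil: '+' for an opening tag, '-' for a closing tag.
	$op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
	return "{$op_sigil}{$token_name}";
}

/**
 * Reports whether the text consists solely of space, tab, LF, FF, or CR.
 * An empty string also passes, because both lengths are zero.
 */
function is_whitespace_only( string $text ): bool {
	return strlen( $text ) === strspn( $text, " \t\n\f\r" );
}

echo token_op( '#tag', 'HTML', false ), "\n";     // prints "+HTML"
echo token_op( '#tag', 'HEAD', true ), "\n";      // prints "-HEAD"
echo token_op( '#doctype', 'html', false ), "\n"; // prints "html"
var_dump( is_whitespace_only( " \n\t" ) );        // bool(true)
var_dump( is_whitespace_only( ' a ' ) );          // bool(false)
```
</pre>
<p>Because non-tag tokens carry no sigil, a DOCTYPE named <code>html</code> and an opening <code>HTML</code> tag yield distinct keys (<code>html</code> versus <code>+HTML</code>), which is why the switch statements can list both as separate cases.</p>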
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmlprocessorstatephp">trunk/src/wp-includes/html-api/class-wp-html-processor-state.php</a></li>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmlprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlProcessorBreadcrumbsphp">trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmlprocessorstatephp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-processor-state.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-processor-state.php 2024-07-31 14:03:24 UTC (rev 58835)
+++ trunk/src/wp-includes/html-api/class-wp-html-processor-state.php 2024-07-31 16:54:23 UTC (rev 58836)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -429,6 +429,38 @@
</span><span class="cx" style="display: block; padding: 0 10px"> public $context_node = null;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * The recognized encoding of the input byte stream.
+ *
+ * > The stream of code points that comprises the input to the tokenization
+ * > stage will be initially seen by the user agent as a stream of bytes
+ * > (typically coming over the network or from the local file system).
+ * > The bytes encode the actual characters according to a particular character
+ * > encoding, which the user agent uses to decode the bytes into characters.
+ *
+ * @since 6.7.0
+ *
+ * @var string|null
+ */
+ public $encoding = null;
+
+ /**
+ * The parser's confidence in the input encoding.
+ *
+ * > When the HTML parser is decoding an input byte stream, it uses a character
+ * > encoding and a confidence. The confidence is either tentative, certain, or
+ * > irrelevant. The encoding used, and whether the confidence in that encoding
+ * > is tentative or certain, is used during the parsing to determine whether to
+ * > change the encoding. If no encoding is necessary, e.g. because the parser is
+ * > operating on a Unicode stream and doesn't have to use a character encoding
+ * > at all, then the confidence is irrelevant.
+ *
+ * @since 6.7.0
+ *
+ * @var string
+ */
+ public $encoding_confidence = 'tentative';
+
+ /**
</ins><span class="cx" style="display: block; padding: 0 10px"> * HEAD element pointer.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @since 6.7.0
</span></span></pre></div>
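<p>The two properties added above work together. Elsewhere in this changeset, both creator methods set the confidence to <code>'certain'</code>, and the META charset handling bails only while the confidence is still <code>'tentative'</code>. The standalone sketch below models that check under those assumptions; <code>must_bail_on_meta_charset</code> is a hypothetical name, since the processor performs the comparison inline.</p>
<pre>
```php
/**
 * Models the guard placed around META charset handling.
 * Hypothetical helper: not part of the HTML API.
 *
 * @param string|true|null $charset             Value of the `charset` attribute, if any.
 * @param string           $encoding_confidence Either 'tentative', 'certain', or 'irrelevant'.
 * @return bool Whether the parser would have to give up on this META tag.
 */
function must_bail_on_meta_charset( $charset, string $encoding_confidence ): bool {
	// A charset declaration only matters while the encoding is still a guess.
	return is_string( $charset ) && 'tentative' === $encoding_confidence;
}

var_dump( must_bail_on_meta_charset( 'utf-8', 'tentative' ) ); // bool(true)
var_dump( must_bail_on_meta_charset( 'utf-8', 'certain' ) );   // bool(false)
var_dump( must_bail_on_meta_charset( null, 'tentative' ) );    // bool(false)
```
</pre>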
<a id="trunksrcwpincludeshtmlapiclasswphtmlprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-processor.php 2024-07-31 14:03:24 UTC (rev 58835)
+++ trunk/src/wp-includes/html-api/class-wp-html-processor.php 2024-07-31 16:54:23 UTC (rev 58836)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -256,21 +256,6 @@
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> private $context_node = null;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- /**
- * Whether the parser has yet processed the context node,
- * if created as a fragment parser.
- *
- * The context node will be initially pushed onto the stack of open elements,
- * but when created as a fragment parser, this context element (and the implicit
- * HTML document node above it) should not be exposed as a matched token or node.
- *
- * This boolean indicates whether the processor should skip over the current
- * node in its initial search for the first node created from the input HTML.
- *
- * @var bool
- */
- private $has_seen_context_node = false;
-
</del><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * Public Interface Functions
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -312,9 +297,11 @@
</span><span class="cx" style="display: block; padding: 0 10px"> return null;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
- $processor->state->context_node = array( 'BODY', array() );
- $processor->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
+ $processor->state->context_node = array( 'BODY', array() );
+ $processor->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
+ $processor->state->encoding = $encoding;
+ $processor->state->encoding_confidence = 'certain';
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> // @todo Create "fake" bookmarks for non-existent but implied nodes.
</span><span class="cx" style="display: block; padding: 0 10px"> $processor->bookmarks['root-node'] = new WP_HTML_Span( 0, 0 );
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -341,6 +328,34 @@
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Creates an HTML processor in the full parsing mode.
+ *
+ * It's likely that a fragment parser is more appropriate, unless sending an
+ * entire HTML document from start to finish. Consider a fragment parser with
+ * a context node of `&lt;body&gt;`.
+ *
+ * Since UTF-8 is the only currently-accepted charset, if working with a
+ * document that isn't UTF-8, it's important to convert the document before
+ * creating the processor: pass in the converted HTML.
+ *
+ * @param string $html Input HTML document to process.
+ * @param string|null $known_definite_encoding Optional. If provided, specifies the charset used
+ * in the input byte stream. Currently must be UTF-8.
+ * @return static|null The created processor if successful, otherwise null.
+ */
+ public static function create_full_parser( $html, $known_definite_encoding = 'UTF-8' ) {
+ if ( 'UTF-8' !== $known_definite_encoding ) {
+ return null;
+ }
+
+ $processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
+ $processor->state->encoding = $known_definite_encoding;
+ $processor->state->encoding_confidence = 'certain';
+
+ return $processor;
+ }
+
+ /**
</ins><span class="cx" style="display: block; padding: 0 10px"> * Constructor.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * Do not use this method. Use the static creator methods instead.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -993,7 +1008,62 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> private function step_initial(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_INITIAL . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $token_name = $this->get_token_name();
+ $token_type = $this->get_token_type();
+ $op_sigil = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
+ $op = "{$op_sigil}{$token_name}";
+
+ switch ( $op ) {
+ /*
+ * > A character token that is one of U+0009 CHARACTER TABULATION,
+ * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+ * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+ *
+ * Parse error: ignore the token.
+ */
+ case '#text':
+ $text = $this->get_modifiable_text();
+ if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+ return $this->step();
+ }
+ goto initial_anything_else;
+ break;
+
+ /*
+ * > A comment token
+ */
+ case '#comment':
+ case '#funky-comment':
+ case '#presumptuous-tag':
+ $this->insert_html_element( $this->state->current_token );
+ return true;
+
+ /*
+ * > A DOCTYPE token
+ */
+ case 'html':
+ $contents = $this->get_modifiable_text();
+ if ( ' html' !== $contents ) {
+ /*
+ * @todo When the HTML Tag Processor fully parses the DOCTYPE declaration,
+ * this code should examine the contents to set the compatibility mode.
+ */
+ $this->bail( 'Cannot process any DOCTYPE other than a normative HTML5 doctype.' );
+ }
+
+ /*
+ * > Then, switch the insertion mode to "before html".
+ */
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
+ return true;
+ }
+
+ /*
+ * > Anything else
+ */
+ initial_anything_else:
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
+ return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1002,7 +1072,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * This internal function performs the 'before html' insertion mode
</span><span class="cx" style="display: block; padding: 0 10px"> * logic for the generalized WP_HTML_Processor::step() function.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @since 6.7.0 Stub implementation.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @since 6.7.0
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1012,7 +1082,86 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> private function step_before_html(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $token_name = $this->get_token_name();
+ $token_type = $this->get_token_type();
+ $is_closer = parent::is_tag_closer();
+ $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
+ $op = "{$op_sigil}{$token_name}";
+
+ switch ( $op ) {
+ /*
+ * > A DOCTYPE token
+ */
+ case 'html':
+ // Parse error: ignore the token.
+ return $this->step();
+
+ /*
+ * > A comment token
+ */
+ case '#comment':
+ case '#funky-comment':
+ case '#presumptuous-tag':
+ $this->insert_html_element( $this->state->current_token );
+ return true;
+
+ /*
+ * > A character token that is one of U+0009 CHARACTER TABULATION,
+ * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+ * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+ *
+ * Parse error: ignore the token.
+ */
+ case '#text':
+ $text = $this->get_modifiable_text();
+ if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+ return $this->step();
+ }
+ goto before_html_anything_else;
+ break;
+
+ /*
+ * > A start tag whose tag name is "html"
+ */
+ case '+HTML':
+ $this->insert_html_element( $this->state->current_token );
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
+ return true;
+
+ /*
+ * > An end tag whose tag name is one of: "head", "body", "html", "br"
+ *
+ * Closing BR tags are always reported by the Tag Processor as opening tags.
+ */
+ case '-HEAD':
+ case '-BODY':
+ case '-HTML':
+ /*
+ * > Act as described in the "anything else" entry below.
+ */
+ goto before_html_anything_else;
+ break;
+ }
+
+ /*
+ * > Any other end tag
+ */
+ if ( $is_closer ) {
+ // Parse error: ignore the token.
+ return $this->step();
+ }
+
+ /*
+ * > Anything else.
+ *
+ * > Create an html element whose node document is the Document object.
+ * > Append it to the Document object. Put this element in the stack of open elements.
+ * > Switch the insertion mode to "before head", then reprocess the token.
+ */
+ before_html_anything_else:
+ $this->insert_virtual_node( 'HTML' );
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
+ return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1031,7 +1180,86 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> private function step_before_head(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $token_name = $this->get_token_name();
+ $token_type = $this->get_token_type();
+ $is_closer = parent::is_tag_closer();
+ $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
+ $op = "{$op_sigil}{$token_name}";
+
+ switch ( $op ) {
+ /*
+ * > A character token that is one of U+0009 CHARACTER TABULATION,
+ * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+ * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+ *
+ * Parse error: ignore the token.
+ */
+ case '#text':
+ $text = $this->get_modifiable_text();
+ if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+ return $this->step();
+ }
+ goto before_head_anything_else;
+ break;
+
+ /*
+ * > A comment token
+ */
+ case '#comment':
+ case '#funky-comment':
+ case '#presumptuous-tag':
+ $this->insert_html_element( $this->state->current_token );
+ return true;
+
+ /*
+ * > A DOCTYPE token
+ */
+ case 'html':
+ // Parse error: ignore the token.
+ return $this->step();
+
+ /*
+ * > A start tag whose tag name is "html"
+ */
+ case '+HTML':
+ return $this->step_in_body();
+
+ /*
+ * > A start tag whose tag name is "head"
+ */
+ case '+HEAD':
+ $this->insert_html_element( $this->state->current_token );
+ $this->state->head_element = $this->state->current_token;
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
+ return true;
+
+ /*
+ * > An end tag whose tag name is one of: "head", "body", "html", "br"
+ * > Act as described in the "anything else" entry below.
+ *
+ * Closing BR tags are always reported by the Tag Processor as opening tags.
+ */
+ case '-HEAD':
+ case '-BODY':
+ case '-HTML':
+ goto before_head_anything_else;
+ break;
+ }
+
+ if ( $is_closer ) {
+ // Parse error: ignore the token.
+ return $this->step();
+ }
+
+ /*
+ * > Anything else
+ *
+ * > Insert an HTML element for a "head" start tag token with no attributes.
+ */
+ before_head_anything_else:
+ $this->state->head_element = $this->insert_virtual_node( 'HEAD' );
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
+ return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1056,29 +1284,31 @@
</span><span class="cx" style="display: block; padding: 0 10px"> $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
</span><span class="cx" style="display: block; padding: 0 10px"> $op = "{$op_sigil}{$token_name}";
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- /*
- * > A character token that is one of U+0009 CHARACTER TABULATION,
- * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
- * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
- */
- if ( '#text' === $op ) {
- $text = $this->get_modifiable_text();
- if ( '' === $text ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ switch ( $op ) {
+ case '#text':
</ins><span class="cx" style="display: block; padding: 0 10px"> /*
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * If the text is empty after processing HTML entities and stripping
- * U+0000 NULL bytes then ignore the token.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * > A character token that is one of U+0009 CHARACTER TABULATION,
+ * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+ * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
</ins><span class="cx" style="display: block; padding: 0 10px"> */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- return $this->step();
- }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $text = $this->get_modifiable_text();
+ if ( '' === $text ) {
+ /*
+ * If the text is empty after processing HTML entities and stripping
+ * U+0000 NULL bytes then ignore the token.
+ */
+ return $this->step();
+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
- // Insert the character.
- $this->insert_html_element( $this->state->current_token );
- return true;
- }
- }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+ // Insert the character.
+ $this->insert_html_element( $this->state->current_token );
+ return true;
+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- switch ( $op ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ goto in_head_anything_else;
+ break;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * > A comment token
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1124,7 +1354,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * > tentative, then change the encoding to the resulting encoding.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> $charset = $this->get_attribute( 'charset' );
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( is_string( $charset ) ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( is_string( $charset ) && 'tentative' === $this->state->encoding_confidence ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> $this->bail( 'Cannot yet process META tags with charset to determine encoding.' );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1141,7 +1371,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> if (
</span><span class="cx" style="display: block; padding: 0 10px"> is_string( $http_equiv ) &&
</span><span class="cx" style="display: block; padding: 0 10px"> is_string( $content ) &&
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 0 === strcasecmp( $http_equiv, 'Content-Type' )
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 0 === strcasecmp( $http_equiv, 'Content-Type' ) &&
+ 'tentative' === $this->state->encoding_confidence
</ins><span class="cx" style="display: block; padding: 0 10px"> ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $this->bail( 'Cannot yet process META tags with http-equiv Content-Type to determine encoding.' );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1193,10 +1424,11 @@
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * > An end tag whose tag name is one of: "body", "html", "br"
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ *
+ * BR tags are always reported by the Tag Processor as opening tags.
</ins><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> case '-BODY':
</span><span class="cx" style="display: block; padding: 0 10px"> case '-HTML':
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- case '-BR':
</del><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * > Act as described in the "anything else" entry below.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1273,7 +1505,92 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> private function step_in_head_noscript(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $token_name = $this->get_token_name();
+ $token_type = $this->get_token_type();
+ $is_closer = parent::is_tag_closer();
+ $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
+ $op = "{$op_sigil}{$token_name}";
+
+ switch ( $op ) {
+ /*
+ * > A character token that is one of U+0009 CHARACTER TABULATION,
+ * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+ * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+ *
+ * Parse error: ignore the token.
+ */
+ case '#text':
+ $text = $this->get_modifiable_text();
+ if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+ return $this->step_in_head();
+ }
+
+ goto in_head_noscript_anything_else;
+ break;
+
+ /*
+ * > A DOCTYPE token
+ */
+ case 'html':
+ // Parse error: ignore the token.
+ return $this->step();
+
+ /*
+ * > A start tag whose tag name is "html"
+ */
+ case '+HTML':
+ return $this->step_in_body();
+
+ /*
+ * > An end tag whose tag name is "noscript"
+ */
+ case '-NOSCRIPT':
+ $this->state->stack_of_open_elements->pop();
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
+ return true;
+
+ /*
+ * > A comment token
+ * >
+ * > A start tag whose tag name is one of: "basefont", "bgsound",
+ * > "link", "meta", "noframes", "style"
+ */
+ case '#comment':
+ case '#funky-comment':
+ case '#presumptuous-tag':
+ case '+BASEFONT':
+ case '+BGSOUND':
+ case '+LINK':
+ case '+META':
+ case '+NOFRAMES':
+ case '+STYLE':
+ return $this->step_in_head();
+
+ /*
+ * > An end tag whose tag name is "br"
+ *
+ * This should never happen, as the Tag Processor prevents showing a BR closing tag.
+ */
+ }
+
+ /*
+ * > A start tag whose tag name is one of: "head", "noscript"
+ * > Any other end tag
+ */
+ if ( '+HEAD' === $op || '+NOSCRIPT' === $op || $is_closer ) {
+ // Parse error: ignore the token.
+ return $this->step();
+ }
+
+ /*
+ * > Anything else
+ *
+ * Anything here is a parse error.
+ */
+ in_head_noscript_anything_else:
+ $this->state->stack_of_open_elements->pop();
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
+ return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1292,7 +1609,133 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether an element was found.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> private function step_after_head(): bool {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD . ' state.' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $token_name = $this->get_token_name();
+ $token_type = $this->get_token_type();
+ $is_closer = parent::is_tag_closer();
+ $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
+ $op = "{$op_sigil}{$token_name}";
+
+ switch ( $op ) {
+ /*
+ * > A character token that is one of U+0009 CHARACTER TABULATION,
+ * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
+ * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
+ */
+ case '#text':
+ $text = $this->get_modifiable_text();
+ if ( strlen( $text ) === strspn( $text, " \t\n\f\r" ) ) {
+ // Insert the character.
+ $this->insert_html_element( $this->state->current_token );
+ return true;
+ }
+ goto after_head_anything_else;
+ break;
+
+ /*
+ * > A comment token
+ */
+ case '#comment':
+ case '#funky-comment':
+ case '#presumptuous-tag':
+ $this->insert_html_element( $this->state->current_token );
+ return true;
+
+ /*
+ * > A DOCTYPE token
+ */
+ case 'html':
+ // Parse error: ignore the token.
+ return $this->step();
+
+ /*
+ * > A start tag whose tag name is "html"
+ */
+ case '+HTML':
+ return $this->step_in_body();
+
+ /*
+ * > A start tag whose tag name is "body"
+ */
+ case '+BODY':
+ $this->insert_html_element( $this->state->current_token );
+ $this->state->frameset_ok = false;
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
+ return true;
+
+ /*
+ * > A start tag whose tag name is "frameset"
+ */
+ case '+FRAMESET':
+ $this->insert_html_element( $this->state->current_token );
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET;
+ return true;
+
+ /*
+ * > A start tag whose tag name is one of: "base", "basefont", "bgsound",
+ * > "link", "meta", "noframes", "script", "style", "template", "title"
+ *
+ * Anything here is a parse error.
+ */
+ case '+BASE':
+ case '+BASEFONT':
+ case '+BGSOUND':
+ case '+LINK':
+ case '+META':
+ case '+NOFRAMES':
+ case '+SCRIPT':
+ case '+STYLE':
+ case '+TEMPLATE':
+ case '+TITLE':
+ /*
+ * > Push the node pointed to by the head element pointer onto the stack of open elements.
+ * > Process the token using the rules for the "in head" insertion mode.
+ * > Remove the node pointed to by the head element pointer from the stack of open elements. (It might not be the current node at this point.)
+ */
+ $this->bail( 'Cannot process elements after HEAD which reopen the HEAD element.' );
+ /*
+ * Do not leave this break in when adding support; it's here to prevent
+ * WPCS from getting confused at the switch structure without a return,
+ * because it doesn't know that `bail()` always throws.
+ */
+ break;
+
+ /*
+ * > An end tag whose tag name is "template"
+ */
+ case '-TEMPLATE':
+ return $this->step_in_head();
+
+ /*
+ * > An end tag whose tag name is one of: "body", "html", "br"
+ *
+ * Closing BR tags are always reported by the Tag Processor as opening tags.
+ */
+ case '-BODY':
+ case '-HTML':
+ /*
+ * > Act as described in the "anything else" entry below.
+ */
+ goto after_head_anything_else;
+ break;
+ }
+
+ /*
+ * > A start tag whose tag name is "head"
+ * > Any other end tag
+ */
+ if ( '+HEAD' === $op || $is_closer ) {
+ // Parse error: ignore the token.
+ return $this->step();
+ }
+
+ /*
+ * > Anything else
+ * > Insert an HTML element for a "body" start tag token with no attributes.
+ */
+ after_head_anything_else:
+ $this->insert_virtual_node( 'BODY' );
+ $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
+ return $this->step( self::REPROCESS_CURRENT_NODE );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -4469,14 +4912,17 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string $token_name Name of token to create and insert into the stack of open elements.
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string|null $bookmark_name Optional. Name to give bookmark for created virtual node.
</span><span class="cx" style="display: block; padding: 0 10px"> * Defaults to auto-creating a bookmark name.
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @return WP_HTML_Token Newly-created virtual token.
</ins><span class="cx" style="display: block; padding: 0 10px"> */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- private function insert_virtual_node( $token_name, $bookmark_name = null ): void {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ private function insert_virtual_node( $token_name, $bookmark_name = null ): WP_HTML_Token {
</ins><span class="cx" style="display: block; padding: 0 10px"> $here = $this->bookmarks[ $this->state->current_token->bookmark_name ];
</span><span class="cx" style="display: block; padding: 0 10px"> $name = $bookmark_name ?? $this->bookmark_token();
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> $this->bookmarks[ $name ] = new WP_HTML_Span( $here->start, 0 );
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->insert_html_element( new WP_HTML_Token( $name, $token_name, false ) );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $token = new WP_HTML_Token( $name, $token_name, false );
+ $this->insert_html_element( $token );
+ return $token;
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -4633,6 +5079,53 @@
</span><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ /**
+ * Gets an encoding from a given string.
+ *
+ * This is an algorithm defined in the WHAT-WG specification.
+ *
+ * Example:
+ *
+ * 'UTF-8' === self::get_encoding( 'utf8' );
+ * 'UTF-8' === self::get_encoding( " \tUTF-8 " );
+ * null === self::get_encoding( 'UTF-7' );
+ * null === self::get_encoding( 'utf8; charset=' );
+ *
+ * @see https://encoding.spec.whatwg.org/#concept-encoding-get
+ *
+ * @todo As this parser only supports UTF-8, only the UTF-8
+ * encodings are detected. Add more as desired, but the
+ * parser will bail on non-UTF-8 encodings.
+ *
+ * @since 6.7.0
+ *
+ * @param string $label A string which may specify a known encoding.
+ * @return string|null Known encoding if matched, otherwise null.
+ */
+ protected static function get_encoding( string $label ): ?string {
+ /*
+ * > Remove any leading and trailing ASCII whitespace from label.
+ */
+ $label = trim( $label, " \t\f\r\n" );
+
+ /*
+ * > If label is an ASCII case-insensitive match for any of the labels listed in the
+ * > table below, then return the corresponding encoding; otherwise return failure.
+ */
+ switch ( strtolower( $label ) ) {
+ case 'unicode-1-1-utf-8':
+ case 'unicode11utf8':
+ case 'unicode20utf8':
+ case 'utf-8':
+ case 'utf8':
+ case 'x-unicode20utf8':
+ return 'UTF-8';
+
+ default:
+ return null;
+ }
+ }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * Constants that would pollute the top of the class if they were found there.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlProcessorBreadcrumbsphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php 2024-07-31 14:03:24 UTC (rev 58835)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php 2024-07-31 16:54:23 UTC (rev 58836)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -25,7 +25,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> public function test_navigates_into_normative_html_for_supported_elements( $html, $tag_name ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $processor = WP_HTML_Processor::create_fragment( $html );
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->assertTrue( $processor->step(), "Failed to step into supported {$tag_name} element." );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->assertTrue( $processor->next_token(), "Failed to step into supported {$tag_name} element." );
</ins><span class="cx" style="display: block; padding: 0 10px"> $this->assertSame( $tag_name, $processor->get_tag(), "Misread {$tag_name} as a {$processor->get_tag()} element." );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -90,6 +90,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 'IMG',
</span><span class="cx" style="display: block; padding: 0 10px"> 'INS',
</span><span class="cx" style="display: block; padding: 0 10px"> 'LI',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'LINK',
</ins><span class="cx" style="display: block; padding: 0 10px"> 'ISINDEX', // Deprecated.
</span><span class="cx" style="display: block; padding: 0 10px"> 'KBD',
</span><span class="cx" style="display: block; padding: 0 10px"> 'KEYGEN', // Deprecated.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -108,6 +109,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 'NAV',
</span><span class="cx" style="display: block; padding: 0 10px"> 'NEXTID', // Deprecated.
</span><span class="cx" style="display: block; padding: 0 10px"> 'NOBR', // Neutralized.
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'NOEMBED', // Neutralized.
+ 'NOFRAMES', // Neutralized.
</ins><span class="cx" style="display: block; padding: 0 10px"> 'NOSCRIPT',
</span><span class="cx" style="display: block; padding: 0 10px"> 'OBJECT',
</span><span class="cx" style="display: block; padding: 0 10px"> 'OL',
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -122,6 +125,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 'RTC', // Neutralized.
</span><span class="cx" style="display: block; padding: 0 10px"> 'RUBY',
</span><span class="cx" style="display: block; padding: 0 10px"> 'SAMP',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'SCRIPT',
</ins><span class="cx" style="display: block; padding: 0 10px"> 'SEARCH',
</span><span class="cx" style="display: block; padding: 0 10px"> 'SECTION',
</span><span class="cx" style="display: block; padding: 0 10px"> 'SLOT',
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -130,21 +134,29 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 'SPAN',
</span><span class="cx" style="display: block; padding: 0 10px"> 'STRIKE',
</span><span class="cx" style="display: block; padding: 0 10px"> 'STRONG',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'STYLE',
</ins><span class="cx" style="display: block; padding: 0 10px"> 'SUB',
</span><span class="cx" style="display: block; padding: 0 10px"> 'SUMMARY',
</span><span class="cx" style="display: block; padding: 0 10px"> 'SUP',
</span><span class="cx" style="display: block; padding: 0 10px"> 'TABLE',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'TEXTAREA',
</ins><span class="cx" style="display: block; padding: 0 10px"> 'TIME',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'TITLE',
</ins><span class="cx" style="display: block; padding: 0 10px"> 'TT',
</span><span class="cx" style="display: block; padding: 0 10px"> 'U',
</span><span class="cx" style="display: block; padding: 0 10px"> 'UL',
</span><span class="cx" style="display: block; padding: 0 10px"> 'VAR',
</span><span class="cx" style="display: block; padding: 0 10px"> 'VIDEO',
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'XMP', // Deprecated, use PRE instead.
</ins><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> $data = array();
</span><span class="cx" style="display: block; padding: 0 10px"> foreach ( $supported_elements as $tag_name ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $data[ $tag_name ] = array( "<{$tag_name}>", $tag_name );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $closer = in_array( $tag_name, array( 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true )
+ ? "</{$tag_name}>"
+ : '';
+
+ $data[ $tag_name ] = array( "<{$tag_name}>{$closer}", $tag_name );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> $data['IMAGE (treated as an IMG)'] = array( '<image>', 'IMG' );
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -182,22 +194,9 @@
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public static function data_unsupported_elements() {
</span><span class="cx" style="display: block; padding: 0 10px"> $unsupported_elements = array(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'BODY',
- 'FRAME',
- 'FRAMESET',
- 'HEAD',
- 'HTML',
- 'IFRAME',
</del><span class="cx" style="display: block; padding: 0 10px"> 'MATH',
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'NOEMBED', // Neutralized.
- 'NOFRAMES', // Neutralized.
</del><span class="cx" style="display: block; padding: 0 10px"> 'PLAINTEXT', // Neutralized.
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'SCRIPT',
- 'STYLE',
</del><span class="cx" style="display: block; padding: 0 10px"> 'SVG',
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'TEXTAREA',
- 'TITLE',
- 'XMP', // Deprecated, use PRE instead.
</del><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> $data = array();
</span></span></pre>
</div>
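The step functions in the diff above dispatch on an `$op` string built from the token type, the token name, and whether the tag is a closer. A minimal standalone sketch of that key construction follows. The function name `sketch_op` is hypothetical, and the `'#doctype'` token type is an assumption made for illustration; the diff itself shows only the `'#tag'` comparison and the resulting `case` labels.

```php
<?php
/**
 * Hypothetical helper (not part of the commit): builds the dispatch key
 * the same way the step functions in the diff do.
 */
function sketch_op( string $token_type, string $token_name, bool $is_closer ): string {
	// Only tags receive a sigil: '+' for openers, '-' for closers.
	$op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';

	return "{$op_sigil}{$token_name}";
}

// Tag tokens are prefixed; every other token keeps its bare name.
var_dump( sketch_op( '#tag', 'BODY', false ) );     // string(5) "+BODY"
var_dump( sketch_op( '#tag', 'HTML', true ) );      // string(5) "-HTML"
var_dump( sketch_op( '#text', '#text', false ) );   // string(5) "#text"
var_dump( sketch_op( '#doctype', 'html', false ) ); // string(4) "html" (token type assumed)
```

Because non-tag tokens never receive a sigil, a label such as `case 'html':` can only match a token that is not a tag, while `case '+HTML':` matches the opening tag.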
</div>
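The `get_encoding()` method added in this changeset is `protected static`, so it cannot be called directly from outside the class. The following is a rough standalone sketch of the same label matching, assuming only the six UTF-8 labels listed in the diff. The function name `sketch_get_encoding` is hypothetical and exists only for this illustration.

```php
<?php
/**
 * Hypothetical standalone copy of the label lookup shown in the diff.
 * Returns 'UTF-8' for a recognized label, otherwise null.
 */
function sketch_get_encoding( string $label ): ?string {
	// Strip leading and trailing ASCII whitespace, then compare case-insensitively.
	$label = strtolower( trim( $label, " \t\f\r\n" ) );

	// Labels taken from the switch statement in the diff.
	$utf8_labels = array(
		'unicode-1-1-utf-8',
		'unicode11utf8',
		'unicode20utf8',
		'utf-8',
		'utf8',
		'x-unicode20utf8',
	);

	return in_array( $label, $utf8_labels, true ) ? 'UTF-8' : null;
}

var_dump( sketch_get_encoding( 'utf8' ) );           // string(5) "UTF-8"
var_dump( sketch_get_encoding( " \tUTF-8 " ) );      // string(5) "UTF-8"
var_dump( sketch_get_encoding( 'UTF-7' ) );          // NULL
var_dump( sketch_get_encoding( 'utf8; charset=' ) ); // NULL
```

The last call returns null because the whole label must match an entry after trimming; a matching prefix is not enough.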
</body>
</html>