<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[57211] trunk: HTML API: Avoid processing incomplete tokens.</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/57211">57211</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/57211","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>Bernhard Reiter</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2023-12-20 17:50:04 +0000 (Wed, 20 Dec 2023)</dd>
</dl>
<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Avoid processing incomplete tokens.
Currently the Tag Processor assumes that an input document is a ''full'' HTML document. Because of this, if there's lingering content after the last tag match it will treat that content as plaintext and skip over it. This is fine for the Tag Processor because if there is lingering content that isn't a valid tag then there's nothing for `next_tag()` to match.
However, in order to support a number of feature expansions it is important to recognize that the remaining content ''may'' involve partial syntax elements, such as incomplete tags, attributes, or comments.
In this patch we're adding a mode inside the Tag Processor which will flip when we start parsing HTML syntax but the document finishes before the token does. This will provide the ability to:
- extend the input document,
- avoid misinterpreting syntax as text, and
- distinguish a document that is probably complete from one that is known to be incomplete.
While building this patch, a few bugs in the Tag Processor were identified and fixed, namely in the handling of incomplete syntax elements.
Props dmsnell, jonsurrell.
Fixes <a href="https://core.trac.wordpress.org/ticket/60122">#60122</a>, <a href="https://core.trac.wordpress.org/ticket/60108">#60108</a>.</pre>
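<p>A minimal usage sketch of the pause behavior described above, assuming WordPress 6.5 or later is loaded so that <code>WP_HTML_Tag_Processor</code> is available. It uses only <code>next_tag()</code> and the <code>paused_at_incomplete_token()</code> method added in this changeset. The input string and the <code>$needs_more_input</code> variable are illustrative and not part of the patch. Going by the docblock examples in the diff, the second <code>next_tag()</code> call is expected to fail because the input ends inside the INPUT tag.</p>
<pre>&lt;?php
// Assumption: WordPress 6.5+ is bootstrapped, so WP_HTML_Tag_Processor exists.
$processor = new WP_HTML_Tag_Processor( '&lt;p&gt;Done&lt;/p&gt;&lt;input type="text" value="Th' );

// The P tag is complete, so it can be matched and modified as usual.
$found_p = $processor-&gt;next_tag( 'P' );

// The document ends in the middle of the INPUT tag's attribute value.
$found_input = $processor-&gt;next_tag( 'INPUT' );

// A failed match alone is ambiguous; this tells "not found" apart from "cut off".
$needs_more_input = ! $found_input &amp;&amp; $processor-&gt;paused_at_incomplete_token();
</pre>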
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp">trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php 2023-12-20 14:50:11 UTC (rev 57210)
+++ trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php 2023-12-20 17:50:04 UTC (rev 57211)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -15,9 +15,6 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * - Prune the whitespace when removing classes/attributes: e.g. "a b c" -> "c" not " c".
</span><span class="cx" style="display: block; padding: 0 10px"> * This would increase the size of the changes for some operations but leave more
</span><span class="cx" style="display: block; padding: 0 10px"> * natural-looking output HTML.
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * - Decode HTML character references within class names when matching. E.g. match having
- * class `1<"2` needs to recognize `class="1&lt;&quot;2"`. Currently the Tag Processor
- * will fail to find the right tag if the class name is encoded as such.
</del><span class="cx" style="display: block; padding: 0 10px"> * - Properly decode HTML character references in `get_attribute()`. PHP's
</span><span class="cx" style="display: block; padding: 0 10px"> * `html_entity_decode()` is wrong in a couple ways: it doesn't account for the
</span><span class="cx" style="display: block; padding: 0 10px"> * no-ambiguous-ampersand rule, and it improperly handles the way semicolons may
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -107,6 +104,56 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * given, it will return `true` (the only way to set `false` for an
</span><span class="cx" style="display: block; padding: 0 10px"> * attribute is to remove it).
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * #### When matching fails
+ *
+ * When `next_tag()` returns `false` it could mean different things:
+ *
+ * - The requested tag wasn't found in the input document.
+ * - The input document ended in the middle of an HTML syntax element.
+ *
+ * When a document ends in the middle of a syntax element it will pause
+ * the processor. This is to make it possible in the future to extend the
+ * input document and proceed - an important requirement for chunked
+ * streaming parsing of a document.
+ *
+ * Example:
+ *
+ * $processor = new WP_HTML_Tag_Processor( 'This <div is="a" partial="token' );
+ * false === $processor->next_tag();
+ *
+ * If a special element (see next section) is encountered but no closing tag
+ * is found it will count as an incomplete tag. The parser will pause as if
+ * the opening tag were incomplete.
+ *
+ * Example:
+ *
+ * $processor = new WP_HTML_Tag_Processor( '<style>// there could be more styling to come' );
+ * false === $processor->next_tag();
+ *
+ * $processor = new WP_HTML_Tag_Processor( '<style>// this is everything</style><div>' );
+ * true === $processor->next_tag( 'DIV' );
+ *
+ * #### Special elements
+ *
+ * Some HTML elements are handled in a special way; their start and end tags
+ * act like a void tag. These are special because their contents can't contain
+ * HTML markup. Everything inside these elements is handled in a special way
+ * and content that _appears_ like HTML tags inside of them isn't. There can
+ * be no nesting in these elements.
+ *
+ * In the following list, "raw text" means that all of the content in the HTML
+ * until the matching closing tag is treated verbatim without any replacements
+ * and without any parsing.
+ *
+ * - IFRAME allows no content but requires a closing tag.
+ * - NOEMBED (deprecated) content is raw text.
+ * - NOFRAMES (deprecated) content is raw text.
+ * - SCRIPT content is plaintext apart from legacy rules allowing `</script>` inside an HTML comment.
+ * - STYLE content is raw text.
+ * - TITLE content is plain text but character references are decoded.
+ * - TEXTAREA content is plain text but character references are decoded.
+ * - XMP (deprecated) content is raw text.
+ *
</ins><span class="cx" style="display: block; padding: 0 10px"> * ### Modifying HTML attributes for a found tag
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * Once you've found the start of an opening tag you can modify
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -241,9 +288,39 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * double-quoted strings, meaning that attributes on input with single-quoted or
</span><span class="cx" style="display: block; padding: 0 10px"> * unquoted values will appear in the output with double-quotes.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * ### Scripting Flag
+ *
+ * The Tag Processor parses HTML with the "scripting flag" disabled. This means
+ * that it doesn't run any scripts while parsing the page. In a browser with
+ * JavaScript enabled, for example, the script can change the parse of the
+ * document as it loads. On the server, however, evaluating JavaScript is not
+ * only impractical, but also unwanted.
+ *
+ * Practically this means that the Tag Processor will descend into NOSCRIPT
+ * elements and process its child tags. Were the scripting flag enabled, such
+ * as in a typical browser, the contents of NOSCRIPT are skipped entirely.
+ *
+ * This allows the HTML API to process the content that will be presented in
+ * a browser when scripting is disabled, but it offers a different view of a
+ * page than most browser sessions will experience. E.g. the tags inside the
+ * NOSCRIPT disappear.
+ *
+ * ### Text Encoding
+ *
+ * The Tag Processor assumes that the input HTML document is encoded with a
+ * text encoding compatible with 7-bit ASCII's '<', '>', '&', ';', '/', '=',
+ * "'", '"', 'a' - 'z', 'A' - 'Z', and the whitespace characters ' ', tab,
+ * carriage-return, newline, and form-feed.
+ *
+ * In practice, this includes almost every single-byte encoding as well as
+ * UTF-8. Notably, however, it does not include UTF-16. If providing input
+ * that's incompatible, then convert the encoding beforehand.
+ *
</ins><span class="cx" style="display: block; padding: 0 10px"> * @since 6.2.0
</span><span class="cx" style="display: block; padding: 0 10px"> * @since 6.2.1 Fix: Support for various invalid comments; attribute updates are case-insensitive.
</span><span class="cx" style="display: block; padding: 0 10px"> * @since 6.3.2 Fix: Skip HTML-like content inside rawtext elements such as STYLE.
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @since 6.5.0 Pauses processor when input ends in an incomplete syntax token.
+ * Introduces "special" elements which act like void elements, e.g. STYLE.
</ins><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> class WP_HTML_Tag_Processor {
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -317,6 +394,27 @@
</span><span class="cx" style="display: block; padding: 0 10px"> private $stop_on_tag_closers;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Specifies mode of operation of the parser at any given time.
+ *
+ * | State | Meaning |
+ * | --------------|----------------------------------------------------------------------|
+ * | *Ready* | The parser is ready to run. |
+ * | *Complete* | There is nothing left to parse. |
+ * | *Incomplete* | The HTML ended in the middle of a token; nothing more can be parsed. |
+ * | *Matched tag* | Found an HTML tag; it's possible to modify its attributes. |
+ *
+ * @since 6.5.0
+ *
+ * @see WP_HTML_Tag_Processor::STATE_READY
+ * @see WP_HTML_Tag_Processor::STATE_COMPLETE
+ * @see WP_HTML_Tag_Processor::STATE_INCOMPLETE
+ * @see WP_HTML_Tag_Processor::STATE_MATCHED_TAG
+ *
+ * @var string
+ */
+ private $parser_state = self::STATE_READY;
+
+ /**
</ins><span class="cx" style="display: block; padding: 0 10px"> * How many bytes from the original HTML document have been read and parsed.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * This value points to the latest byte offset in the input document which
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -544,6 +642,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * Finds the next tag matching the $query.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @since 6.2.0
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @since 6.5.0 No longer processes incomplete tokens at end of document; pauses the processor at start of token.
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @param array|string|null $query {
</span><span class="cx" style="display: block; padding: 0 10px"> * Optional. Which tag name to find, having which class, etc. Default is to find any tag.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -562,90 +661,177 @@
</span><span class="cx" style="display: block; padding: 0 10px"> $already_found = 0;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> do {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( false === $this->next_token() ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- // Find the next tag if it exists.
- if ( false === $this->parse_next_tag() ) {
- $this->bytes_already_parsed = strlen( $this->html );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
+ continue;
+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- return false;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( $this->matches() ) {
+ ++$already_found;
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ } while ( $already_found < $this->sought_match_offset );
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- // Parse all of its attributes.
- while ( $this->parse_next_attribute() ) {
- continue;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ return true;
+ }
+
+ /**
+ * Finds the next token in the HTML document.
+ *
+ * An HTML document can be viewed as a stream of tokens,
+ * where tokens are things like HTML tags, HTML comments,
+ * text nodes, etc. This method finds the next token in
+ * the HTML document and returns whether it found one.
+ *
+ * If it starts parsing a token and reaches the end of the
+ * document then it will seek to the start of the last
+ * token and pause, returning `false` to indicate that it
+ * failed to find a complete token.
+ *
+ * Possible token types, based on the HTML specification:
+ *
+ * - an HTML tag, whether opening, closing, or void.
+ * - a text node - the plaintext inside tags.
+ * - an HTML comment.
+ * - a DOCTYPE declaration.
+ * - a processing instruction, e.g. `<?xml version="1.0" ?>`.
+ *
+ * The Tag Processor currently only supports the tag token.
+ *
+ * @since 6.5.0
+ *
+ * @return bool Whether a token was parsed.
+ */
+ public function next_token() {
+ $this->get_updated_html();
+ $was_at = $this->bytes_already_parsed;
+
+ // Don't proceed if there's nothing more to scan.
+ if (
+ self::STATE_COMPLETE === $this->parser_state ||
+ self::STATE_INCOMPLETE === $this->parser_state
+ ) {
+ return false;
+ }
+
+ /*
+ * The next step in the parsing loop determines the parsing state;
+ * clear it so that state doesn't linger from the previous step.
+ */
+ $this->parser_state = self::STATE_READY;
+
+ if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
+ $this->parser_state = self::STATE_COMPLETE;
+ return false;
+ }
+
+ // Find the next tag if it exists.
+ if ( false === $this->parse_next_tag() ) {
+ if ( self::STATE_INCOMPLETE === $this->parser_state ) {
+ $this->bytes_already_parsed = $was_at;
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- // Ensure that the tag closes before the end of the document.
- if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ return false;
+ }
+
+ // Parse all of its attributes.
+ while ( $this->parse_next_attribute() ) {
+ continue;
+ }
+
+ // Ensure that the tag closes before the end of the document.
+ if (
+ self::STATE_INCOMPLETE === $this->parser_state ||
+ $this->bytes_already_parsed >= strlen( $this->html )
+ ) {
+ // Does this appropriately clear state (parsed attributes)?
+ $this->parser_state = self::STATE_INCOMPLETE;
+ $this->bytes_already_parsed = $was_at;
+
+ return false;
+ }
+
+ $tag_ends_at = strpos( $this->html, '>', $this->bytes_already_parsed );
+ if ( false === $tag_ends_at ) {
+ $this->parser_state = self::STATE_INCOMPLETE;
+ $this->bytes_already_parsed = $was_at;
+
+ return false;
+ }
+ $this->parser_state = self::STATE_MATCHED_TAG;
+ $this->token_length = $tag_ends_at - $this->token_starts_at;
+ $this->bytes_already_parsed = $tag_ends_at;
+
+ /*
+ * For non-DATA sections which might contain text that looks like HTML tags but
+ * isn't, scan with the appropriate alternative mode. Looking at the first letter
+ * of the tag name as a pre-check avoids a string allocation when it's not needed.
+ */
+ $t = $this->html[ $this->tag_name_starts_at ];
+ if (
+ ! $this->is_closing_tag &&
+ (
+ 'i' === $t || 'I' === $t ||
+ 'n' === $t || 'N' === $t ||
+ 's' === $t || 'S' === $t ||
+ 't' === $t || 'T' === $t ||
+ 'x' === $t || 'X' === $t
+ )
+ ) {
+ $tag_name = $this->get_tag();
+
+ if ( 'SCRIPT' === $tag_name && ! $this->skip_script_data() ) {
+ $this->parser_state = self::STATE_INCOMPLETE;
+ $this->bytes_already_parsed = $was_at;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ } elseif (
+ ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) &&
+ ! $this->skip_rcdata( $tag_name )
+ ) {
+ $this->parser_state = self::STATE_INCOMPLETE;
+ $this->bytes_already_parsed = $was_at;
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $tag_ends_at = strpos( $this->html, '>', $this->bytes_already_parsed );
- if ( false === $tag_ends_at ) {
</del><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- }
- $this->token_length = $tag_ends_at - $this->token_starts_at;
- $this->bytes_already_parsed = $tag_ends_at;
-
- // Finally, check if the parsed tag and its attributes match the search query.
- if ( $this->matches() ) {
- ++$already_found;
- }
-
- /*
- * For non-DATA sections which might contain text that looks like HTML tags but
- * isn't, scan with the appropriate alternative mode. Looking at the first letter
- * of the tag name as a pre-check avoids a string allocation when it's not needed.
- */
- $t = $this->html[ $this->tag_name_starts_at ];
- if (
- ! $this->is_closing_tag &&
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ } elseif (
</ins><span class="cx" style="display: block; padding: 0 10px"> (
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'i' === $t || 'I' === $t ||
- 'n' === $t || 'N' === $t ||
- 's' === $t || 'S' === $t ||
- 't' === $t || 'T' === $t
- ) ) {
- $tag_name = $this->get_tag();
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'IFRAME' === $tag_name ||
+ 'NOEMBED' === $tag_name ||
+ 'NOFRAMES' === $tag_name ||
+ 'STYLE' === $tag_name ||
+ 'XMP' === $tag_name
+ ) &&
+ ! $this->skip_rawtext( $tag_name )
+ ) {
+ $this->parser_state = self::STATE_INCOMPLETE;
+ $this->bytes_already_parsed = $was_at;
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( 'SCRIPT' === $tag_name && ! $this->skip_script_data() ) {
- $this->bytes_already_parsed = strlen( $this->html );
- return false;
- } elseif (
- ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) &&
- ! $this->skip_rcdata( $tag_name )
- ) {
- $this->bytes_already_parsed = strlen( $this->html );
- return false;
- } elseif (
- (
- 'IFRAME' === $tag_name ||
- 'NOEMBED' === $tag_name ||
- 'NOFRAMES' === $tag_name ||
- 'NOSCRIPT' === $tag_name ||
- 'STYLE' === $tag_name
- ) &&
- ! $this->skip_rawtext( $tag_name )
- ) {
- /*
- * "XMP" should be here too but its rules are more complicated and require the
- * complexity of the HTML Processor (it needs to close out any open P element,
- * meaning it can't be skipped here or else the HTML Processor will lose its
- * place). For now, it can be ignored as it's a rare HTML tag in practice and
- * any normative HTML should be using PRE instead.
- */
- $this->bytes_already_parsed = strlen( $this->html );
- return false;
- }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ return false;
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- } while ( $already_found < $this->sought_match_offset );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> return true;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ /**
+ * Whether the processor paused because the input HTML document ended
+ * in the middle of a syntax element, such as in the middle of a tag.
+ *
+ * Example:
+ *
+ * $processor = new WP_HTML_Tag_Processor( '<input type="text" value="Th' );
+ * false === $processor->next_tag();
+ * true === $processor->paused_at_incomplete_token();
+ *
+ * @since 6.5.0
+ *
+ * @return bool Whether the parse paused at the start of an incomplete token.
+ */
+ public function paused_at_incomplete_token() {
+ return self::STATE_INCOMPLETE === $this->parser_state;
+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="cx" style="display: block; padding: 0 10px"> * Generator for a foreach loop to step through each class name for the matched tag.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -664,6 +850,10 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @since 6.4.0
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function class_list() {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
+ return;
+ }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> /** @var string $class contains the string value of the class attribute, with character references decoded. */
</span><span class="cx" style="display: block; padding: 0 10px"> $class = $this->get_attribute( 'class' );
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -719,7 +909,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function has_class( $wanted_class ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( ! $this->tag_name_starts_at ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return null;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -816,7 +1006,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether the bookmark was successfully created.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function set_bookmark( $name ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( null === $this->tag_name_starts_at ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ // It only makes sense to set a bookmark if the parser has paused on a concrete token.
+ if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -895,7 +1086,6 @@
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> // Fail if there is no possible tag closer.
</span><span class="cx" style="display: block; padding: 0 10px"> if ( false === $at || ( $at + $tag_length ) >= $doc_length ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->bytes_already_parsed = $doc_length;
</del><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -923,6 +1113,10 @@
</span><span class="cx" style="display: block; padding: 0 10px"> $at += $tag_length;
</span><span class="cx" style="display: block; padding: 0 10px"> $this->bytes_already_parsed = $at;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( $at >= strlen( $html ) ) {
+ return false;
+ }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * Ensure that the tag name terminates to avoid matching on
</span><span class="cx" style="display: block; padding: 0 10px"> * substrings of a longer tag name. For example, the sequence
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1073,6 +1267,12 @@
</span><span class="cx" style="display: block; padding: 0 10px"> continue;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( $this->bytes_already_parsed >= $doc_length ) {
+ $this->parser_state = self::STATE_INCOMPLETE;
+
+ return false;
+ }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> if ( '>' === $html[ $this->bytes_already_parsed ] ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $this->bytes_already_parsed = $closer_potentially_starts_at;
</span><span class="cx" style="display: block; padding: 0 10px"> return true;
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1107,6 +1307,11 @@
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> while ( false !== $at && $at < $doc_length ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $at = strpos( $html, '<', $at );
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+ /*
+ * This does not imply an incomplete parse; it indicates that there
+ * can be nothing left in the document other than a #text node.
+ */
</ins><span class="cx" style="display: block; padding: 0 10px"> if ( false === $at ) {
</span><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1113,7 +1318,7 @@
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> $this->token_starts_at = $at;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( '/' === $this->html[ $at + 1 ] ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( $at + 1 < $doc_length && '/' === $this->html[ $at + 1 ] ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> $this->is_closing_tag = true;
</span><span class="cx" style="display: block; padding: 0 10px"> ++$at;
</span><span class="cx" style="display: block; padding: 0 10px"> } else {
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1147,7 +1352,9 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * Abort if no tag is found before the end of
</span><span class="cx" style="display: block; padding: 0 10px"> * the document. There is nothing left to parse.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( $at + 1 >= strlen( $html ) ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( $at + 1 >= $doc_length ) {
+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1161,13 +1368,15 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> if (
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- strlen( $html ) > $at + 3 &&
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $doc_length > $at + 3 &&
</ins><span class="cx" style="display: block; padding: 0 10px"> '-' === $html[ $at + 2 ] &&
</span><span class="cx" style="display: block; padding: 0 10px"> '-' === $html[ $at + 3 ]
</span><span class="cx" style="display: block; padding: 0 10px"> ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $closer_at = $at + 4;
</span><span class="cx" style="display: block; padding: 0 10px"> // If it's not possible to close the comment then there is nothing more to scan.
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( strlen( $html ) <= $closer_at ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( $doc_length <= $closer_at ) {
+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1185,18 +1394,20 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * See https://html.spec.whatwg.org/#parse-error-incorrectly-closed-comment
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> --$closer_at; // Pre-increment inside condition below reduces risk of accidental infinite looping.
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- while ( ++$closer_at < strlen( $html ) ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ while ( ++$closer_at < $doc_length ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> $closer_at = strpos( $html, '--', $closer_at );
</span><span class="cx" style="display: block; padding: 0 10px"> if ( false === $closer_at ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( $closer_at + 2 < strlen( $html ) && '>' === $html[ $closer_at + 2 ] ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( $closer_at + 2 < $doc_length && '>' === $html[ $closer_at + 2 ] ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> $at = $closer_at + 3;
</span><span class="cx" style="display: block; padding: 0 10px"> continue 2;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( $closer_at + 3 < strlen( $html ) && '!' === $html[ $closer_at + 2 ] && '>' === $html[ $closer_at + 3 ] ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( $closer_at + 3 < $doc_length && '!' === $html[ $closer_at + 2 ] && '>' === $html[ $closer_at + 3 ] ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> $at = $closer_at + 4;
</span><span class="cx" style="display: block; padding: 0 10px"> continue 2;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1209,7 +1420,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> if (
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- strlen( $html ) > $at + 8 &&
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $doc_length > $at + 8 &&
</ins><span class="cx" style="display: block; padding: 0 10px"> '[' === $html[ $at + 2 ] &&
</span><span class="cx" style="display: block; padding: 0 10px"> 'C' === $html[ $at + 3 ] &&
</span><span class="cx" style="display: block; padding: 0 10px"> 'D' === $html[ $at + 4 ] &&
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1220,6 +1431,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $closer_at = strpos( $html, ']]>', $at + 9 );
</span><span class="cx" style="display: block; padding: 0 10px"> if ( false === $closer_at ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1233,7 +1446,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> if (
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- strlen( $html ) > $at + 8 &&
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $doc_length > $at + 8 &&
</ins><span class="cx" style="display: block; padding: 0 10px"> ( 'D' === $html[ $at + 2 ] || 'd' === $html[ $at + 2 ] ) &&
</span><span class="cx" style="display: block; padding: 0 10px"> ( 'O' === $html[ $at + 3 ] || 'o' === $html[ $at + 3 ] ) &&
</span><span class="cx" style="display: block; padding: 0 10px"> ( 'C' === $html[ $at + 4 ] || 'c' === $html[ $at + 4 ] ) &&
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1244,6 +1457,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $closer_at = strpos( $html, '>', $at + 9 );
</span><span class="cx" style="display: block; padding: 0 10px"> if ( false === $closer_at ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1253,9 +1468,16 @@
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * Anything else here is an incorrectly-opened comment and transitions
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * to the bogus comment state - skip to the nearest >.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * to the bogus comment state - skip to the nearest >. If no closer is
+ * found then the HTML was truncated inside the markup declaration.
</ins><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> $at = strpos( $html, '>', $at + 1 );
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( false === $at ) {
+ $this->parser_state = self::STATE_INCOMPLETE;
+
+ return false;
+ }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> continue;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1262,6 +1484,10 @@
</span><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * </> is a missing end tag name, which is ignored.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * This was also known as the "presumptuous empty tag"
+ * in early discussions as it was proposed to close
+ * the nearest previous opening tag.
+ *
</ins><span class="cx" style="display: block; padding: 0 10px"> * See https://html.spec.whatwg.org/#parse-error-missing-end-tag-name
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> if ( '>' === $html[ $at + 1 ] ) {
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1276,6 +1502,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> if ( '?' === $html[ $at + 1 ] ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $closer_at = strpos( $html, '>', $at + 2 );
</span><span class="cx" style="display: block; padding: 0 10px"> if ( false === $closer_at ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1290,8 +1518,15 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * See https://html.spec.whatwg.org/#parse-error-invalid-first-character-of-tag-name
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> if ( $this->is_closing_tag ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ // No chance of finding a closer.
+ if ( $at + 3 > $doc_length ) {
+ return false;
+ }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> $closer_at = strpos( $html, '>', $at + 3 );
</span><span class="cx" style="display: block; padding: 0 10px"> if ( false === $closer_at ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1316,6 +1551,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> // Skip whitespace and slashes.
</span><span class="cx" style="display: block; padding: 0 10px"> $this->bytes_already_parsed += strspn( $this->html, " \t\f\r\n/", $this->bytes_already_parsed );
</span><span class="cx" style="display: block; padding: 0 10px"> if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1338,11 +1575,15 @@
</span><span class="cx" style="display: block; padding: 0 10px"> $attribute_name = substr( $this->html, $attribute_start, $name_length );
</span><span class="cx" style="display: block; padding: 0 10px"> $this->bytes_already_parsed += $name_length;
</span><span class="cx" style="display: block; padding: 0 10px"> if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> $this->skip_whitespace();
</span><span class="cx" style="display: block; padding: 0 10px"> if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1351,6 +1592,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> ++$this->bytes_already_parsed;
</span><span class="cx" style="display: block; padding: 0 10px"> $this->skip_whitespace();
</span><span class="cx" style="display: block; padding: 0 10px"> if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1377,6 +1620,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> if ( $attribute_end >= strlen( $this->html ) ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->parser_state = self::STATE_INCOMPLETE;
+
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1443,7 +1688,6 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @since 6.2.0
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> private function after_tag() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->get_updated_html();
</del><span class="cx" style="display: block; padding: 0 10px"> $this->token_starts_at = null;
</span><span class="cx" style="display: block; padding: 0 10px"> $this->token_length = null;
</span><span class="cx" style="display: block; padding: 0 10px"> $this->tag_name_starts_at = null;
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1786,6 +2030,10 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return string|boolean|null Value of enqueued update if present, otherwise false.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> private function get_enqueued_attribute_value( $comparable_name ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
+ return false;
+ }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> if ( ! isset( $this->lexical_updates[ $comparable_name ] ) ) {
</span><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1853,7 +2101,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function get_attribute( $name ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( null === $this->tag_name_starts_at ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return null;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1933,7 +2181,10 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return array|null List of attribute names, or `null` when no tag opener is matched.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function get_attribute_names_with_prefix( $prefix ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( $this->is_closing_tag || null === $this->tag_name_starts_at ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if (
+ self::STATE_MATCHED_TAG !== $this->parser_state ||
+ $this->is_closing_tag
+ ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return null;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1965,7 +2216,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return string|null Name of currently matched tag in input HTML, or `null` if none found.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function get_tag() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( null === $this->tag_name_starts_at ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return null;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1992,7 +2243,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether the currently matched tag contains the self-closing flag.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function has_self_closing_flag() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( ! $this->tag_name_starts_at ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2024,7 +2275,10 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether the current tag is a tag closer.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function is_tag_closer() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- return $this->is_closing_tag;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ return (
+ self::STATE_MATCHED_TAG === $this->parser_state &&
+ $this->is_closing_tag
+ );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2044,7 +2298,10 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether an attribute value was set.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function set_attribute( $name, $value ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( $this->is_closing_tag || null === $this->tag_name_starts_at ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if (
+ self::STATE_MATCHED_TAG !== $this->parser_state ||
+ $this->is_closing_tag
+ ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2177,7 +2434,10 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether an attribute was removed.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function remove_attribute( $name ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( $this->is_closing_tag ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if (
+ self::STATE_MATCHED_TAG !== $this->parser_state ||
+ $this->is_closing_tag
+ ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2254,13 +2514,14 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether the class was set to be added.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function add_class( $class_name ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( $this->is_closing_tag ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if (
+ self::STATE_MATCHED_TAG !== $this->parser_state ||
+ $this->is_closing_tag
+ ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( null !== $this->tag_name_starts_at ) {
- $this->classname_updates[ $class_name ] = self::ADD_CLASS;
- }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->classname_updates[ $class_name ] = self::ADD_CLASS;
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> return true;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2274,7 +2535,10 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return bool Whether the class was set to be removed.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function remove_class( $class_name ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( $this->is_closing_tag ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if (
+ self::STATE_MATCHED_TAG !== $this->parser_state ||
+ $this->is_closing_tag
+ ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return false;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2480,4 +2744,57 @@
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> return true;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+ /**
+ * Parser Ready State
+ *
+ * Indicates that the parser is ready to run and waiting for a state transition.
+ * It may not have started yet, or it may have just finished parsing a token and
+ * is ready to find the next one.
+ *
+ * @since 6.5.0
+ *
+ * @access private
+ */
+ const STATE_READY = 'STATE_READY';
+
+ /**
+ * Parser Complete State
+ *
+ * Indicates that the parser has reached the end of the document and there is
+ * nothing left to scan. It finished parsing the last token completely.
+ *
+ * @since 6.5.0
+ *
+ * @access private
+ */
+ const STATE_COMPLETE = 'STATE_COMPLETE';
+
+ /**
+ * Parser Incomplete State
+ *
+ * Indicates that the parser has reached the end of the document before finishing
+ * a token. It started parsing a token but there is a possibility that the input
+ * HTML document was truncated in the middle of a token.
+ *
+ * The parser is reset at the start of the incomplete token and has paused. There
+ * is nothing more that can be scanned unless provided a more complete document.
+ *
+ * @since 6.5.0
+ *
+ * @access private
+ */
+ const STATE_INCOMPLETE = 'STATE_INCOMPLETE';
+
+ /**
+ * Parser Matched Tag State
+ *
+ * Indicates that the parser has found an HTML tag and it's possible to get
+ * the tag name and read or modify its attributes (if it's not a closing tag).
+ *
+ * @since 6.5.0
+ *
+ * @access private
+ */
+ const STATE_MATCHED_TAG = 'STATE_MATCHED_TAG';
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span></span></pre></div>
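<p>The four parser states added above can be illustrated with a much smaller scanner. The following is only a sketch under stated assumptions: <code>Tiny_Tag_Scanner</code> is a hypothetical stand-in that recognizes nothing but simple tag openers, and it is not the WP_HTML_Tag_Processor implementation. It borrows the state constant names and the <code>paused_at_incomplete_token()</code> method name from this changeset to show how "no tags left" (complete) can be told apart from "input ended inside a token" (incomplete), and how accessors can be gated on the matched-tag state.</p>

```php
<?php
// Hypothetical, simplified stand-in for illustration only.
// It is NOT the real WP_HTML_Tag_Processor and handles only simple tag openers.
final class Tiny_Tag_Scanner {
	const STATE_READY       = 'STATE_READY';
	const STATE_COMPLETE    = 'STATE_COMPLETE';
	const STATE_INCOMPLETE  = 'STATE_INCOMPLETE';
	const STATE_MATCHED_TAG = 'STATE_MATCHED_TAG';

	private $html;
	private $at       = 0;
	private $state    = self::STATE_READY;
	private $tag_name = null;

	public function __construct( string $html ) {
		$this->html = $html;
	}

	public function next_tag(): bool {
		// Once finished or paused, stay there; this also rules out infinite loops.
		if ( self::STATE_COMPLETE === $this->state || self::STATE_INCOMPLETE === $this->state ) {
			return false;
		}

		$this->tag_name = null;
		$doc_length     = strlen( $this->html );
		$at             = $this->at;

		while ( true ) {
			$opener = $at < $doc_length ? strpos( $this->html, '<', $at ) : false;
			if ( false === $opener ) {
				// Only text can remain: the document was scanned completely.
				$this->state = self::STATE_COMPLETE;
				return false;
			}

			if ( $opener + 1 >= $doc_length ) {
				// The "<" is the last byte: the input may have been truncated.
				$this->state = self::STATE_INCOMPLETE;
				return false;
			}

			$letters     = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
			$name_length = strspn( $this->html, $letters, $opener + 1 );
			if ( 0 === $name_length ) {
				// Not a tag opener in this toy scanner; keep looking.
				$at = $opener + 1;
				continue;
			}

			$closer = strpos( $this->html, '>', $opener + 1 + $name_length );
			if ( false === $closer ) {
				// A tag started but never closed before the end of the input.
				$this->state = self::STATE_INCOMPLETE;
				return false;
			}

			$this->tag_name = strtoupper( substr( $this->html, $opener + 1, $name_length ) );
			$this->at       = $closer + 1;
			$this->state    = self::STATE_MATCHED_TAG;
			return true;
		}
	}

	// Accessors answer only while a tag is matched, mirroring the guards in the diff.
	public function get_tag(): ?string {
		return self::STATE_MATCHED_TAG === $this->state ? $this->tag_name : null;
	}

	public function paused_at_incomplete_token(): bool {
		return self::STATE_INCOMPLETE === $this->state;
	}
}

$scanner = new Tiny_Tag_Scanner( '<p>Hello <img src="x' );
$found   = $scanner->next_tag();                   // true: matched the P opener.
$name    = $scanner->get_tag();                    // 'P'
$again   = $scanner->next_tag();                   // false: the IMG tag never closes.
$paused  = $scanner->paused_at_incomplete_token(); // true
```

<p>Checking the state in each accessor, rather than a nullable offset, means a caller cannot read a tag name or attributes from a token that was only partially scanned.</p>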
<a id="trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php 2023-12-20 14:50:11 UTC (rev 57210)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php 2023-12-20 17:50:04 UTC (rev 57211)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1756,12 +1756,21 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @ticket 56299
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @covers WP_HTML_Tag_Processor::next_tag
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @covers WP_HTML_Tag_Processor::paused_at_incomplete_token
</ins><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function test_unclosed_script_tag_should_not_cause_an_infinite_loop() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $p = new WP_HTML_Tag_Processor( '<script>' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $p = new WP_HTML_Tag_Processor( '<script><div>' );
+ $this->assertFalse(
+ $p->next_tag(),
+ 'Should not have stopped on an opening SCRIPT tag without a proper closing tag in the document.'
+ );
+ $this->assertTrue(
+ $p->paused_at_incomplete_token(),
+ "Should have paused the parser because of the incomplete SCRIPT tag but didn't."
+ );
+
+ // Run this to ensure that the test ends (not in an infinite loop).
</ins><span class="cx" style="display: block; padding: 0 10px"> $p->next_tag();
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->assertSame( 'SCRIPT', $p->get_tag(), 'Did not find script tag' );
- $p->next_tag();
</del><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1932,6 +1941,30 @@
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Ensures matching elements inside NOSCRIPT elements.
+ *
+ * In a browser when the scripting flag is enabled, everything inside
+ * the NOSCRIPT element will be ignored and treated as RAW TEXT. This
+ * means that it's valid to send what looks like incomplete or partial
+ * HTML syntax without impacting a rendered page. The Tag Processor is
+ * a parser with the scripting flag disabled, however, and needs to
+ * expose all the potential content that some code might want to modify.
+ *
+ * Were it not for this then the NOSCRIPT tag would be handled like the
+ * other tags in the RAW TEXT special group, e.g. NOEMBED or STYLE.
+ *
+ * @ticket 60122
+ *
+ * @covers WP_HTML_Tag_Processor::next_tag
+ */
+ public function test_processes_inside_of_noscript_elements() {
+ $p = new WP_HTML_Tag_Processor( '<noscript><input type="submit"></noscript><div>' );
+
+ $this->assertTrue( $p->next_tag( 'INPUT' ), 'Failed to find INPUT element inside NOSCRIPT element.' );
+ $this->assertTrue( $p->next_tag( 'DIV' ), 'Failed to find DIV element after NOSCRIPT element.' );
+ }
+
+ /**
</ins><span class="cx" style="display: block; padding: 0 10px"> * @ticket 59292
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @covers WP_HTML_Tag_Processor::next_tag
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1962,7 +1995,6 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 'IFRAME' => array( '<iframe><section>Inside</section></iframe><section target>' ),
</span><span class="cx" style="display: block; padding: 0 10px"> 'NOEMBED' => array( '<noembed><p></p></noembed><div target>' ),
</span><span class="cx" style="display: block; padding: 0 10px"> 'NOFRAMES' => array( '<noframes><p>Check the rules here.</p></noframes><div target>' ),
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'NOSCRIPT' => array( '<noscript><span>This assumes that scripting mode is enabled.</span></noscript><p target>' ),
</del><span class="cx" style="display: block; padding: 0 10px"> 'STYLE' => array( '<style>* { margin: 0 }</style><div target>' ),
</span><span class="cx" style="display: block; padding: 0 10px"> 'STYLE hiding DIV' => array( '<style>li::before { content: "<div non-target>" }</style><div target>' ),
</span><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2139,15 +2171,24 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @ticket 58007
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @covers WP_HTML_Tag_Processor::next_tag
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @covers WP_HTML_Tag_Processor::paused_at_incomplete_token
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @dataProvider data_html_with_unclosed_comments
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @param string $html_ending_before_comment_close HTML with opened comments that aren't closed
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @param string $html_ending_before_comment_close HTML with opened comments that aren't closed.
</ins><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function test_documents_may_end_with_unclosed_comment( $html_ending_before_comment_close ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $p = new WP_HTML_Tag_Processor( $html_ending_before_comment_close );
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->assertFalse( $p->next_tag() );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->assertFalse(
+ $p->next_tag(),
+ "Should not have found any tag, but found {$p->get_tag()}."
+ );
+
+ $this->assertTrue(
+ $p->paused_at_incomplete_token(),
+ "Should have indicated that the parser found an incomplete token but didn't."
+ );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2280,17 +2321,71 @@
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Ensures that no tags are matched in a document containing only non-tag content.
+ *
+ * @ticket 60122
+ *
+ * @covers WP_HTML_Tag_Processor::next_tag
+ * @covers WP_HTML_Tag_Processor::paused_at_incomplete_token
+ *
+ * @dataProvider data_html_without_tags
+ *
+ * @param string $html_without_tags HTML without any tags in it.
+ */
+ public function test_next_tag_returns_false_when_there_are_no_tags( $html_without_tags ) {
+ $processor = new WP_HTML_Tag_Processor( $html_without_tags );
+
+ $this->assertFalse(
+ $processor->next_tag(),
+ "Shouldn't have found any tags but found {$processor->get_tag()}."
+ );
+
+ $this->assertFalse(
+ $processor->paused_at_incomplete_token(),
+ 'Should have indicated that end of document was reached without evidence that elements were truncated.'
+ );
+ }
+
+ /**
+ * Data provider.
+ *
+ * @return array[]
+ */
+ public function data_html_without_tags() {
+ return array(
+ 'DOCTYPE declaration' => array( '<!DOCTYPE html>Just some HTML' ),
+ 'No tags' => array( 'this is nothing more than a text node' ),
+ 'Text with comments' => array( 'One <!-- sneaky --> comment.' ),
+ 'Empty tag closer' => array( '</>' ),
+ 'Processing instruction' => array( '<?xml version="1.0"?>' ),
+ 'Combination XML-like' => array( '<!DOCTYPE xml><?xml version=""?><!-- this is not a real document. --><![CDATA[it only serves as a test]]>' ),
+ );
+ }
+
+ /**
+ * Ensures that the processor doesn't attempt to match an incomplete token.
+ *
</ins><span class="cx" style="display: block; padding: 0 10px"> * @ticket 58637
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @covers WP_HTML_Tag_Processor::next_tag
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @covers WP_HTML_Tag_Processor::paused_at_incomplete_token
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @dataProvider data_incomplete_syntax_elements
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string $incomplete_html HTML text containing some kind of incomplete syntax.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- public function test_returns_false_for_incomplete_syntax_elements( $incomplete_html ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ public function test_next_tag_returns_false_for_incomplete_syntax_elements( $incomplete_html ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> $p = new WP_HTML_Tag_Processor( $incomplete_html );
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $this->assertFalse( $p->next_tag() );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+ $this->assertFalse(
+ $p->next_tag(),
+ "Shouldn't have found any tags but found {$p->get_tag()}."
+ );
+
+ $this->assertTrue(
+ $p->paused_at_incomplete_token(),
+ "Should have indicated that the parser found an incomplete token but didn't."
+ );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2300,7 +2395,6 @@
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function data_incomplete_syntax_elements() {
</span><span class="cx" style="display: block; padding: 0 10px"> return array(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'No tags' => array( 'this is nothing more than a text node' ),
</del><span class="cx" style="display: block; padding: 0 10px"> 'Incomplete tag name' => array( '<swit' ),
</span><span class="cx" style="display: block; padding: 0 10px"> 'Incomplete tag (no attributes)' => array( '<div' ),
</span><span class="cx" style="display: block; padding: 0 10px"> 'Incomplete tag (attributes)' => array( '<div inert title="test"' ),
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2313,10 +2407,26 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 'Incomplete comment (bogus comment)' => array( '</3 is not a tag' ),
</span><span class="cx" style="display: block; padding: 0 10px"> 'Incomplete DOCTYPE' => array( '<!DOCTYPE html' ),
</span><span class="cx" style="display: block; padding: 0 10px"> 'Partial DOCTYPE' => array( '<!DOCTY' ),
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'Incomplete CDATA' => array( '<[CDATA[something inside of here needs to get out' ),
- 'Partial CDATA' => array( '<[CDA' ),
- 'Partially closed CDATA]' => array( '<[CDATA[cannot escape]' ),
- 'Partially closed CDATA]>' => array( '<[CDATA[cannot escape]>' ),
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'Incomplete CDATA' => array( '<![CDATA[something inside of here needs to get out' ),
+ 'Partial CDATA' => array( '<![CDA' ),
+ 'Partially closed CDATA]' => array( '<![CDATA[cannot escape]' ),
+ 'Partially closed CDATA]>' => array( '<![CDATA[cannot escape]>' ),
+ 'Unclosed IFRAME' => array( '<iframe><div>' ),
+ 'Unclosed NOEMBED' => array( '<noembed><div>' ),
+ 'Unclosed NOFRAMES' => array( '<noframes><div>' ),
+ 'Unclosed SCRIPT' => array( '<script><div>' ),
+ 'Unclosed STYLE' => array( '<style><div>' ),
+ 'Unclosed TEXTAREA' => array( '<textarea><div>' ),
+ 'Unclosed TITLE' => array( '<title><div>' ),
+ 'Unclosed XMP' => array( '<xmp><div>' ),
+ 'Partially closed IFRAME' => array( '<iframe><div></iframe' ),
+ 'Partially closed NOEMBED' => array( '<noembed><div></noembed' ),
+ 'Partially closed NOFRAMES' => array( '<noframes><div></noframes' ),
+ 'Partially closed SCRIPT' => array( '<script><div></script' ),
+ 'Partially closed STYLE' => array( '<style><div></style' ),
+ 'Partially closed TEXTAREA' => array( '<textarea><div></textarea' ),
+ 'Partially closed TITLE' => array( '<title><div></title' ),
+ 'Partially closed XMP' => array( '<xmp><div></xmp' ),
</ins><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2415,7 +2525,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function test_updating_attributes_in_malformed_html( $html, $expected ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $p = new WP_HTML_Tag_Processor( $html );
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $p->next_tag();
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $this->assertTrue( $p->next_tag(), 'Could not find first tag.' );
</ins><span class="cx" style="display: block; padding: 0 10px"> $p->set_attribute( 'foo', 'bar' );
</span><span class="cx" style="display: block; padding: 0 10px"> $p->add_class( 'firstTag' );
</span><span class="cx" style="display: block; padding: 0 10px"> $p->next_tag();
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2434,8 +2544,6 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return array[]
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function data_updating_attributes_in_malformed_html() {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $null_byte = chr( 0 );
-
</del><span class="cx" style="display: block; padding: 0 10px"> return array(
</span><span class="cx" style="display: block; padding: 0 10px"> 'Invalid entity inside attribute value' => array(
</span><span class="cx" style="display: block; padding: 0 10px"> 'input' => '<img src="https://s0.wp.com/i/atat.png" title="&; First <title> is &notit;" TITLE="second title" title="An Imperial &imperial; AT-AT"><span>test</span>',
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2494,8 +2602,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 'expected' => '<hr class="firstTag" foo="bar" id"quo="test"><span class="secondTag">test</span>',
</span><span class="cx" style="display: block; padding: 0 10px"> ),
</span><span class="cx" style="display: block; padding: 0 10px"> 'id without double quotation marks around null byte' => array(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'input' => '<hr id' . $null_byte . 'zero="test"><span>test</span>',
- 'expected' => '<hr class="firstTag" foo="bar" id' . $null_byte . 'zero="test"><span class="secondTag">test</span>',
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'input' => "<hr id\x00zero=\"test\"><span>test</span>",
+ 'expected' => "<hr class=\"firstTag\" foo=\"bar\" id\x00zero=\"test\"><span class=\"secondTag\">test</span>",
</ins><span class="cx" style="display: block; padding: 0 10px"> ),
</span><span class="cx" style="display: block; padding: 0 10px"> 'Unexpected > before an attribute' => array(
</span><span class="cx" style="display: block; padding: 0 10px"> 'input' => '<hr >id="test"><span>test</span>',
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2583,4 +2691,22 @@
</span><span class="cx" style="display: block; padding: 0 10px"> ),
</span><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+ /**
+ * @covers WP_HTML_Tag_Processor::next_tag
+ */
+ public function test_handles_malformed_taglike_open_short_html() {
+ $p = new WP_HTML_Tag_Processor( '<' );
+ $result = $p->next_tag();
+ $this->assertFalse( $result, 'Did not handle "<" html properly.' );
+ }
+
+ /**
+ * @covers WP_HTML_Tag_Processor::next_tag
+ */
+ public function test_handles_malformed_taglike_close_short_html() {
+ $p = new WP_HTML_Tag_Processor( '</ ' );
+ $result = $p->next_tag();
+ $this->assertFalse( $result, 'Did not handle "</ " html properly.' );
+ }
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span></span></pre>
</div>
</div>
</body>
</html>