<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[56703] trunk: HTML API: Add class name utilities `has_class()` and `class_list()`.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/56703">56703</a></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>Bernhard Reiter</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2023-09-26 09:15:21 +0000 (Tue, 26 Sep 2023)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Add class name utilities `has_class()` and `class_list()`.

This patch adds two new public methods to the HTML Tag Processor:
 - `has_class()` indicates whether a matched tag contains a given CSS class name.
 - `class_list()` returns a generator that yields each unique class name in a matched tag.
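
To make the documented behavior concrete, here is a minimal Python sketch (Python rather than PHP, so it runs without WordPress) of what the two methods do with a class attribute value: split on the five ASCII whitespace separators the PHP code skips, lowercase in the ASCII range, and visit each name only once. The Python `class_list()` and `has_class()` take the attribute value directly, standing in for `get_attribute( 'class' )`, and `ClassValue` is a hypothetical type alias. This is not the WordPress implementation, and it ignores the `null` that `has_class()` returns when no tag is matched.

```python
import re
import string
from typing import Iterator, Union

# Hypothetical stand-in for what get_attribute( 'class' ) can return: the
# decoded string, True for a boolean `class` attribute, or None when absent.
ClassValue = Union[str, bool, None]

# Class names are runs of characters other than the five separators
# (space, tab, form feed, carriage return, newline) the PHP code skips.
CLASS_NAME = re.compile(r"[^ \t\f\r\n]+")

# Lowercase A-Z only; str.lower() would also fold non-ASCII letters.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def class_list(class_value: ClassValue) -> Iterator[str]:
    """Yield each distinct class name, ASCII-lowercased, in source order."""
    if not isinstance(class_value, str):
        return  # missing or boolean attribute: nothing to visit
    seen = []  # a short list, like the $seen array in the PHP generator
    for match in CLASS_NAME.finditer(class_value):
        name = match.group().translate(ASCII_LOWER)
        if name in seen:
            continue  # visit each class name only once
        seen.append(name)
        yield name


def has_class(class_value: ClassValue, wanted_class: str) -> bool:
    """Report whether wanted_class is present, ASCII case-insensitively."""
    wanted_class = wanted_class.translate(ASCII_LOWER)
    return any(name == wanted_class for name in class_list(class_value))


print(list(class_list("one two three")))          # ['one', 'two', 'three']
print(list(class_list("One one  ONE\tlang-en")))  # ['one', 'lang-en']
print(has_class("bar Foo", "FOO"))                # True
print(has_class(True, "foo"))                     # False
```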

This patch also refactors the internal tag-matching logic to reuse the new
`has_class()` method. Previously, the `matches()` function relied on optimized
code that compared class names byte-for-byte against the raw attribute text.
With this change, class names are matched against the decoded attribute value
(and, like `has_class()`, ASCII case-insensitively), so results may differ when
a class attribute contains character references.
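
The behavioral change can be approximated in Python. The hypothetical `byte_for_byte_match()` mimics the removed logic: a case-sensitive search of the raw attribute text that requires a separator (or the edge of the value) on both sides of the match. The hypothetical `decoded_match()` mimics the new path through `has_class()`. Python's `html.unescape()` is only an assumed stand-in for the decoding `get_attribute()` performs, so the two may disagree on edge cases.

```python
import html
import re
import string

SEPARATORS = " \t\f\r\n"
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def byte_for_byte_match(raw_value: str, sought: str) -> bool:
    """Hypothetical approximation of the removed code: search the raw,
    undecoded text case-sensitively and require boundaries on both sides."""
    at = raw_value.find(sought)
    while at != -1:
        end = at + len(sought)
        starts_at_boundary = at == 0 or raw_value[at - 1] in SEPARATORS
        ends_at_boundary = end == len(raw_value) or raw_value[end] in SEPARATORS
        if starts_at_boundary and ends_at_boundary:
            return True
        # Like the removed loop, resume searching past this occurrence.
        at = raw_value.find(sought, end)
    return False


def decoded_match(raw_value: str, sought: str) -> bool:
    """Hypothetical approximation of the new path: decode character
    references first, then compare whole names ASCII case-insensitively."""
    decoded = html.unescape(raw_value)  # assumed stand-in for get_attribute()
    names = re.findall(r"[^ \t\f\r\n]+", decoded.translate(ASCII_LOWER))
    return sought.translate(ASCII_LOWER) in names


print(byte_for_byte_match("&#x6f;ne two", "one"), decoded_match("&#x6f;ne two", "one"))  # False True
print(byte_for_byte_match("&hellip;", "…"), decoded_match("&hellip;", "…"))              # False True
print(byte_for_byte_match("Foo bar", "foo"), decoded_match("Foo bar", "foo"))            # False True
print(byte_for_byte_match("not-odd", "odd"), decoded_match("not-odd", "odd"))            # False False
```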

These methods may be useful for running more complicated queries based
on the presence or absence of CSS class names. Using them also avoids the
need to manually split and normalize the class attribute value returned by
`$processor->get_attribute( 'class' )`.
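
A query of that kind might look like the following Python sketch, which selects tags that carry every required class and none of the excluded ones. `matches_query()`, the compact `class_list()` stand-in and the `tags` list (playing the role of the tags visited by repeated `next_tag()` calls) are all hypothetical; in PHP the same test could be written as a series of `has_class()` calls on the matched tag.

```python
import re
import string
from typing import Iterable, Iterator, Union

# Hypothetical alias: decoded class value, True (boolean attribute) or None.
ClassValue = Union[str, bool, None]
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def class_list(class_value: ClassValue) -> Iterator[str]:
    """Compact stand-in for class_list(): distinct ASCII-lowercased names."""
    if isinstance(class_value, str):
        names = re.findall(r"[^ \t\f\r\n]+", class_value.translate(ASCII_LOWER))
        yield from dict.fromkeys(names)  # drop duplicates, keep source order


def matches_query(class_value: ClassValue, required: Iterable[str], excluded: Iterable[str]) -> bool:
    """True when every required class is present and no excluded class is."""
    present = set(class_list(class_value))
    wanted = {name.translate(ASCII_LOWER) for name in required}
    unwanted = {name.translate(ASCII_LOWER) for name in excluded}
    return wanted.issubset(present) and present.isdisjoint(unwanted)


# Hypothetical (tag name, decoded class value) pairs, standing in for the
# tags a processor would visit with repeated next_tag() calls.
tags = [
    ("DIV", "card featured"),
    ("DIV", "card Hidden"),
    ("P", None),
    ("SECTION", "Card\tfeatured"),
]

hits = [name for name, value in tags if matches_query(value, ["card"], ["hidden"])]
print(hits)  # ['DIV', 'SECTION']
```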

Props dmsnell.
Fixes <a href="https://core.trac.wordpress.org/ticket/59209">#59209</a>.</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp">trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php    2023-09-26 08:18:25 UTC (rev 56702)
+++ trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php      2023-09-26 09:15:21 UTC (rev 56703)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -627,6 +627,94 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * Generator for a foreach loop to step through each class name for the matched tag.
+        *
+        * This generator function is designed to be used inside a "foreach" loop.
+        *
+        * Example:
+        *
+        *     $p = new WP_HTML_Tag_Processor( "<div class='free &lt;egg&gt;\tlang-en'>" );
+        *     $p->next_tag();
+        *     foreach ( $p->class_list() as $class_name ) {
+        *         echo "{$class_name} ";
+        *     }
+        *     // Outputs: "free <egg> lang-en "
+        *
+        * @since 6.4.0
+        */
+       public function class_list() {
+               /** @var string $class contains the string value of the class attribute, with character references decoded. */
+               $class = $this->get_attribute( 'class' );
+
+               if ( ! is_string( $class ) ) {
+                       return;
+               }
+
+               $seen = array();
+
+               $at = 0;
+               while ( $at < strlen( $class ) ) {
+                       // Skip past any initial boundary characters.
+                       $at += strspn( $class, " \t\f\r\n", $at );
+                       if ( $at >= strlen( $class ) ) {
+                               return;
+                       }
+
+                       // Find the byte length until the next boundary.
+                       $length = strcspn( $class, " \t\f\r\n", $at );
+                       if ( 0 === $length ) {
+                               return;
+                       }
+
+                       /*
+                        * CSS class names are case-insensitive in the ASCII range.
+                        *
+                        * @see https://www.w3.org/TR/CSS2/syndata.html#x1
+                        */
+                       $name = strtolower( substr( $class, $at, $length ) );
+                       $at  += $length;
+
+                       /*
+                        * It's expected that the number of class names for a given tag is relatively small.
+                        * Given this, it is probably faster overall to scan an array for a value rather
+                        * than to use the class name as a key and check if it's a key of $seen.
+                        */
+                       if ( in_array( $name, $seen, true ) ) {
+                               continue;
+                       }
+
+                       $seen[] = $name;
+                       yield $name;
+               }
+       }
+
+
+       /**
+        * Returns if a matched tag contains the given ASCII case-insensitive class name.
+        *
+        * @since 6.4.0
+        *
+        * @param string $wanted_class Look for this CSS class name, ASCII case-insensitive.
+        * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
+        */
+       public function has_class( $wanted_class ) {
+               if ( ! $this->tag_name_starts_at ) {
+                       return null;
+               }
+
+               $wanted_class = strtolower( $wanted_class );
+
+               foreach ( $this->class_list() as $class_name ) {
+                       if ( $class_name === $wanted_class ) {
+                               return true;
+                       }
+               }
+
+               return false;
+       }
+
+
+       /**
</ins><span class="cx" style="display: block; padding: 0 10px">          * Sets a bookmark in the HTML document.
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><span class="cx" style="display: block; padding: 0 10px">         * Bookmarks represent specific places or tokens in the HTML
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -2347,67 +2435,10 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        }
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                $needs_class_name = null !== $this->sought_class_name;
-
-               if ( $needs_class_name && ! isset( $this->attributes['class'] ) ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         if ( null !== $this->sought_class_name && ! $this->has_class( $this->sought_class_name ) ) {
</ins><span class="cx" style="display: block; padding: 0 10px">                         return false;
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                /*
-                * Match byte-for-byte (case-sensitive and encoding-form-sensitive) on the class name.
-                *
-                * This will overlook certain classes that exist in other lexical variations
-                * than was supplied to the search query, but requires more complicated searching.
-                */
-               if ( $needs_class_name ) {
-                       $class_start = $this->attributes['class']->value_starts_at;
-                       $class_end   = $class_start + $this->attributes['class']->value_length;
-                       $class_at    = $class_start;
-
-                       /*
-                        * Ensure that boundaries surround the class name to avoid matching on
-                        * substrings of a longer name. For example, the sequence "not-odd"
-                        * should not match for the class "odd" even though "odd" is found
-                        * within the class attribute text.
-                        *
-                        * See https://html.spec.whatwg.org/#attributes-3
-                        * See https://html.spec.whatwg.org/#space-separated-tokens
-                        */
-                       while (
-                               // phpcs:ignore WordPress.CodeAnalysis.AssignmentInCondition.FoundInWhileCondition
-                               false !== ( $class_at = strpos( $this->html, $this->sought_class_name, $class_at ) ) &&
-                               $class_at < $class_end
-                       ) {
-                               /*
-                                * Verify this class starts at a boundary.
-                                */
-                               if ( $class_at > $class_start ) {
-                                       $character = $this->html[ $class_at - 1 ];
-
-                                       if ( ' ' !== $character && "\t" !== $character && "\f" !== $character && "\r" !== $character && "\n" !== $character ) {
-                                               $class_at += strlen( $this->sought_class_name );
-                                               continue;
-                                       }
-                               }
-
-                               /*
-                                * Verify this class ends at a boundary as well.
-                                */
-                               if ( $class_at + strlen( $this->sought_class_name ) < $class_end ) {
-                                       $character = $this->html[ $class_at + strlen( $this->sought_class_name ) ];
-
-                                       if ( ' ' !== $character && "\t" !== $character && "\f" !== $character && "\r" !== $character && "\n" !== $character ) {
-                                               $class_at += strlen( $this->sought_class_name );
-                                               continue;
-                                       }
-                               }
-
-                               return true;
-                       }
-
-                       return false;
-               }
-
</del><span class="cx" style="display: block; padding: 0 10px">                 return true;
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlTagProcessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php 2023-09-26 08:18:25 UTC (rev 56702)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlTagProcessor.php   2023-09-26 09:15:21 UTC (rev 56703)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -499,6 +499,17 @@
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * @ticket 59209
+        *
+        * @covers WP_HTML_Tag_Processor::next_tag
+        */
+       public function test_next_tag_matches_decoded_class_names() {
+               $p = new WP_HTML_Tag_Processor( '<div class="&lt;egg&gt;">' );
+
+               $this->assertTrue( $p->next_tag( array( 'class_name' => '<egg>' ) ), 'Failed to find tag with HTML-encoded class name.' );
+       }
+
+       /**
</ins><span class="cx" style="display: block; padding: 0 10px">          * @ticket 56299
</span><span class="cx" style="display: block; padding: 0 10px">         * @ticket 57852
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1958,6 +1969,150 @@
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">        /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         * @ticket 59209
+        *
+        * @covers WP_HTML_Tag_Processor::class_list
+        */
+       public function test_class_list_empty_when_missing_class() {
+               $p = new WP_HTML_Tag_Processor( '<div>' );
+               $p->next_tag();
+
+               $found_classes = false;
+               foreach ( $p->class_list() as $class ) {
+                       $found_classes = true;
+               }
+
+               $this->assertFalse( $found_classes, 'Found classes when none exist.' );
+       }
+
+       /**
+        * @ticket 59209
+        *
+        * @covers WP_HTML_Tag_Processor::class_list
+        */
+       public function test_class_list_empty_when_class_is_boolean() {
+               $p = new WP_HTML_Tag_Processor( '<div class>' );
+               $p->next_tag();
+
+               $found_classes = false;
+               foreach ( $p->class_list() as $class ) {
+                       $found_classes = true;
+               }
+
+               $this->assertFalse( $found_classes, 'Found classes when none exist.' );
+       }
+
+       /**
+        * @ticket 59209
+        *
+        * @covers WP_HTML_Tag_Processor::class_list
+        */
+       public function test_class_list_empty_when_class_is_empty() {
+               $p = new WP_HTML_Tag_Processor( '<div class="">' );
+               $p->next_tag();
+
+               $found_classes = false;
+               foreach ( $p->class_list() as $class ) {
+                       $found_classes = true;
+               }
+
+               $this->assertFalse( $found_classes, 'Found classes when none exist.' );
+       }
+
+       /**
+        * @ticket 59209
+        *
+        * @covers WP_HTML_Tag_Processor::class_list
+        */
+       public function test_class_list_visits_each_class_in_order() {
+               $p = new WP_HTML_Tag_Processor( '<div class="one two three">' );
+               $p->next_tag();
+
+               $found_classes = array();
+               foreach ( $p->class_list() as $class ) {
+                       $found_classes[] = $class;
+               }
+
+               $this->assertSame( array( 'one', 'two', 'three' ), $found_classes, 'Failed to visit the class names in their original order.' );
+       }
+
+       /**
+        * @ticket 59209
+        *
+        * @covers WP_HTML_Tag_Processor::class_list
+        */
+       public function test_class_list_decodes_class_names() {
+               $p = new WP_HTML_Tag_Processor( '<div class="&notin;-class &lt;egg&gt; &#xff03;">' );
+               $p->next_tag();
+
+               $found_classes = array();
+               foreach ( $p->class_list() as $class ) {
+                       $found_classes[] = $class;
+               }
+
+               $this->assertSame( array( '∉-class', '<egg>', "\u{ff03}" ), $found_classes, 'Failed to report class names in their decoded form.' );
+       }
+
+       /**
+        * @ticket 59209
+        *
+        * @covers WP_HTML_Tag_Processor::class_list
+        */
+       public function test_class_list_visits_unique_class_names_only_once() {
+               $p = new WP_HTML_Tag_Processor( '<div class="one one &#x6f;ne">' );
+               $p->next_tag();
+
+               $found_classes = array();
+               foreach ( $p->class_list() as $class ) {
+                       $found_classes[] = $class;
+               }
+
+               $this->assertSame( array( 'one' ), $found_classes, 'Visited multiple copies of the same class name when it should have skipped the duplicates.' );
+       }
+
+       /**
+        * @ticket 59209
+        *
+        * @covers WP_HTML_Tag_Processor::has_class
+        *
+        * @dataProvider data_html_with_variations_of_class_values_and_sought_class_names
+        *
+        * @param string $html         Contains a tag optionally containing a `class` attribute.
+        * @param string $sought_class Name of class to find in the input tag's `class`.
+        * @param bool   $has_class    Whether the sought class exists in the given HTML.
+        */
+       public function test_has_class_handles_expected_class_name_variations( $html, $sought_class, $has_class ) {
+               $p = new WP_HTML_Tag_Processor( $html );
+               $p->next_tag();
+
+               if ( $has_class ) {
+                       $this->assertTrue( $p->has_class( $sought_class ), "Failed to find expected class {$sought_class}." );
+               } else {
+                       $this->assertFalse( $p->has_class( $sought_class ), "Found class {$sought_class} when it doesn't exist." );
+               }
+       }
+
+       /**
+        * Data provider.
+        *
+        * @return array[]
+        */
+       public function data_html_with_variations_of_class_values_and_sought_class_names() {
+               return array(
+                       'Tag without any classes'      => array( '<div>', 'foo', false ),
+                       'Tag with boolean class'       => array( '<img class>', 'foo', false ),
+                       'Tag with empty class'         => array( '<p class="">', 'foo', false ),
+                       'Tag with exact match'         => array( '<button class="foo">', 'foo', true ),
+                       'Tag with duplicate matches'   => array( '<span class="foo bar foo">', 'foo', true ),
+                       'Tag with non-initial match'   => array( '<section class="bar foo">', 'foo', true ),
+                       'Tag with encoded match'       => array( '<main class="&hellip;">', '…', true ),
+                       'Class with tab separator'     => array( "<div class='one\ttwo'>", 'two', true ),
+                       'Class with newline separator' => array( "<div class='one\ntwo\n'>", 'two', true ),
+                       'False duplicate attribute'    => array( '<img class=dog class=cat>', 'cat', false ),
+               );
+       }
+
+       /**
</ins><span class="cx" style="display: block; padding: 0 10px">          * Ensures that the invalid comment closing syntax "--!>" properly closes a comment.
</span><span class="cx" style="display: block; padding: 0 10px">         *
</span><span class="cx" style="display: block; padding: 0 10px">         * @ticket 58007
</span></span></pre>
</div>
</div>

</body>
</html>