<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[58712] trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php: HTML API: Join successive text nodes in html5lib test representation.</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/58712">58712</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/58712","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-07-12 21:58:20 +0000 (Fri, 12 Jul 2024)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Join successive text nodes in html5lib test representation.

Many tests from the html5lib test suite fail because of differences in
text handling between a DOM API and the HTML API, even though the
semantics of the parse are equivalent. For example, it's possible in
the HTML API to read multiple successive text nodes when the tokens
between them are ignored.

The test harness didn't account for this, so those tests failed even
though the parse was correct. This patch joins successive text nodes when
building the representation that is compared against the test suite, so
those tests no longer fail spuriously.

Developed in https://github.com/WordPress/wordpress-develop/pull/6984
Discussed in https://core.trac.wordpress.org/ticket/61576

Props bernhard-reiter, dmsnell, jonsurrell.
See <a href="https://core.trac.wordpress.org/ticket/61576">#61576</a>.</pre>
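<p>The idea described above can be sketched in isolation. This is a minimal illustration, not the code from the changeset: the function name <code>join_text_tokens</code> and the flat token arrays are hypothetical stand-ins for the tokens the HTML Processor yields. Text is buffered while successive <code>#text</code> tokens arrive and is written out as one quoted line when a different token appears or the input ends.</p>

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): builds a simplified
 * html5lib-style dump from a flat token list, merging runs of
 * consecutive text tokens into a single quoted line.
 *
 * Assumed token shape: array( 'type' => string, 'value' => string ).
 */
function join_text_tokens( array $tokens, $indent = '| ' ) {
	$output    = '';
	$text_node = '';

	foreach ( $tokens as $token ) {
		if ( '#text' === $token['type'] ) {
			// Buffer the text instead of emitting it right away.
			$text_node .= $token['value'];
			continue;
		}

		// Any non-text token ends the pending run of text.
		if ( '' !== $text_node ) {
			$output   .= "{$indent}\"{$text_node}\"\n";
			$text_node = '';
		}

		$output .= "{$indent}{$token['value']}\n";
	}

	// Flush text left over at the end of the input.
	if ( '' !== $text_node ) {
		$output .= "{$indent}\"{$text_node}\"\n";
	}

	return $output;
}
```

<p>With this buffering, two adjacent text tokens <code>a</code> and <code>b</code> produce one line <code>"ab"</code> rather than two, which is how a DOM-based tree dump would represent them.</p>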

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlProcessorHtml5libphp">trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunktestsphpunittestshtmlapiwpHtmlProcessorHtml5libphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php    2024-07-12 19:59:34 UTC (rev 58711)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php      2024-07-12 21:58:20 UTC (rev 58712)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -40,7 +40,6 @@
</span><span class="cx" style="display: block; padding: 0 10px">                'menuitem-element/line0012' => 'Bug.',
</span><span class="cx" style="display: block; padding: 0 10px">                'tests1/line0342'           => "Closing P tag implicitly creates opener, which we don't visit.",
</span><span class="cx" style="display: block; padding: 0 10px">                'tests1/line0720'           => 'Unimplemented: Reconstruction of active formatting elements.',
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                'tests1/line0833'           => 'Bug.',
</del><span class="cx" style="display: block; padding: 0 10px">                 'tests15/line0001'          => 'Unimplemented: Reconstruction of active formatting elements.',
</span><span class="cx" style="display: block; padding: 0 10px">                'tests15/line0022'          => 'Unimplemented: Reconstruction of active formatting elements.',
</span><span class="cx" style="display: block; padding: 0 10px">                'tests2/line0650'           => 'Whitespace only test never enters "in body" parsing mode.',
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -51,15 +50,8 @@
</span><span class="cx" style="display: block; padding: 0 10px">                'tests23/line0101'          => 'Unimplemented: Reconstruction of active formatting elements.',
</span><span class="cx" style="display: block; padding: 0 10px">                'tests25/line0169'          => 'Bug.',
</span><span class="cx" style="display: block; padding: 0 10px">                'tests26/line0263'          => 'Bug: An active formatting element should be created for a trailing text node.',
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                'tests7/line0354'           => 'Bug.',
-               'tests8/line0001'           => 'Bug.',
-               'tests8/line0020'           => 'Bug.',
-               'tests8/line0037'           => 'Bug.',
-               'tests8/line0052'           => 'Bug.',
-               'webkit01/line0174'         => 'Bug.',
</del><span class="cx" style="display: block; padding: 0 10px">         );
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-
</del><span class="cx" style="display: block; padding: 0 10px">         /**
</span><span class="cx" style="display: block; padding: 0 10px">         * Verify the parsing results of the HTML Processor against the
</span><span class="cx" style="display: block; padding: 0 10px">         * test cases in the Html5lib tests project.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -160,6 +152,8 @@
</span><span class="cx" style="display: block; padding: 0 10px">                // Initially, assume we're 2 levels deep at: html > body > [position]
</span><span class="cx" style="display: block; padding: 0 10px">                $indent_level = 2;
</span><span class="cx" style="display: block; padding: 0 10px">                $indent       = '  ';
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                $was_text     = null;
+               $text_node    = '';
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                while ( $processor->next_token() ) {
</span><span class="cx" style="display: block; padding: 0 10px">                        if ( ! is_null( $processor->get_last_error() ) ) {
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -166,6 +160,12 @@
</span><span class="cx" style="display: block; padding: 0 10px">                                return null;
</span><span class="cx" style="display: block; padding: 0 10px">                        }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                        if ( $was_text && '#text' !== $processor->get_token_name() ) {
+                               $output   .= "{$text_node}\"\n";
+                               $was_text  = false;
+                               $text_node = '';
+                       }
+
</ins><span class="cx" style="display: block; padding: 0 10px">                         switch ( $processor->get_token_type() ) {
</span><span class="cx" style="display: block; padding: 0 10px">                                case '#tag':
</span><span class="cx" style="display: block; padding: 0 10px">                                        $tag_name = strtolower( $processor->get_tag() );
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -198,12 +198,27 @@
</span><span class="cx" style="display: block; padding: 0 10px">                                                        }
</span><span class="cx" style="display: block; padding: 0 10px">                                                        $output .= str_repeat( $indent, $tag_indent + 1 ) . "{$attribute_name}=\"{$val}\"\n";
</span><span class="cx" style="display: block; padding: 0 10px">                                                }
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+
+                                               // Self-contained tags contain their inner contents as modifiable text.
+                                               $modifiable_text = $processor->get_modifiable_text();
+                                               if ( '' !== $modifiable_text ) {
+                                                       $was_text = true;
+                                                       if ( '' === $text_node ) {
+                                                               $text_node = str_repeat( $indent, $indent_level ) . '"';
+                                                       }
+                                                       $text_node .= $modifiable_text;
+                                                       --$indent_level;
+                                               }
</ins><span class="cx" style="display: block; padding: 0 10px">                                         }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                                        break;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                                case '#text':
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-                                        $output .= str_repeat( $indent, $indent_level ) . "\"{$processor->get_modifiable_text()}\"\n";
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                                 $was_text = true;
+                                       if ( '' === $text_node ) {
+                                               $text_node .= str_repeat( $indent, $indent_level ) . '"';
+                                       }
+                                       $text_node .= $processor->get_modifiable_text();
</ins><span class="cx" style="display: block; padding: 0 10px">                                         break;
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                                case '#comment':
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -238,6 +253,10 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        return null;
</span><span class="cx" style="display: block; padding: 0 10px">                }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                if ( '' !== $text_node ) {
+                       $output .= "{$text_node}\"\n";
+               }
+
</ins><span class="cx" style="display: block; padding: 0 10px">                 // Tests always end with a trailing newline.
</span><span class="cx" style="display: block; padding: 0 10px">                return $output . "\n";
</span><span class="cx" style="display: block; padding: 0 10px">        }
</span></span></pre>
</div>
</div>

</body>
</html>