<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[59075] trunk: HTML API: Add `get_full_comment_text()` method.</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/59075">59075</a></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>dmsnell</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-09-20 20:21:59 +0000 (Fri, 20 Sep 2024)</dd>
</dl>
<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Add `get_full_comment_text()` method.
Previously, in a few cases the modifiable text read from an HTML comment differed slightly from the parsed value of its inner text in a browser. This is due to the specific way that invalid HTML syntax tokens become "bogus comments."
This patch adds a new method to the Tag Processor that returns the full comment text, so that callers can distinguish these cases, for example when copying or serializing HTML from one source to another. Similar code was already in use in the html5lib tests; replacing it with the new method simplifies the test runner and shows that the method was already needed.
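As a rough illustration of the cases described above, the following standalone sketch mirrors how the full comment text can be rebuilt from the modifiable text for each kind of comment syntax. The function `sketch_full_comment_text()`, its string labels, and its `$opener` parameter are hypothetical stand-ins used only for this example; they are not part of the HTML API, which works from `get_comment_type()`, `get_tag()`, and `get_modifiable_text()` instead.

```php
/**
 * Hypothetical helper, not part of WordPress: rebuilds the comment text a
 * browser would report from the modifiable text. The string labels stand in
 * for the COMMENT_AS_* constants of the Tag Processor.
 */
function sketch_full_comment_text( string $type, string $modifiable_text, string $tag = '', string $opener = '!' ): ?string {
	switch ( $type ) {
		case 'html-comment':
		case 'abruptly-closed':
			// Normative comments: the modifiable text is already the full text.
			return $modifiable_text;

		case 'cdata-lookalike':
			// The CDATA wrapper is part of the comment text seen by a browser.
			return "[CDATA[{$modifiable_text}]]";

		case 'pi-lookalike':
			// Processing-instruction lookalikes keep both question marks and the target name.
			return "?{$tag}{$modifiable_text}?";

		case 'invalid-html':
			// Bogus comments opened with `?` keep that character; those opened with `!` do not.
			return ( '?' === $opener ? '?' : '' ) . $modifiable_text;
	}

	// Not a comment type this sketch knows about.
	return null;
}

echo sketch_full_comment_text( 'cdata-lookalike', 'data' ), "\n";                      // Prints: [CDATA[data]]
echo sketch_full_comment_text( 'invalid-html', 'not allowed in html', '', '?' ), "\n"; // Prints: ?not allowed in html
```

With the method added in this patch, code that walks tokens would call `get_full_comment_text()` on the processor directly rather than rebuilding the text by hand as above.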
Developed in https://github.com/wordpress/wordpress-develop/pull/7342
Discussed in https://core.trac.wordpress.org/ticket/62036
Props dmsnell, jonsurrell.
See <a href="https://core.trac.wordpress.org/ticket/62036">#62036</a>.</pre>
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlProcessorHtml5libphp">trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmltagprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php 2024-09-20 14:07:11 UTC (rev 59074)
+++ trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php 2024-09-20 20:21:59 UTC (rev 59075)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -3386,6 +3386,58 @@
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Returns the text of a matched comment or null if not on a comment type node.
+ *
+ * This method returns the entire text content of a comment node as it
+ * would appear in the browser.
+ *
+ * This differs from {@see ::get_modifiable_text()} in that certain comment
+ * types in the HTML API cannot allow their entire comment text content to
+ * be modified. Namely, "bogus comments" of the form `<?not allowed in html>`
+ * will create a comment whose text content starts with `?`. Note that if
+ * that character were modified, it would be possible to change the node
+ * type.
+ *
+ * @since 6.7.0
+ *
+ * @return string|null The comment text as it would appear in the browser or null
+ * if not on a comment type node.
+ */
+ public function get_full_comment_text(): ?string {
+ if ( self::STATE_FUNKY_COMMENT === $this->parser_state ) {
+ return $this->get_modifiable_text();
+ }
+
+ if ( self::STATE_COMMENT !== $this->parser_state ) {
+ return null;
+ }
+
+ switch ( $this->get_comment_type() ) {
+ case self::COMMENT_AS_HTML_COMMENT:
+ case self::COMMENT_AS_ABRUPTLY_CLOSED_COMMENT:
+ return $this->get_modifiable_text();
+
+ case self::COMMENT_AS_CDATA_LOOKALIKE:
+ return "[CDATA[{$this->get_modifiable_text()}]]";
+
+ case self::COMMENT_AS_PI_NODE_LOOKALIKE:
+ return "?{$this->get_tag()}{$this->get_modifiable_text()}?";
+
+ /*
+ * This represents "bogus comments state" from HTML tokenization.
+ * This can be entered by `<?` or `<!`, where `?` is included in
+ * the comment text but `!` is not.
+ */
+ case self::COMMENT_AS_INVALID_HTML:
+ $preceding_character = $this->html[ $this->text_starts_at - 1 ];
+ $comment_start = '?' === $preceding_character ? '?' : '';
+ return "{$comment_start}{$this->get_modifiable_text()}";
+ }
+
+ return null;
+ }
+
+ /**
</ins><span class="cx" style="display: block; padding: 0 10px"> * Subdivides a matched text node, splitting NULL byte sequences and decoded whitespace as
</span><span class="cx" style="display: block; padding: 0 10px"> * distinct nodes prefixes.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlProcessorHtml5libphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php 2024-09-20 14:07:11 UTC (rev 59074)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php 2024-09-20 20:21:59 UTC (rev 59075)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -27,20 +27,17 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * Skip specific tests that may not be supported or have known issues.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> const SKIP_TESTS = array(
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- 'comments01/line0155' => 'Unimplemented: Need to access raw comment text on non-normative comments.',
- 'comments01/line0169' => 'Unimplemented: Need to access raw comment text on non-normative comments.',
- 'html5test-com/line0129' => 'Unimplemented: Need to access raw comment text on non-normative comments.',
- 'noscript01/line0014' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'tests14/line0022' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'tests14/line0055' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'tests19/line0488' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'tests19/line0500' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'tests19/line1079' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'tests2/line0207' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'tests2/line0686' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'tests2/line0697' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'tests2/line0709' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
- 'webkit01/line0231' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ 'noscript01/line0014' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'tests14/line0022' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'tests14/line0055' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'tests19/line0488' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'tests19/line0500' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'tests19/line1079' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'tests2/line0207' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'tests2/line0686' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'tests2/line0697' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'tests2/line0709' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
+ 'webkit01/line0231' => 'Unimplemented: This parser does not add missing attributes to existing HTML or BODY tags.',
</ins><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -315,26 +312,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> break;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> case '#comment':
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- switch ( $processor->get_comment_type() ) {
- case WP_HTML_Processor::COMMENT_AS_ABRUPTLY_CLOSED_COMMENT:
- case WP_HTML_Processor::COMMENT_AS_HTML_COMMENT:
- case WP_HTML_Processor::COMMENT_AS_INVALID_HTML:
- $comment_text_content = $processor->get_modifiable_text();
- break;
-
- case WP_HTML_Processor::COMMENT_AS_CDATA_LOOKALIKE:
- $comment_text_content = "[CDATA[{$processor->get_modifiable_text()}]]";
- break;
-
- case WP_HTML_Processor::COMMENT_AS_PI_NODE_LOOKALIKE:
- $comment_text_content = "?{$processor->get_tag()}{$processor->get_modifiable_text()}?";
- break;
-
- default:
- throw new Error( "Unhandled comment type for tree construction: {$processor->get_comment_type()}" );
- }
</del><span class="cx" style="display: block; padding: 0 10px"> // Comments must be "<" then "!-- " then the data then " -->".
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $output .= str_repeat( self::TREE_INDENT, $indent_level ) . "<!-- {$comment_text_content} -->\n";
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $output .= str_repeat( self::TREE_INDENT, $indent_level ) . "<!-- {$processor->get_full_comment_text()} -->\n";
</ins><span class="cx" style="display: block; padding: 0 10px"> break;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> default:
</span></span></pre>
</div>
</div>
</body>
</html>