<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[59467] trunk: HTML API: Allow more contexts in `create_fragment`.</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { white-space: pre-line; overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="https://core.trac.wordpress.org/changeset/59467">59467</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"https://core.trac.wordpress.org/changeset/59467","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>Bernhard Reiter</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2024-11-27 14:33:46 +0000 (Wed, 27 Nov 2024)</dd>
</dl>
<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>HTML API: Allow more contexts in `create_fragment`.
This changeset modifies `WP_HTML_Processor::create_fragment( $html, $context )` to build on a full processor and `create_fragment_at_current_node`, instead of the other way around. This makes the main factory methods clearer: the state required for fragments is now set up only in `create_fragment_at_current_node`, rather than in both `create_fragment` and `create_fragment_at_current_node`.
This allows more HTML contexts to be provided to the basic `create_fragment`: the provided context HTML is appended to `<!DOCTYPE html>`, a full processor is created, the last tag opener is found, and a fragment parser is created at that node via `create_fragment_at_current_node`.
The HTML5lib tests are updated accordingly to use this new method to create fragments.
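As an illustration only, the context handling described above might be exercised like this. This is a minimal sketch, assuming it runs inside WordPress 6.8 or later (for example via WP-CLI's `wp eval-file`); the helper name `describe_fragment` is hypothetical and not part of this changeset.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): lists the breadcrumbs of
 * every tag found when parsing $html as a fragment inside $context.
 *
 * Assumes WordPress 6.8+ is loaded so that WP_HTML_Processor exists.
 */
function describe_fragment( string $html, string $context = '<body>' ): ?array {
	if ( ! class_exists( 'WP_HTML_Processor' ) ) {
		return null; // Not running inside WordPress.
	}

	// Returns null when no valid context element can be found.
	$processor = WP_HTML_Processor::create_fragment( $html, $context );
	if ( null === $processor ) {
		return null;
	}

	$rows = array();
	while ( $processor->next_tag() ) {
		$breadcrumbs = $processor->get_breadcrumbs();
		$rows[]      = is_array( $breadcrumbs ) ? implode( ' > ', $breadcrumbs ) : '';
	}

	return $rows;
}

// TD tags are parsed in a TR context instead of the default BODY context.
$rows = describe_fragment( '<td>1<td>2<td>3', '<table><tbody><tr>' );
if ( null !== $rows ) {
	echo implode( "\n", $rows ), "\n";
}
```

With a void context such as `<br>`, or an unsupported one such as `<textarea></textarea>`, `create_fragment` returns null and reports incorrect usage, so this helper returns null as well.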
Props jonsurrell, dmsnell, bernhard-reiter.
Fixes <a href="https://core.trac.wordpress.org/ticket/62584">#62584</a>.</pre>
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunksrcwpincludeshtmlapiclasswphtmlprocessorphp">trunk/src/wp-includes/html-api/class-wp-html-processor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlProcessorphp">trunk/tests/phpunit/tests/html-api/wpHtmlProcessor.php</a></li>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlProcessorHtml5libphp">trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php</a></li>
</ul>
<h3>Added Paths</h3>
<ul>
<li><a href="#trunktestsphpunittestshtmlapiwpHtmlProcessorFragmentParsingphp">trunk/tests/phpunit/tests/html-api/wpHtmlProcessorFragmentParsing.php</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunksrcwpincludeshtmlapiclasswphtmlprocessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/src/wp-includes/html-api/class-wp-html-processor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/src/wp-includes/html-api/class-wp-html-processor.php 2024-11-27 14:28:55 UTC (rev 59466)
+++ trunk/src/wp-includes/html-api/class-wp-html-processor.php 2024-11-27 14:33:46 UTC (rev 59467)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -279,51 +279,62 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * form is provided because a context element may have attributes that
</span><span class="cx" style="display: block; padding: 0 10px"> * impact the parse, such as with a SCRIPT tag and its `type` attribute.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * ## Current HTML Support
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Example:
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * - The only supported context is `<body>`, which is the default value.
- * - The only supported document encoding is `UTF-8`, which is the default value.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * // Usually, snippets of HTML ought to be processed in the default `<body>` context:
+ * $processor = WP_HTML_Processor::create_fragment( '<p>Hi</p>' );
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * // Some fragments should be processed in the correct context like this SVG:
+ * $processor = WP_HTML_Processor::create_fragment( '<rect width="10" height="10" />', '<svg>' );
+ *
+ * // This fragment with TD tags should be processed in a TR context:
+ * $processor = WP_HTML_Processor::create_fragment(
+ * '<td>1<td>2<td>3',
+ * '<table><tbody><tr>'
+ * );
+ *
+ * In order to create a fragment processor at the correct location, the
+ * provided fragment will be processed as part of a full HTML document.
+ * The processor will search for the last opener tag in the document and
+ * create a fragment processor at that location. The document will be
+ * forced into "no-quirks" mode by including the HTML5 doctype.
+ *
+ * For advanced usage and precise control over the context element, use
+ * `WP_HTML_Processor::create_full_processor()` and
+ * `WP_HTML_Processor::create_fragment_at_current_node()`.
+ *
+ * UTF-8 is the only allowed encoding. If working with a document that
+ * isn't UTF-8, first convert the document to UTF-8, then pass in the
+ * converted HTML.
+ *
</ins><span class="cx" style="display: block; padding: 0 10px"> * @since 6.4.0
</span><span class="cx" style="display: block; padding: 0 10px"> * @since 6.6.0 Returns `static` instead of `self` so it can create subclass instances.
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @since 6.8.0 Can create fragments with any context element.
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string $html Input HTML fragment to process.
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @param string $context Context element for the fragment, must be default of `<body>`.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @param string $context Context element for the fragment. Defaults to `<body>`.
</ins><span class="cx" style="display: block; padding: 0 10px"> * @param string $encoding Text encoding of the document; must be default of 'UTF-8'.
</span><span class="cx" style="display: block; padding: 0 10px"> * @return static|null The created processor if successful, otherwise null.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $context_processor = static::create_full_parser( "<!DOCTYPE html>{$context}", $encoding );
+ if ( null === $context_processor ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> return null;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
- $processor->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
- $processor->state->encoding = $encoding;
- $processor->state->encoding_confidence = 'certain';
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ while ( $context_processor->next_tag() ) {
+ $context_processor->set_bookmark( 'final_node' );
+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- // @todo Create "fake" bookmarks for non-existent but implied nodes.
- $processor->bookmarks['root-node'] = new WP_HTML_Span( 0, 0 );
- $processor->bookmarks['context-node'] = new WP_HTML_Span( 0, 0 );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if (
+ ! $context_processor->has_bookmark( 'final_node' ) ||
+ ! $context_processor->seek( 'final_node' )
+ ) {
+ _doing_it_wrong( __METHOD__, __( 'No valid context element was detected.' ), '6.8.0' );
+ return null;
+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $root_node = new WP_HTML_Token(
- 'root-node',
- 'HTML',
- false
- );
-
- $processor->state->stack_of_open_elements->push( $root_node );
-
- $context_node = new WP_HTML_Token(
- 'context-node',
- 'BODY',
- false
- );
-
- $processor->context_node = $context_node;
- $processor->breadcrumbs = array( 'HTML', $context_node->node_name );
-
- return $processor;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ return $context_processor->create_fragment_at_current_node( $html );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -333,9 +344,9 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * entire HTML document from start to finish. Consider a fragment parser with
</span><span class="cx" style="display: block; padding: 0 10px"> * a context node of `<body>`.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * Since UTF-8 is the only currently-accepted charset, if working with a
- * document that isn't UTF-8, it's important to convert the document before
- * creating the processor: pass in the converted HTML.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * UTF-8 is the only allowed encoding. If working with a document that
+ * isn't UTF-8, first convert the document to UTF-8, then pass in the
+ * converted HTML.
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string $html Input HTML document to process.
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string|null $known_definite_encoding Optional. If provided, specifies the charset used
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -459,16 +470,37 @@
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @see https://html.spec.whatwg.org/multipage/parsing.html#html-fragment-parsing-algorithm
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @since 6.8.0
+ *
</ins><span class="cx" style="display: block; padding: 0 10px"> * @param string $html Input HTML fragment to process.
</span><span class="cx" style="display: block; padding: 0 10px"> * @return static|null The created processor if successful, otherwise null.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> public function create_fragment_at_current_node( string $html ) {
</span><span class="cx" style="display: block; padding: 0 10px"> if ( $this->get_token_type() !== '#tag' || $this->is_tag_closer() ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ _doing_it_wrong(
+ __METHOD__,
+ __( 'The context element must be a start tag.' ),
+ '6.8.0'
+ );
</ins><span class="cx" style="display: block; padding: 0 10px"> return null;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $tag_name = $this->current_element->token->node_name;
</ins><span class="cx" style="display: block; padding: 0 10px"> $namespace = $this->current_element->token->namespace;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( 'html' === $namespace && self::is_void( $tag_name ) ) {
+ _doing_it_wrong(
+ __METHOD__,
+ sprintf(
+ // translators: %s: A tag name like INPUT or BR.
+ __( 'The context element cannot be a void element, found "%s".' ),
+ $tag_name
+ ),
+ '6.8.0'
+ );
+ return null;
+ }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * Prevent creating fragments at nodes that require a special tokenizer state.
</span><span class="cx" style="display: block; padding: 0 10px"> * This is unsupported by the HTML Processor.
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -475,19 +507,35 @@
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> if (
</span><span class="cx" style="display: block; padding: 0 10px"> 'html' === $namespace &&
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- in_array( $this->current_element->token->node_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP', 'PLAINTEXT' ), true )
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ in_array( $tag_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP', 'PLAINTEXT' ), true )
</ins><span class="cx" style="display: block; padding: 0 10px"> ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ _doing_it_wrong(
+ __METHOD__,
+ sprintf(
+ // translators: %s: A tag name like IFRAME or TEXTAREA.
+ __( 'The context element "%s" is not supported.' ),
+ $tag_name
+ ),
+ '6.8.0'
+ );
</ins><span class="cx" style="display: block; padding: 0 10px"> return null;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $fragment_processor = static::create_fragment( $html );
- if ( null === $fragment_processor ) {
- return null;
- }
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $fragment_processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> $fragment_processor->compat_mode = $this->compat_mode;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $fragment_processor->context_node = clone $this->state->current_token;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ // @todo Create "fake" bookmarks for non-existent but implied nodes.
+ $fragment_processor->bookmarks['root-node'] = new WP_HTML_Span( 0, 0 );
+ $root_node = new WP_HTML_Token(
+ 'root-node',
+ 'HTML',
+ false
+ );
+ $fragment_processor->state->stack_of_open_elements->push( $root_node );
+
+ $fragment_processor->bookmarks['context-node'] = new WP_HTML_Span( 0, 0 );
+ $fragment_processor->context_node = clone $this->current_element->token;
</ins><span class="cx" style="display: block; padding: 0 10px"> $fragment_processor->context_node->bookmark_name = 'context-node';
</span><span class="cx" style="display: block; padding: 0 10px"> $fragment_processor->context_node->on_destroy = null;
</span><span class="cx" style="display: block; padding: 0 10px">
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlProcessorphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlProcessor.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlProcessor.php 2024-11-27 14:28:55 UTC (rev 59466)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlProcessor.php 2024-11-27 14:33:46 UTC (rev 59467)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -1044,83 +1044,6 @@
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * @ticket 62357
- */
- public function test_create_fragment_at_current_node_in_foreign_content() {
- $processor = WP_HTML_Processor::create_full_parser( '<svg>' );
- $this->assertTrue( $processor->next_tag( 'SVG' ) );
-
- $fragment = $processor->create_fragment_at_current_node( "\0preceded-by-nul-byte<rect /><circle></circle><foreignobject><div></div></foreignobject><g>" );
-
- $this->assertSame( 'svg', $fragment->get_namespace() );
- $this->assertTrue( $fragment->next_token() );
-
- /*
- * In HTML parsing, a nul byte would be ignored.
- * In SVG it should be replaced with a replacement character.
- */
- $this->assertSame( '#text', $fragment->get_token_type() );
- $this->assertSame( "\u{FFFD}", $fragment->get_modifiable_text() );
-
- $this->assertTrue( $fragment->next_tag( 'RECT' ) );
- $this->assertSame( 'svg', $fragment->get_namespace() );
-
- $this->assertTrue( $fragment->next_tag( 'CIRCLE' ) );
- $this->assertSame( array( 'HTML', 'SVG', 'CIRCLE' ), $fragment->get_breadcrumbs() );
- $this->assertTrue( $fragment->next_tag( 'foreignObject' ) );
- $this->assertSame( 'svg', $fragment->get_namespace() );
- }
-
- /**
- * @ticket 62357
- */
- public function test_create_fragment_at_current_node_in_foreign_content_integration_point() {
- $processor = WP_HTML_Processor::create_full_parser( '<svg><foreignObject>' );
- $this->assertTrue( $processor->next_tag( 'foreignObject' ) );
-
- $fragment = $processor->create_fragment_at_current_node( "<image>\0not-preceded-by-nul-byte<rect />" );
-
- // Nothing has been processed, the html namespace should be used for parsing as an integration point.
- $this->assertSame( 'html', $fragment->get_namespace() );
-
- // HTML parsing transforms IMAGE into IMG.
- $this->assertTrue( $fragment->next_tag( 'IMG' ) );
-
- $this->assertTrue( $fragment->next_token() );
-
- // In HTML parsing, the nul byte is ignored and the text is reached.
- $this->assertSame( '#text', $fragment->get_token_type() );
- $this->assertSame( 'not-preceded-by-nul-byte', $fragment->get_modifiable_text() );
-
- /*
- * svg:foreignObject is an HTML integration point, so the processor should be in the HTML namespace.
- * RECT is an HTML element here, meaning it may have the self-closing flag but does not self-close.
- */
- $this->assertTrue( $fragment->next_tag( 'RECT' ) );
- $this->assertSame( array( 'HTML', 'FOREIGNOBJECT', 'RECT' ), $fragment->get_breadcrumbs() );
- $this->assertSame( 'html', $fragment->get_namespace() );
- $this->assertTrue( $fragment->has_self_closing_flag() );
- $this->assertTrue( $fragment->expects_closer() );
- }
-
- /**
- * @ticket 62357
- */
- public function test_prevent_fragment_creation_on_closers() {
- $processor = WP_HTML_Processor::create_full_parser( '<p></p>' );
- $processor->next_tag( 'P' );
- $processor->next_tag(
- array(
- 'tag_name' => 'P',
- 'tag_closers' => 'visit',
- )
- );
- $this->assertSame( 'P', $processor->get_tag() );
- $this->assertTrue( $processor->is_tag_closer() );
- $this->assertNull( $processor->create_fragment_at_current_node( '<i>fragment HTML</i>' ) );
- }
-
- /**
</del><span class="cx" style="display: block; padding: 0 10px"> * Ensure that lowercased tag_name query matches tags case-insensitively.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @group 62427
</span></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlProcessorFragmentParsingphp"></a>
<div class="addfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Added: trunk/tests/phpunit/tests/html-api/wpHtmlProcessorFragmentParsing.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlProcessorFragmentParsing.php (rev 0)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlProcessorFragmentParsing.php 2024-11-27 14:33:46 UTC (rev 59467)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -0,0 +1,178 @@
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+<?php
+/**
+ * Unit tests covering WP_HTML_Processor fragment parsing functionality.
+ *
+ * @package WordPress
+ * @subpackage HTML-API
+ *
+ * @since 6.8.0
+ *
+ * @group html-api
+ *
+ * @coversDefaultClass WP_HTML_Processor
+ */
+class Tests_HtmlApi_WpHtmlProcessorFragmentParsing extends WP_UnitTestCase {
+ /**
+ * @ticket 62357
+ */
+ public function test_create_fragment_at_current_node_in_foreign_content() {
+ $processor = WP_HTML_Processor::create_full_parser( '<svg>' );
+ $this->assertTrue( $processor->next_tag( 'SVG' ) );
+
+ $fragment = $processor->create_fragment_at_current_node( "\0preceded-by-nul-byte<rect /><circle></circle><foreignobject><div></div></foreignobject><g>" );
+
+ $this->assertSame( 'svg', $fragment->get_namespace() );
+ $this->assertTrue( $fragment->next_token() );
+
+ /*
+ * In HTML parsing, a nul byte would be ignored.
+ * In SVG it should be replaced with a replacement character.
+ */
+ $this->assertSame( '#text', $fragment->get_token_type() );
+ $this->assertSame( "\u{FFFD}", $fragment->get_modifiable_text() );
+
+ $this->assertTrue( $fragment->next_tag( 'RECT' ) );
+ $this->assertSame( 'svg', $fragment->get_namespace() );
+
+ $this->assertTrue( $fragment->next_tag( 'CIRCLE' ) );
+ $this->assertSame( array( 'HTML', 'SVG', 'CIRCLE' ), $fragment->get_breadcrumbs() );
+ $this->assertTrue( $fragment->next_tag( 'foreignObject' ) );
+ $this->assertSame( 'svg', $fragment->get_namespace() );
+ }
+
+ /**
+ * @ticket 62357
+ */
+ public function test_create_fragment_at_current_node_in_foreign_content_integration_point() {
+ $processor = WP_HTML_Processor::create_full_parser( '<svg><foreignObject>' );
+ $this->assertTrue( $processor->next_tag( 'foreignObject' ) );
+
+ $fragment = $processor->create_fragment_at_current_node( "<image>\0not-preceded-by-nul-byte<rect />" );
+
+ // Nothing has been processed, the html namespace should be used for parsing as an integration point.
+ $this->assertSame( 'html', $fragment->get_namespace() );
+
+ // HTML parsing transforms IMAGE into IMG.
+ $this->assertTrue( $fragment->next_tag( 'IMG' ) );
+
+ $this->assertTrue( $fragment->next_token() );
+
+ // In HTML parsing, the nul byte is ignored and the text is reached.
+ $this->assertSame( '#text', $fragment->get_token_type() );
+ $this->assertSame( 'not-preceded-by-nul-byte', $fragment->get_modifiable_text() );
+
+ /*
+ * svg:foreignObject is an HTML integration point, so the processor should be in the HTML namespace.
+ * RECT is an HTML element here, meaning it may have the self-closing flag but does not self-close.
+ */
+ $this->assertTrue( $fragment->next_tag( 'RECT' ) );
+ $this->assertSame( array( 'HTML', 'FOREIGNOBJECT', 'RECT' ), $fragment->get_breadcrumbs() );
+ $this->assertSame( 'html', $fragment->get_namespace() );
+ $this->assertTrue( $fragment->has_self_closing_flag() );
+ $this->assertTrue( $fragment->expects_closer() );
+ }
+
+ /**
+ * @expectedIncorrectUsage WP_HTML_Processor::create_fragment_at_current_node
+ * @ticket 62357
+ */
+ public function test_prevent_fragment_creation_on_closers() {
+ $processor = WP_HTML_Processor::create_full_parser( '<p></p>' );
+ $processor->next_tag( 'P' );
+ $processor->next_tag(
+ array(
+ 'tag_name' => 'P',
+ 'tag_closers' => 'visit',
+ )
+ );
+ $this->assertSame( 'P', $processor->get_tag() );
+ $this->assertTrue( $processor->is_tag_closer() );
+ $this->assertNull( $processor->create_fragment_at_current_node( '<i>fragment HTML</i>' ) );
+ }
+
+ /**
+ * Verifies that the fragment parser doesn't allow invalid context nodes.
+ *
+ * This includes void elements and self-contained elements because they can
+ * contain no inner HTML. Operations on self-contained elements should occur
+ * through methods such as {@see WP_HTML_Tag_Processor::set_modifiable_text}.
+ *
+ * @ticket 62584
+ *
+ * @dataProvider data_invalid_fragment_contexts
+ *
+ * @param string $context Invalid context node for fragment parser.
+ */
+ public function test_rejects_invalid_fragment_contexts( string $context, string $doing_it_wrong_method_name ) {
+ $this->setExpectedIncorrectUsage( "WP_HTML_Processor::{$doing_it_wrong_method_name}" );
+ $this->assertNull(
+ WP_HTML_Processor::create_fragment( 'just a test', $context ),
+ "Should not have been able to create a fragment parser with context node {$context}"
+ );
+ }
+
+ /**
+ * Data provider.
+ *
+ * @return array[]
+ */
+ public static function data_invalid_fragment_contexts() {
+ return array(
+ /*
+ * Invalid contexts.
+ */
+ /*
+ * The text node is confused with a virtual body open tag.
+ * This should fail to set a bookmark in `create_fragment`
+ * but currently does not, it slips through and fails in
+ * `create_fragment_at_current_node`.
+ */
+ 'Invalid text' => array( 'just some text', 'create_fragment_at_current_node' ),
+ 'Invalid comment' => array( '<!-- comment -->', 'create_fragment' ),
+ 'Invalid closing' => array( '</div>', 'create_fragment' ),
+ 'Invalid DOCTYPE' => array( '<!DOCTYPE html>', 'create_fragment' ),
+ /*
+ * PLAINTEXT should appear in the unsupported elements, but at the
+ * moment it's completely unsupported by the processor so
+ * the context element cannot be found.
+ */
+ 'Unsupported PLAINTEXT' => array( '<plaintext>', 'create_fragment' ),
+
+ /*
+ * Invalid contexts.
+ */
+ 'AREA' => array( '<area>', 'create_fragment_at_current_node' ),
+ 'BASE' => array( '<base>', 'create_fragment_at_current_node' ),
+ 'BASEFONT' => array( '<basefont>', 'create_fragment_at_current_node' ),
+ 'BGSOUND' => array( '<bgsound>', 'create_fragment_at_current_node' ),
+ 'BR' => array( '<br>', 'create_fragment_at_current_node' ),
+ 'COL' => array( '<table><colgroup><col>', 'create_fragment_at_current_node' ),
+ 'EMBED' => array( '<embed>', 'create_fragment_at_current_node' ),
+ 'FRAME' => array( '<frameset><frame>', 'create_fragment_at_current_node' ),
+ 'HR' => array( '<hr>', 'create_fragment_at_current_node' ),
+ 'IMG' => array( '<img>', 'create_fragment_at_current_node' ),
+ 'INPUT' => array( '<input>', 'create_fragment_at_current_node' ),
+ 'KEYGEN' => array( '<keygen>', 'create_fragment_at_current_node' ),
+ 'LINK' => array( '<link>', 'create_fragment_at_current_node' ),
+ 'META' => array( '<meta>', 'create_fragment_at_current_node' ),
+ 'PARAM' => array( '<param>', 'create_fragment_at_current_node' ),
+ 'SOURCE' => array( '<source>', 'create_fragment_at_current_node' ),
+ 'TRACK' => array( '<track>', 'create_fragment_at_current_node' ),
+ 'WBR' => array( '<wbr>', 'create_fragment_at_current_node' ),
+
+ /*
+ * Unsupported elements. Include a tag closer to ensure the element can be found
+ * and does not pause the parser at an incomplete token.
+ */
+ 'IFRAME' => array( '<iframe></iframe>', 'create_fragment_at_current_node' ),
+ 'NOEMBED' => array( '<noembed></noembed>', 'create_fragment_at_current_node' ),
+ 'NOFRAMES' => array( '<noframes></noframes>', 'create_fragment_at_current_node' ),
+ 'SCRIPT' => array( '<script></script>', 'create_fragment_at_current_node' ),
+ 'SCRIPT with type' => array( '<script type="javascript"></script>', 'create_fragment_at_current_node' ),
+ 'STYLE' => array( '<style></style>', 'create_fragment_at_current_node' ),
+ 'TEXTAREA' => array( '<textarea></textarea>', 'create_fragment_at_current_node' ),
+ 'TITLE' => array( '<title></title>', 'create_fragment_at_current_node' ),
+ 'XMP' => array( '<xmp></xmp>', 'create_fragment_at_current_node' ),
+ );
+ }
+}
</ins></span></pre></div>
<a id="trunktestsphpunittestshtmlapiwpHtmlProcessorHtml5libphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php 2024-11-27 14:28:55 UTC (rev 59466)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php 2024-11-27 14:33:46 UTC (rev 59467)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -153,69 +153,55 @@
</span><span class="cx" style="display: block; padding: 0 10px"> * @return string|null Tree structure of parsed HTML, if supported, else null.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><span class="cx" style="display: block; padding: 0 10px"> private static function build_tree_representation( ?string $fragment_context, string $html ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $processor = null;
</del><span class="cx" style="display: block; padding: 0 10px"> if ( $fragment_context ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( 'body' === $fragment_context ) {
- $processor = WP_HTML_Processor::create_fragment( $html );
- } else {
-
- /*
- * If the string of characters starts with "svg ", the context
- * element is in the SVG namespace and the substring after
- * "svg " is the local name. If the string of characters starts
- * with "math ", the context element is in the MathML namespace
- * and the substring after "math " is the local name.
- * Otherwise, the context element is in the HTML namespace and
- * the string is the local name.
- */
- if ( str_starts_with( $fragment_context, 'svg ' ) ) {
- $tag_name = substr( $fragment_context, 4 );
- if ( 'svg' === $tag_name ) {
- $parent_processor = WP_HTML_Processor::create_full_parser( '<!DOCTYPE html><svg>' );
- } else {
- $parent_processor = WP_HTML_Processor::create_full_parser( "<!DOCTYPE html><svg><{$tag_name}>" );
- }
- $parent_processor->next_tag( $tag_name );
- } elseif ( str_starts_with( $fragment_context, 'math ' ) ) {
- $tag_name = substr( $fragment_context, 5 );
- if ( 'math' === $tag_name ) {
- $parent_processor = WP_HTML_Processor::create_full_parser( '<!DOCTYPE html><math>' );
- } else {
- $parent_processor = WP_HTML_Processor::create_full_parser( "<!DOCTYPE html><math><{$tag_name}>" );
- }
- $parent_processor->next_tag( $tag_name );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ /*
+ * If the string of characters starts with "svg ", the context
+ * element is in the SVG namespace and the substring after
+ * "svg " is the local name. If the string of characters starts
+ * with "math ", the context element is in the MathML namespace
+ * and the substring after "math " is the local name.
+ * Otherwise, the context element is in the HTML namespace and
+ * the string is the local name.
+ */
+ if ( str_starts_with( $fragment_context, 'svg ' ) ) {
+ $tag_name = substr( $fragment_context, 4 );
+ if ( 'svg' === $tag_name ) {
+ $fragment_context_html = '<svg>';
</ins><span class="cx" style="display: block; padding: 0 10px"> } else {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( in_array(
- $fragment_context,
- array(
- 'caption',
- 'col',
- 'colgroup',
- 'tbody',
- 'td',
- 'tfoot',
- 'th',
- 'thead',
- 'tr',
- ),
- true
- ) ) {
- $parent_processor = WP_HTML_Processor::create_full_parser( "<!DOCTYPE html><table><{$fragment_context}>" );
- $parent_processor->next_tag();
- } else {
- $parent_processor = WP_HTML_Processor::create_full_parser( "<!DOCTYPE html><{$fragment_context}>" );
- }
- $parent_processor->next_tag( $fragment_context );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $fragment_context_html = "<svg><{$tag_name}>";
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( null !== $parent_processor->get_unsupported_exception() ) {
- throw $parent_processor->get_unsupported_exception();
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ } elseif ( str_starts_with( $fragment_context, 'math ' ) ) {
+ $tag_name = substr( $fragment_context, 5 );
+ if ( 'math' === $tag_name ) {
+ $fragment_context_html = '<math>';
+ } else {
+ $fragment_context_html = "<math><{$tag_name}>";
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( null !== $parent_processor->get_last_error() ) {
- throw new Exception( $parent_processor->get_last_error() );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ } else {
+ // Tags that only appear in tables need a special case.
+ if ( in_array(
+ $fragment_context,
+ array(
+ 'caption',
+ 'col',
+ 'colgroup',
+ 'tbody',
+ 'td',
+ 'tfoot',
+ 'th',
+ 'thead',
+ 'tr',
+ ),
+ true
+ ) ) {
+ $fragment_context_html = "<table><{$fragment_context}>";
+ } else {
+ $fragment_context_html = "<{$fragment_context}>";
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $processor = $parent_processor->create_fragment_at_current_node( $html );
</del><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $processor = WP_HTML_Processor::create_fragment( $html, $fragment_context_html );
+
</ins><span class="cx" style="display: block; padding: 0 10px"> if ( null === $processor ) {
</span><span class="cx" style="display: block; padding: 0 10px"> throw new WP_HTML_Unsupported_Exception( "Could not create a parser with the given fragment context: {$fragment_context}.", '', 0, '', array(), array() );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span></span></pre>
</div>
</div>
</body>
</html>