[wp-trac] [WordPress Trac] #58637: HTML API: Fatal error processing document with unclosed attribute

WordPress Trac noreply at wordpress.org
Tue Jun 27 02:34:45 UTC 2023


#58637: HTML API: Fatal error processing document with unclosed attribute
----------------------------------------+------------------------------
 Reporter:  dlh                         |       Owner:  (none)
     Type:  defect (bug)                |      Status:  new
 Priority:  normal                      |   Milestone:  Awaiting Review
Component:  HTML API                    |     Version:  6.2
 Severity:  normal                      |  Resolution:
 Keywords:  has-unit-tests needs-patch  |     Focuses:
----------------------------------------+------------------------------
Description changed by dlh:

Old description:

> The HTML tag processor triggers a fatal error (in PHP 8+) when attempting
> to process a HTML string that is malformed because it ends in an unclosed
> attribute.
>
> To replicate:
>
> {{{
> $html = '<iframe width="640" height="400"
> src="https://www.example.com/embed/abcdef';
> $proc = new \WP_HTML_Tag_Processor( $html );
> $proc->next_tag( 'iframe' );
> }}}
>
> Leads to:
>
> `ValueError: strpos(): Argument #3 ($offset) must be contained in
> argument #1 ($haystack)`
>
> I've added a test case in the linked Pull Request. I think I can see that
> the error occurs because `WP_HTML_Tag_Processor::parse_next_attribute()`
> sets `$bytes_already_parsed` to one byte after the end of the document,
> representing the missing closing quote of the attribute. But I'm less
> sure about where in the processor a fix for the problem might go, so I've
> left that open for comment for now.
>
> I encountered a string like this as part of a content migration over
> other rows of well-formed HTML. In this scenario, I wouldn't expect the
> tag processor to be able to tell me anything about the string, but it
> would be helpful to migration scripts like mine for the processor to
> handle the bad string gracefully.

New description:

 The HTML tag processor triggers a fatal error (in PHP 8+) when attempting
 to process an HTML string that is malformed because it ends in an unclosed
 attribute.

 To replicate:

 {{{
 $html = '<iframe width="640" height="400"
 src="https://www.example.com/embed/abcdef';
 $proc = new \WP_HTML_Tag_Processor( $html );
 $proc->next_tag( 'iframe' );
 }}}

 Leads to:

 `ValueError: strpos(): Argument #3 ($offset) must be contained in argument
 #1 ($haystack)`

 in `WP_HTML_Tag_Processor::next_tag()`:
 https://github.com/WordPress/wordpress-develop/blob/12ed5ccb0a61cf684682a94e9b9c02dd11bb7d75/src/wp-includes/html-api/class-wp-html-tag-processor.php#L549

 I've added a test case in the linked Pull Request. I think I can see that
 the error occurs because `WP_HTML_Tag_Processor::parse_next_attribute()`
 sets `$bytes_already_parsed` to one byte after the end of the document,
 representing the missing closing quote of the attribute. But I'm less sure
 about where in the processor a fix for the problem might go, so I've left
 that open for comment for now.
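
 The failure mode described above can be reproduced without WordPress. The
 following is a minimal sketch, assuming PHP 8+ and assuming the offset really
 does end up one byte past the end of the document, as the report suggests.
 The variable names and the single-line `$html` are illustrative only, and the
 bounds check at the end is one possible guard, not necessarily where a fix in
 the processor belongs.

```php
<?php
// Illustrative stand-in for the malformed document from the report
// (written on one line here; the unclosed src attribute is what matters).
$html = '<iframe width="640" height="400" src="https://www.example.com/embed/abcdef';

// Assumption: the parser's position has advanced one byte past the end
// of the document, standing in for the missing closing quote.
$offset = strlen( $html ) + 1;

$message = null;
try {
    // On PHP 8+, an offset greater than the haystack length throws.
    strpos( $html, '<', $offset );
} catch ( ValueError $e ) {
    $message = $e->getMessage();
}

// Checking the offset against the document length before searching
// avoids the exception and simply reports "nothing found".
$found = $offset <= strlen( $html ) ? strpos( $html, '<', $offset ) : false;
```

 An offset equal to `strlen( $html )` is still accepted by `strpos()`; only
 an offset beyond the end of the string raises the `ValueError`.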

 I encountered a string like this as part of a content migration over other
 rows of well-formed HTML. In this scenario, I wouldn't expect the tag
 processor to be able to tell me anything about the string, but it would be
 helpful to migration scripts like mine for the processor to handle the bad
 string gracefully.
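
 Until the processor handles such input itself, a migration script could
 guard each row on its own. This is only a sketch of a workaround under the
 assumption that the failure surfaces as a `ValueError`; `try_process_html()`
 is a hypothetical helper name, not part of WordPress.

```php
<?php
/**
 * Hypothetical helper: run $callback on one row of HTML and return null
 * instead of letting a ValueError abort the whole migration.
 */
function try_process_html( string $html, callable $callback ) {
    try {
        return $callback( $html );
    } catch ( ValueError $e ) {
        // Skip (or log) the malformed row and continue with the next one.
        return null;
    }
}

// Usage inside WordPress (not executed here, since it needs WordPress loaded):
// $found = try_process_html( $html, function ( $html ) {
//     $proc = new WP_HTML_Tag_Processor( $html );
//     return $proc->next_tag( 'iframe' );
// } );
```

 Catching only `ValueError` keeps unrelated errors visible; a script that
 prefers to skip any failing row could catch `Throwable` instead.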

--

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/58637#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

