[wp-trac] [WordPress Trac] #58637: HTML API: Fatal error processing document with unclosed attribute
WordPress Trac
noreply at wordpress.org
Tue Jun 27 02:34:45 UTC 2023
----------------------------------------+------------------------------
Reporter: dlh | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: HTML API | Version: 6.2
Severity: normal | Resolution:
Keywords: has-unit-tests needs-patch | Focuses:
----------------------------------------+------------------------------
Description changed by dlh:
Old description:
> The HTML tag processor triggers a fatal error (in PHP 8+) when attempting
> to process an HTML string that is malformed because it ends in an unclosed
> attribute.
>
> To replicate:
>
> {{{
> $html = '<iframe width="640" height="400"
> src="https://www.example.com/embed/abcdef';
> $proc = new \WP_HTML_Tag_Processor( $html );
> $proc->next_tag( 'iframe' );
> }}}
>
> Leads to:
>
> `ValueError: strpos(): Argument #3 ($offset) must be contained in
> argument #1 ($haystack)`
>
> I've added a test case in the linked Pull Request. I think I can see that
> the error occurs because `WP_HTML_Tag_Processor::parse_next_attribute()`
> sets `$bytes_already_parsed` to one byte after the end of the document,
> representing the missing closing quote of the attribute. But I'm less
> sure about where in the processor a fix for the problem might go, so I've
> left that open for comment for now.
>
> I encountered a string like this as part of a content migration over
> other rows of well-formed HTML. In this scenario, I wouldn't expect the
> tag processor to be able to tell me anything about the string, but it
> would be helpful to migration scripts like mine for the processor to
> handle the bad string gracefully.
New description:
The HTML tag processor triggers a fatal error (in PHP 8+) when attempting
to process an HTML string that is malformed because it ends in an unclosed
attribute.

To replicate:
{{{
// The src attribute value is never closed.
$html = '<iframe width="640" height="400" src="https://www.example.com/embed/abcdef';
$proc = new \WP_HTML_Tag_Processor( $html );
// On PHP 8+ this call fails with the error below.
$proc->next_tag( 'iframe' );
}}}

Leads to:

`ValueError: strpos(): Argument #3 ($offset) must be contained in argument #1 ($haystack)`
in `WP_HTML_Tag_Processor::next_tag()`:
https://github.com/WordPress/wordpress-develop/blob/12ed5ccb0a61cf684682a94e9b9c02dd11bb7d75/src/wp-includes/html-api/class-wp-html-tag-processor.php#L549

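You can reproduce the underlying PHP behavior without WordPress. The following is a minimal sketch in plain PHP 8. `strpos()` throws a `ValueError` when its `$offset` is greater than the length of `$haystack`. An offset exactly equal to the length is still accepted and simply returns `false`. The `guarded_strpos()` helper is hypothetical: it is written only for this illustration, is not part of the HTML API, and checks only the upper bound.

```php
<?php
// Stand-alone illustration (no WordPress needed) of the PHP 8 behavior in the
// report: strpos() throws a ValueError when $offset is past the end of $haystack.

/**
 * Hypothetical helper: like strpos(), but returns false instead of throwing
 * when $offset lies beyond the end of $haystack. Negative offsets are passed
 * through unchanged.
 */
function guarded_strpos( string $haystack, string $needle, int $offset ) {
	if ( $offset > strlen( $haystack ) ) {
		return false; // Nothing left to search.
	}
	return strpos( $haystack, $needle, $offset );
}

$html   = '<iframe src="https://www.example.com/embed/abcdef';
$offset = strlen( $html ) + 1; // One byte past the end, as described in the ticket.

$threw = false;
try {
	strpos( $html, '<', $offset );
} catch ( ValueError $e ) {
	$threw = true;
	echo $e->getMessage(), PHP_EOL;
}

var_dump( guarded_strpos( $html, '<', $offset ) ); // bool(false)
```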
I've added a test case in the linked Pull Request. I think I can see that
the error occurs because `WP_HTML_Tag_Processor::parse_next_attribute()`
sets `$bytes_already_parsed` to one byte after the end of the document,
representing the missing closing quote of the attribute. But I'm less sure
about where in the processor a fix for the problem might go, so I've left
that open for comment for now.
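The next example makes the off-by-one concrete with a simplified, hypothetical model of skipping over a quoted attribute value. It is not the code in `parse_next_attribute()`. It assumes only what the report describes: the parser reserves one byte for the closing quote. When that quote is missing, the cursor lands at `strlen( $html ) + 1`, which is exactly the kind of offset `strpos()` rejects. The names `skip_quoted_value()` and `skip_quoted_value_clamped()` are invented for this sketch. Clamping is only one possible guard, and the ticket leaves open where a real fix belongs.

```php
<?php
// Simplified, hypothetical model of scanning a quoted attribute value. It is
// NOT WordPress's parse_next_attribute(); it only reproduces the arithmetic
// described in the ticket: stepping over a closing quote that does not exist
// pushes the cursor one byte past the end of the document.

function skip_quoted_value( string $html, int $quote_at ): int {
	$quote       = $html[ $quote_at ]; // '"' or "'".
	$value_start = $quote_at + 1;
	// Length of the value up to the next matching quote, or to the end of the string.
	$value_length = strcspn( $html, $quote, $value_start );
	// +1 steps over the closing quote, even when it is missing.
	return $value_start + $value_length + 1;
}

/** Hypothetical guard: never let the cursor pass the end of the document. */
function skip_quoted_value_clamped( string $html, int $quote_at ): int {
	return min( skip_quoted_value( $html, $quote_at ), strlen( $html ) );
}

$closed   = '<a href="x">';
$unclosed = '<iframe src="https://www.example.com/embed/abcdef';

$q1 = strpos( $closed, '"' );
$q2 = strpos( $unclosed, '"' );

echo skip_quoted_value( $closed, $q1 ), PHP_EOL; // 11: just after the closing quote
echo skip_quoted_value( $unclosed, $q2 ), ' vs strlen ', strlen( $unclosed ), PHP_EOL; // 50 vs strlen 49
```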
I encountered a string like this as part of a content migration over other
rows of well-formed HTML. In this scenario, I wouldn't expect the tag
processor to be able to tell me anything about the string, but it would be
helpful to migration scripts like mine for the processor to handle the bad
string gracefully.
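Until the processor handles such input gracefully, a migration script can at least keep one malformed row from aborting the whole run. The following is a minimal sketch that assumes rows arrive as an array of HTML strings keyed by ID. Both `migrate_rows()` and the stand-in `$fake_process` callback are hypothetical. Catching `Throwable` matters here because `ValueError` extends `Error`, not `Exception`, so a `catch ( Exception $e )` block would not see it.

```php
<?php
// Hypothetical helper for a content migration: run $process on each row and
// record failures instead of letting one malformed row abort the whole run.

/**
 * @param array<int|string, string> $rows    Row ID => HTML.
 * @param callable                  $process Per-row work; receives the HTML string.
 * @return array{0: array, 1: array} [ results keyed by row ID, error messages keyed by row ID ]
 */
function migrate_rows( array $rows, callable $process ): array {
	$results = array();
	$errors  = array();
	foreach ( $rows as $id => $html ) {
		try {
			$results[ $id ] = $process( $html );
		} catch ( Throwable $e ) { // ValueError is an Error, not an Exception.
			$errors[ $id ] = get_class( $e ) . ': ' . $e->getMessage();
		}
	}
	return array( $results, $errors );
}

// Crude stand-in for the real per-row work so the sketch runs without WordPress:
// an odd number of double quotes is treated as an unclosed attribute value.
$fake_process = function ( string $html ): int {
	if ( substr_count( $html, '"' ) % 2 !== 0 ) {
		throw new ValueError( 'unclosed attribute value' );
	}
	return strlen( $html );
};

$rows = array(
	101 => '<iframe width="640" height="400"></iframe>',
	102 => '<iframe width="640" height="400" src="https://www.example.com/embed/abcdef',
);

list( $results, $errors ) = migrate_rows( $rows, $fake_process );
print_r( $errors );
```

In a real script, the callback would wrap the calls from the reproduction above (`new \WP_HTML_Tag_Processor( $html )` and `next_tag( 'iframe' )`). The collected `$errors` would then identify the rows to repair by hand.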
--
Ticket URL: <https://core.trac.wordpress.org/ticket/58637#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform