[wp-trac] [WordPress Trac] #47973: Invalid HTML output from image_caption + wpautop combination
WordPress Trac
noreply at wordpress.org
Wed Sep 4 16:24:05 UTC 2019
#47973: Invalid HTML output from image_caption + wpautop combination
------------------------------+-----------------------------
Reporter: terokilkanen | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Formatting | Version: 5.2.2
Severity: normal | Keywords:
Focuses: coding-standards |
------------------------------+-----------------------------
Hello,
We are using WordPress on our site https://www.bonus.ca, and the HTML
output on the page contains extra </p> tags.
I have traced the problem to the wpautop() function, which leaves these
extra </p> tags in place.
For example, take a caption shortcode like this:
{{{
[caption id="attachment_4413" align="alignnone" width="800"]<img
class="size-full wp-image-4413" src="/images/online-gambling-bonuses.jpg"
alt="top Online Gambling Bonuses" width="800" height="394" /> Internet has
changed the way we gamble. Live dealer casinos are a new way to enjoy the
thrills of Roulette and Blackjack from the comfort of your own home, even
on your phone. Visit <a href="/leovegas">LeoVegas</a> for the best <a
href="/casino/live">live casino experience</a>. You can also get a rare
live casino bonus at LeoVegas.[/caption]
}}}
After the shortcode filter has run on the block, the result is:
{{{
<div id="attachment_4413" style="width: 810px" class="wp-caption alignnone"><img
aria-describedby="caption-attachment-4413" class="size-full wp-image-4413"
src="/images/online-gambling-bonuses.jpg" alt="top Online Gambling Bonuses"
width="800" height="394" /><p id="caption-attachment-4413"
class="wp-caption-text">Internet has changed the way we gamble. Live dealer
casinos are a new way to enjoy the thrills of Roulette and Blackjack from the
comfort of your own home, even on your phone. Visit <a
href="/leovegas">LeoVegas</a> for the best <a href="/casino/live">live casino
experience</a>. You can also get a rare live casino bonus at LeoVegas.</p></div>
}}}
After the wpautop filter runs, the output is:
{{{
<div id="attachment_4413" style="width: 810px" class="wp-caption alignnone"><img
aria-describedby="caption-attachment-4413" class="size-full wp-image-4413"
src="/images/online-gambling-bonuses.jpg" alt="top Online Gambling Bonuses"
width="800" height="394" /></p>
<p id="caption-attachment-4413" class="wp-caption-text">Internet has
changed the way we gamble. Live dealer casinos are a new way to enjoy the
thrills of Roulette and Blackjack from the comfort of your own home, even
on your phone. Visit <a href="/leovegas">LeoVegas</a> for the best <a
href="/casino/live">live casino experience</a>. You can also get a rare
live casino bonus at LeoVegas.</p>
}}}
Here, there is an extra </p> tag after the <img /> tag; it closes a
paragraph that was never opened.
The following part of the wpautop() function is supposed to remove this
closing tag, but it does not:
{{{#!php
<?php
// If an opening or closing block element tag is followed by a
// closing <p> tag, remove it.
$pee = preg_replace( '!(</?' . $allblocks . '[^>]*>)\s*</p>!', '$1', $pee );
}}}
The code does not remove the extra </p> because the regex only matches the
block-level tags listed in $allblocks, and in this case the tag directly
before the </p> is an <img /> tag, which is not in that list.
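To see the failure in isolation, here is a minimal sketch that applies the
same pattern outside WordPress. The $allblocks alternation is modelled on
the block-element list in wpautop(), but it is an assumption here and may
not match the 5.2.2 source exactly; the only thing that matters is that it
contains no img entry. remove_p_after_block() is a hypothetical wrapper,
not a core function.

{{{#!php
<?php
// Block-element alternation modelled on the list in wpautop().
// Assumption: may differ slightly from the 5.2.2 source; what matters
// here is that inline elements such as img are not in it.
$allblocks = '(?:table|thead|tfoot|caption|col|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|form|map|area|blockquote|address|math|style|p|h[1-6]|hr|fieldset|legend|section|article|aside|hgroup|header|footer|nav|figure|figcaption|details|menu|summary)';

// Hypothetical wrapper around the wpautop() line quoted above:
// drop a </p> that directly follows an opening or closing block-level tag.
function remove_p_after_block(string $pee, string $allblocks): string
{
    return preg_replace('!(</?' . $allblocks . '[^>]*>)\s*</p>!', '$1', $pee);
}

// A </p> right after a block tag (<div>) is removed.
echo remove_p_after_block('<div class="wp-caption"></p>', $allblocks), "\n";
// prints: <div class="wp-caption">

// A </p> right after <img /> is left in place, as in this ticket.
echo remove_p_after_block('<div class="wp-caption"><img src="a.jpg" /></p>', $allblocks), "\n";
// prints: <div class="wp-caption"><img src="a.jpg" /></p>
}}}

Simply adding img to the alternation would not be a real fix: the pattern
would then also strip the legitimate </p> in <p><img /></p>, leaving that
paragraph unclosed.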
In general, I think using regular expressions to modify an HTML document is
the wrong approach. One could write regular expressions that correctly
match everything in the HTML language, but they would be huge.
The proper way to do this would be to build a DOM from the source markup
and make the adjustments on that tree.
Alternatively, it may be that the way WordPress mixes HTML and plain text
is simply impossible to handle correctly.
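As a rough illustration of the DOM idea, the sketch below parses the
(shortened) shortcode output from this ticket with PHP's bundled DOM
extension (DOMDocument, DOMXPath) and lists the element children of the
caption <div>. In the tree, the <img /> and the caption <p> are simply two
siblings, and a closing tag is not a node of its own, so there is nothing
that could be left dangling. child_tags() is a hypothetical helper, not a
WordPress function, and this only shows the parsing step: a DOM-based
replacement for wpautop() would still need its own rules for deciding
which text runs become paragraphs.

{{{#!php
<?php
// Hypothetical helper (not part of WordPress): parse an HTML fragment and
// return the tag names of the element children of the element whose id
// attribute equals $id. Returns [] if no such element exists.
function child_tags(string $html, string $id): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Look the element up by its id attribute ($id is assumed not to contain quotes).
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return [];
    }

    $tags = [];
    foreach ($nodes->item(0)->childNodes as $child) {
        // Skip text and comment nodes; keep only elements.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $tags[] = $child->nodeName;
        }
    }
    return $tags;
}

// The shortcode output from this ticket, shortened.
$captionHtml = '<div id="attachment_4413" style="width: 810px" class="wp-caption alignnone">'
    . '<img aria-describedby="caption-attachment-4413" src="/images/online-gambling-bonuses.jpg" width="800" height="394" />'
    . '<p id="caption-attachment-4413" class="wp-caption-text">Internet has changed the way we gamble.</p>'
    . '</div>';

print_r(child_tags($captionHtml, 'attachment_4413'));
}}}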
--
Ticket URL: <https://core.trac.wordpress.org/ticket/47973>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform