[wp-trac] [WordPress Trac] #47973: Invalid HTML output from image_caption + wpautop combination

WordPress Trac noreply at wordpress.org
Wed Sep 4 16:24:05 UTC 2019


#47973: Invalid HTML output from image_caption + wpautop combination
------------------------------+-----------------------------
 Reporter:  terokilkanen      |      Owner:  (none)
     Type:  defect (bug)      |     Status:  new
 Priority:  normal            |  Milestone:  Awaiting Review
Component:  Formatting        |    Version:  5.2.2
 Severity:  normal            |   Keywords:
  Focuses:  coding-standards  |
------------------------------+-----------------------------
 Hello,

 We are using WordPress on our site https://www.bonus.ca, and the HTML
 output on the page contains extra </p> tags.

 I have traced the problem to the wpautop() function, which leaves these
 extra </p> tags in place.

 For example, take a caption shortcode like this:


 {{{
 [caption id="attachment_4413" align="alignnone" width="800"]<img
 class="size-full wp-image-4413" src="/images/online-gambling-bonuses.jpg"
 alt="top Online Gambling Bonuses" width="800" height="394" /> Internet has
 changed the way we gamble. Live dealer casinos are a new way to enjoy the
 thrills of Roulette and Blackjack from the comfort of your own home, even
 on your phone. Visit <a href="/leovegas">LeoVegas</a> for the best <a
 href="/casino/live">live casino experience</a>. You can also get a rare
 live casino bonus at LeoVegas.[/caption]
 }}}


 After the shortcode filter has run on the block, the result is:


 {{{
 <div id="attachment_4413" style="width: 810px" class="wp-caption
 alignnone"><img aria-describedby="caption-attachment-4413"
 class="size-full wp-image-4413" src="/images/online-gambling-bonuses.jpg"
 alt="top Online Gambling Bonuses" width="800" height="394" /><p
 id="caption-attachment-4413" class="wp-caption-text">Internet has changed the way we
 gamble. Live dealer casinos are a new way to enjoy the thrills of Roulette
 and Blackjack from the comfort of your own home, even on your phone. Visit
 <a href="/leovegas">LeoVegas</a> for the best <a href="/casino/live">live
 casino experience</a>. You can also get a rare live casino bonus at
 LeoVegas.</p></div>
 }}}


 After running the wpautop filter, the output is:


 {{{
 <div id="attachment_4413" style="width: 810px" class="wp-caption
 alignnone"><img aria-describedby="caption-attachment-4413"
 class="size-full wp-image-4413" src="/images/online-gambling-bonuses.jpg" alt="top
 Online Gambling Bonuses" width="800" height="394" /></p>
 <p id="caption-attachment-4413" class="wp-caption-text">Internet has
 changed the way we gamble. Live dealer casinos are a new way to enjoy the
 thrills of Roulette and Blackjack from the comfort of your own home, even
 on your phone. Visit <a href="/leovegas">LeoVegas</a> for the best <a
 href="/casino/live">live casino experience</a>. You can also get a rare
 live casino bonus at LeoVegas.</p>
 }}}


 Here, there is a stray </p> tag right after the <img /> tag, with no
 matching opening <p> before it.

 The following part of the wpautop() function is supposed to remove such a
 closing tag, but it fails here:

 {{{#!php
 <?php
         // If an opening or closing block element tag is followed by a
         // closing <p> tag, remove it.
         $pee = preg_replace( '!(</?' . $allblocks . '[^>]*>)\s*</p>!', '$1', $pee );

 }}}

 The code doesn't remove the extra </p> because the regex only matches the
 block-level tags listed in $allblocks, and in this case the preceding tag
 is an <img /> tag, which is not one of them.
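 A minimal sketch, run outside WordPress, that isolates the cleanup regex
 quoted above. The $allblocks value is a trimmed-down, hypothetical stand-in
 for the alternation built inside wpautop(), which lists many more block
 tags; like the real list, it has no img. The helper strip_stray_p() is
 also hypothetical.

```php
<?php
// Hypothetical, reduced stand-in for the $allblocks alternation built inside
// wpautop(); the real list is much longer, but like this one it has no "img".
$allblocks = '(?:div|p|figure|figcaption|blockquote|ul|ol|li)';

// Hypothetical helper applying the cleanup regex quoted from wpautop():
// drop a </p> that directly follows an opening or closing block-level tag.
function strip_stray_p(string $html, string $allblocks): string
{
    return preg_replace('!(</?' . $allblocks . '[^>]*>)\s*</p>!', '$1', $html);
}

// A stray </p> after a block tag such as </div> is removed.
// prints: <div class="wp-caption">text</div>
echo strip_stray_p('<div class="wp-caption">text</div></p>', $allblocks), "\n";

// A stray </p> after <img /> is left in place: the regex only inspects the
// tag immediately before the </p>, and img is not in the alternation.
// prints: <img src="/images/online-gambling-bonuses.jpg" /></p>
echo strip_stray_p('<img src="/images/online-gambling-bonuses.jpg" /></p>', $allblocks), "\n";
```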

 In general, I think using regular expressions to modify an HTML document
 is the wrong approach. One could try to write regular expressions that
 correctly match every construct in HTML, but they would be huge.

 The proper way to do this would be to parse the source into a DOM and
 make the adjustments on that tree.
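 One way to sketch the DOM-based idea is with PHP's DOM extension
 (DOMDocument, backed by libxml2's HTML parser), assuming the extension is
 enabled as it is in most PHP builds. This is an illustration, not how
 wpautop() works: the fragment is parsed into a tree, the parser has to
 resolve any unmatched end tag while building it, empty <p> elements are
 removed, and the tree is serialized back. The function name
 normalize_html_fragment() and the wrapper id are hypothetical.

```php
<?php
// Hypothetical helper: parse an HTML fragment into a DOM, clean it up there,
// and serialize it again, instead of patching the markup with regexes.
function normalize_html_fragment(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings (e.g. about an unexpected </p>) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment in one root element; NOIMPLIED/NODEFDTD stop libxml2
    // from adding <html>, <body> and a doctype around it.
    $doc->loadHTML(
        '<div id="fragment-root">' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tree-level adjustment: remove <p> elements that ended up completely empty.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//p[not(node())]')) as $emptyP) {
        $emptyP->parentNode->removeChild($emptyP);
    }

    // Serialize the children of the wrapper. Output is generated from a tree,
    // so no dangling end tag like the stray </p> can be written out.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo normalize_html_fragment(
    '<div class="wp-caption"><img src="/images/online-gambling-bonuses.jpg" /></p>'
    . '<p class="wp-caption-text">Caption text</p></div>'
), "\n";
```

 Note that loadHTML() reads input without a charset declaration as
 ISO-8859-1, so non-ASCII content would need extra handling in a real
 implementation.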

 Or it may be that the way WP mixes HTML and plain text simply cannot be
 implemented correctly...

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/47973>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

