[wp-trac] [WordPress Trac] #39153: Bug in wp_html_split with unclosed PHP tag (or HTML tag <)

WordPress Trac noreply at wordpress.org
Wed Dec 7 17:48:33 UTC 2016


#39153: Bug in wp_html_split with unclosed PHP tag (or HTML tag <)
--------------------------------------+-----------------------------
 Reporter:  crosp                     |      Owner:
     Type:  defect (bug)              |     Status:  new
 Priority:  normal                    |  Milestone:  Awaiting Review
Component:  Formatting                |    Version:  4.6.1
 Severity:  normal                    |   Keywords:
  Focuses:  administration, template  |
--------------------------------------+-----------------------------
 The problem shows up in the ''shortcodes.php'' file, but the actual cause is
 the function ''wp_html_split'' in ''formatting.php''.

 The bug is described in full in this support forum thread:

 https://wordpress.org/support/topic/bug-in-wp_html_split-with-unclosed-php-tag/

 Consider the following post content.


 {{{
 Some amount of useless text <!--more-->

 [code-highlight line-numbers="table" linenostart="53" highlight-lines="1,3,8" style="native" lang="html+php" pyg-id="1" ]
 <?php
 //This callback registers our plug-in
 function wpse72394_register_tinymce_plugin($plugin_array) {
     $plugin_array['wpse72394_button'] = 'path/to/shortcode.js';
     return $plugin_array;
 }

 //This callback adds our button to the toolbar
 function wpse72394_add_tinymce_button($buttons) {
             //Add the button ID to the $button array
     $buttons[] = "wpse72394_button";
     return $buttons;
 }
 ?
 [/code-highlight]

 Some amount of useless text <strong>checkstyle</strong>

 [code-highlight style="native" lang="perl" pyg-id="2" ]
 (?:s+)(?:(/*([^*]|[rn]|(*+([^*/]|[rn])))**+/)|(//(?!.*(CHECKSTYLE)).*))
 [/code-highlight]
 }}}

 Here is the dump produced right after this line:

 {{{

 $textarr = wp_html_split( $content );
     var_dump($textarr);
     exit;
 }}}


 {{{

                 array(25) {
   [0]=>
   string(0) ""
   [1]=>
   string(3) "<p>"
   [2]=>
   string(28) "Some amount of useless text "
   [3]=>
   string(11) "<!--more-->"
   [4]=>
   string(0) ""
   [5]=>
   string(4) "</p>"
   [6]=>
   string(1) "
 "
   [7]=>
   string(3) "<p>"
   [8]=>
   string(121) "[code-highlight line-numbers="table" linenostart="53" highlight-lines="1,3,8" style="native" lang="html+php" pyg-id="1" ]"
   [9]=>
   string(6) "<br />"
   [10]=>
   string(1) "
 "
   [11]=>
   string(464) "<?php
 //This callback registers our plug-in
 function wpse72394_register_tinymce_plugin($plugin_array) {
     $plugin_array['wpse72394_button'] = 'path/to/shortcode.js';
     return $plugin_array;
 }

 //This callback adds our button to the toolbar
 function wpse72394_add_tinymce_button($buttons) {
             //Add the button ID to the $button array
     $buttons[] = "wpse72394_button";
     return $buttons;
 }
 ?
 [/code-highlight]

 Some amount of useless text <strong>"
   [12]=>
   string(10) "checkstyle"
   [13]=>
   string(9) "</strong>"
   [14]=>
   string(0) ""
   [15]=>
   string(4) "</p>"
   [16]=>
   string(56) "
 [code-highlight style="native" lang="perl" pyg-id="2" ]"
   [17]=>
   string(6) "<br />"
   [18]=>
   string(72) "
 (?:s+)(?:(/*([^*]|[rn]|(*+([^*/]|[rn])))**+/)|(//(?!.*(CHECKSTYLE)).*))"
   [19]=>
   string(6) "<br />"
   [20]=>
   string(19) "
 [/code-highlight]
 "
   [21]=>
   string(3) "<p>"
   [22]=>
   string(15) "Some Text Again"
   [23]=>
   string(4) "</p>"
   [24]=>
   string(1) "
 "
 }
 }}}


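 As you can see, the first shortcode was not split out: element [11] runs
 from <?php through the closing [/code-highlight] tag and on to the next >
 (the end of <strong>). That is the problem. If the PHP closing tag (?>) is
 present, then everything works fine.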
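
 The behaviour can be reproduced outside WordPress with a reduced input. The
 sketch below is only an illustration. It assumes that wp_html_split() is
 essentially preg_split() with PREG_SPLIT_DELIM_CAPTURE, and it uses just the
 "normal element" part of the pattern (<[^>]*>?) rather than the full regex
 quoted further down. demo_split() and the [code] shortcode are hypothetical
 names.

```php
<?php
// Simplified stand-in for the "normal element" branch of the split regex:
// a "<", then everything up to and including the next ">" (if there is one).
$normal_element = '/(<[^>]*>?)/';

// Hypothetical helper. Assumption: wp_html_split() boils down to this call,
// so the matched elements are kept in the resulting array.
function demo_split( $regex, $input ) {
	return preg_split( $regex, $input, -1, PREG_SPLIT_DELIM_CAPTURE );
}

// Same content twice: once with "?" only, once with a proper "?>".
$unclosed = '[code]<?php echo 1; ?[/code] text <strong>bold</strong>';
$closed   = '[code]<?php echo 1; ?>[/code] text <strong>bold</strong>';

$unclosed_parts = demo_split( $normal_element, $unclosed );
$closed_parts   = demo_split( $normal_element, $closed );

// Unclosed: the "element" swallows [/code] and runs to the end of <strong>.
var_dump( $unclosed_parts[1] ); // "<?php echo 1; ?[/code] text <strong>"

// Closed: the element ends at "?>", so [/code] lands in a text node.
var_dump( $closed_parts[1] );   // "<?php echo 1; ?>"
var_dump( $closed_parts[2] );   // "[/code] text "
```

 In the unclosed case the closing shortcode tag ends up inside something the
 splitter treats as an HTML element, which matches element [11] of the dump
 above.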

 The function that builds the problematic regex:

 {{{#!php
 <?php
 function get_html_split_regex() {
         static $regex;

         if ( ! isset( $regex ) ) {
                 $comments =
                           '!'           // Start of comment, after the <.
                         . '(?:'         // Unroll the loop: Consume everything until --> is found.
                         .     '-(?!->)' // Dash not followed by end of comment.
                         .     '[^\-]*+' // Consume non-dashes.
                         . ')*+'         // Loop possessively.
                         . '(?:-->)?';   // End of comment. If not found, match all input.

                 $cdata =
                           '!\[CDATA\['  // Start of comment, after the <.
                         . '[^\]]*+'     // Consume non-].
                         . '(?:'         // Unroll the loop: Consume everything until ]]> is found.
                         .     '](?!]>)' // One ] not followed by end of comment.
                         .     '[^\]]*+' // Consume non-].
                         . ')*+'         // Loop possessively.
                         . '(?:]]>)?';   // End of comment. If not found, match all input.

                 $escaped =
                           '(?='           // Is the element escaped?
                         .    '!--'
                         . '|'
                         .    '!\[CDATA\['
                         . ')'
                         . '(?(?=!-)'      // If yes, which type?
                         .     $comments
                         . '|'
                         .     $cdata
                         . ')';

                 $regex =
                           '/('              // Capture the entire match.
                         .     '<'           // Find start of element.
                         .     '(?'          // Conditional expression follows.
                         .         $escaped  // Find end of escaped element.
                         .     '|'           // ... else ...
                         .         '[^>]*>?' // Find end of normal element.
                         .     ')'
                         . ')/';
         }
         }

         return $regex;
 }

 }}}
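
 For comparison, the pattern above only special-cases <!-- and <![CDATA[, so
 a > inside a comment is protected while a > after <? is not. The following
 is a rough standalone check, not WordPress code: it rebuilds the same
 pattern in compact form and, as an assumption, applies it with preg_split()
 and PREG_SPLIT_DELIM_CAPTURE. The sample input is made up.

```php
<?php
// Compact copy of the pieces assembled by get_html_split_regex() above.
$comments = '!(?:-(?!->)[^\-]*+)*+(?:-->)?';
$cdata    = '!\[CDATA\[[^\]]*+(?:](?!]>)[^\]]*+)*+(?:]]>)?';
$escaped  = '(?=!--|!\[CDATA\[)(?(?=!-)' . $comments . '|' . $cdata . ')';
$regex    = '/(<(?' . $escaped . '|[^>]*>?))/';

// Made-up input: a comment containing ">" and an unclosed PHP tag containing ">".
$input = 'before <!-- a > b --> middle <?php if ( $a > 1 ) ? after <em>x</em>';

// Assumption: this mirrors what wp_html_split() does with the regex.
$parts = preg_split( $regex, $input, -1, PREG_SPLIT_DELIM_CAPTURE );

// The comment branch reads up to "-->", so the inner ">" is harmless.
var_dump( $parts[1] ); // "<!-- a > b -->"

// "<?" has no branch of its own: it is cut at the first ">" it meets.
var_dump( $parts[3] ); // "<?php if ( $a >"
```

 A dedicated branch for <? would have to decide what to do when ?> is
 missing, which is exactly the case reported here.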

 This case should definitely be handled by the regex.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/39153>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform