[wp-trac] Re: [WordPress Trac] #8553: preg_replace_callback in do_shortcode returns empty for large posts

WordPress Trac wp-trac at lists.automattic.com
Sat Jun 20 13:39:10 GMT 2009


#8553: preg_replace_callback in do_shortcode returns empty for large posts
---------------------------+------------------------------------------------
 Reporter:  AaronCampbell  |        Type:  defect (bug)
   Status:  new            |    Priority:  high        
Milestone:  2.9            |   Component:  Shortcodes  
  Version:                 |    Severity:  normal      
 Keywords:  needs-patch    |  
---------------------------+------------------------------------------------

Comment(by Brusdeylins):

 OK, in my last post I wrote "This don't make sence if your shortcode names
 should not contain any brackets". There I meant the parameters of the
 shortcodes, not the names themselves. Sorry… Let me explain how I
 understand the regular expression under discussion.

 '''--- The double brackets ---'''

 First the double brackets (one non-capturing and one capturing):
 {{{
 (?:(\/))?         # optional slash to indicate a closing tag, not always set
 }}}

 My suggestion: you don’t need the outer non-capturing brackets, because
 they don’t disable the capturing brackets inside them and they don’t match
 more than the inner brackets do. They only produce more backtracking!
 What this part means is: the result array holds the slash or an empty
 string at position 4, and always has the same number of elements. The
 meaning of “(\/)?” is the same, because the question mark is outside
 both pairs of brackets…

 Here is a code example with double brackets:
 {{{
 $TXT = "abcdefg";
 $pattern = '/.*(?:(cd)).*/';
 preg_match_all($pattern, $TXT, $array);
 echo '<pre>', print_r($array, true), '</pre>';
 }}}

 And the result:
 {{{
 Array
 (
     [0] => Array
         (
             [0] => abcdefg
         )

     [1] => Array
         (
             [0] => cd
         )
 )
 }}}


 If you run the same program with the following regex:
 {{{
 $pattern = '/.*(cd).*/';
 }}}


 you get exactly the same result:
 {{{
 Array
 (
     [0] => Array
         (
             [0] => abcdefg
         )

     [1] => Array
         (
             [0] => cd
         )

 )
 }}}


 So I think you can reduce this part of the regular expression. Or do you
 have another example where you get different results?
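
 The comparison above can be pushed a little closer to the shortcode case
 by keeping the optional quantifier. This is only a sketch: both patterns
 are simplified, hypothetical stand-ins (optional slash plus tag name), not
 the full WordPress expression, and capture_groups() is a helper name made
 up for the illustration.

```php
<?php
// Simplified, hypothetical patterns: only the optional-slash part plus a tag name.
$withWrapper    = '/\[(?:(\/))?(\w+)\]/';
$withoutWrapper = '/\[(\/)?(\w+)\]/';

// Return the match array preg_match() produces for one pattern and subject.
function capture_groups($pattern, $subject)
{
    preg_match($pattern, $subject, $matches);
    return $matches;
}

// Compare both patterns on an opening tag and on a closing tag.
foreach (array('[gallery]', '[/gallery]') as $tag) {
    $same = capture_groups($withWrapper, $tag) === capture_groups($withoutWrapper, $tag);
    echo $tag, ': ', $same ? 'identical' : 'different', "\n";
}
```

 Both tags give identical arrays: the slash group holds "/" or an empty
 string, and the number of elements does not change.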


 '''--- The memory problem with non-greedy quantifiers ---'''
 {{{
 (.*?)             # optional attributes (non-greedy, stops on close bracket)
 }}}

 Here I don’t think that my solution is equivalent to “using a dot”, at
 least not when memory (and time) are limited. And that is the problem. PHP
 uses a “traditional NFA” regex engine. With non-greedy quantifiers, such
 an engine has to save states and backtrack, and the saved states are
 processed in LIFO order… (This is what I learned years ago… maybe it
 amounts to: try all candidates and return the shortest one, provided
 memory can hold all the temporary results?)

 Try it. Replace the line as described in ''8553.3.diff'' and you will
 see that the example post appears. This fix does not resolve your
 problem No. 3, as you wrote (the pattern (.+?) is still in the
 expression)! I am using the WordPress plug-in “NextGen Gallery”. This
 plug-in uses shortcode tags like “[singlepic id=37 w=150 h=400
 float=right]”. Here we don’t have (self-)closing tags…
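
 For anyone who wants to see the failure mode in isolation, here is a
 minimal sketch. It rests on several assumptions: the patterns are
 simplified and hypothetical, the negated character class is just one
 possible alternative to the non-greedy dot (it is not taken from
 ''8553.3.diff''), JIT is switched off, and pcre.backtrack_limit is lowered
 so that a small input is enough to reach the limit. replace_shortcode() is
 a made-up helper name.

```php
<?php
// Assumptions for this sketch: JIT off and a low limit, so a small input
// is enough to trigger the error that large posts trigger on real sites.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

// Simplified, hypothetical patterns (tag name plus attributes only).
$lazy    = '/\[(\w+)(.*?)\]/s';    // non-greedy dot, as in the quoted line
$negated = '/\[(\w+)([^\]]*)\]/';  // assumed alternative: negated class

// Run one replacement and return the result together with the PCRE error code.
function replace_shortcode($pattern, $post)
{
    $result = preg_replace_callback($pattern, function ($m) {
        return '<img>';
    }, $post);
    return array($result, preg_last_error());
}

// One shortcode with a very long attribute list.
$post = '[singlepic' . str_repeat(' w=150', 2000) . ']';

list($out, $err) = replace_shortcode($lazy, $post);
var_dump($out === null, $err === PREG_BACKTRACK_LIMIT_ERROR);

list($out, $err) = replace_shortcode($negated, $post);
var_dump($out, $err === PREG_NO_ERROR);
```

 With these settings the non-greedy pattern should make
 preg_replace_callback() return NULL with a backtrack-limit error, which
 shows up as an empty post, while the negated class gets through the same
 input. Checking preg_last_error() after the call is what distinguishes
 this case from a post that is really empty.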

 I attached the diff and the HTML code of the article I wrote. Happy
 debugging :-)

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/8553#comment:37>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

