[wp-trac] Re: [WordPress Trac] #8553: preg_replace_callback in
do_shortcode returns empty for large posts
WordPress Trac
wp-trac at lists.automattic.com
Sat Jun 20 13:39:10 GMT 2009
#8553: preg_replace_callback in do_shortcode returns empty for large posts
---------------------------+------------------------------------------------
Reporter: AaronCampbell | Type: defect (bug)
Status: new | Priority: high
Milestone: 2.9 | Component: Shortcodes
Version: | Severity: normal
Keywords: needs-patch |
---------------------------+------------------------------------------------
Comment(by Brusdeylins):
OK, in my last post I wrote "This don't make sence if your shortcode names
should not contain any brackets". There I meant the parameters of the
shortcodes, not the names themselves. Sorry… Let me explain how I
understand the regular expression in question.
'''--- The double brackets ---'''
First the double brackets (one non-capturing and one capturing):
{{{
(?:(\/))? # optional slash to indicate a closing tag, not always set
}}}
My suggestion: you don’t need the outer non-capturing brackets, because
they don’t disable the capturing brackets inside and they don’t match
more than the brackets inside. At most they give the engine extra work!
The meaning of this part is: the result array holds the slash or an empty
string at position 4, but it always has the same number of elements. And
“(\/)?” means the same, because the question mark is outside
of both brackets…
Here is a code example with double brackets:
{{{
$TXT = "abcdefg";
$pattern = '/.*(?:(cd)).*/';
preg_match_all($pattern, $TXT, $array);
echo '<pre>', print_r($array, true), '</pre>';
}}}
And the result:
{{{
Array
(
[0] => Array
(
[0] => abcdefg
)
[1] => Array
(
[0] => cd
)
)
}}}
If you run the same program with the following regex:
{{{
$pattern = '/.*(cd).*/';
}}}
you get exactly the same result:
{{{
Array
(
[0] => Array
(
[0] => abcdefg
)
[1] => Array
(
[0] => cd
)
)
}}}
So I think you can reduce this part of the regular expression. Or do you
have another example where you get different results?
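The example above has no question mark after the group, so here is a minimal sketch that also covers the optional case. The two tag patterns are simplified and hypothetical, not the actual WordPress shortcode regex, and the sample strings are made up for illustration.

```php
<?php
// Hypothetical, simplified tag patterns; NOT the real WordPress shortcode regex.
$wrapped = '/\[(\w+)(?:(\/))?\]/'; // capturing group wrapped in a non-capturing one
$plain   = '/\[(\w+)(\/)?\]/';     // capturing group only

// Made-up samples: tags with and without the optional slash.
$samples = array('[gallery]', '[gallery/]', 'text [caption/] more [b] text');

$identical = true;
foreach ($samples as $sample) {
    preg_match_all($wrapped, $sample, $a, PREG_SET_ORDER);
    preg_match_all($plain, $sample, $b, PREG_SET_ORDER);
    // A non-capturing group gets no number, so (\/) is group 2 in both patterns.
    if ($a !== $b) {
        $identical = false;
    }
}
echo $identical ? "identical\n" : "different\n";
```

If both patterns really are equivalent, the match arrays should be the same for every sample, with and without the slash.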
'''--- The memory problem with non-greedy quantifiers ---'''
{{{
(.*?) # optional attributes (non-greedy, stops on close bracket)
}}}
Here I don’t think that my solution is equivalent to “using a dot”. Not if
you have limits on memory (and time)! And that is the problem. PHP uses a
“traditional NFA” regex engine (PCRE). This means that, for non-greedy
quantifiers, the engine saves states and backtracks… with a non-greedy
quantifier we have a LIFO process: each time the rest of the pattern
fails, the engine returns to the most recently saved state and lets the
quantifier take one more character. (This is what I learned years ago…
maybe it means: trying all alternatives and returning the shortest one,
if the memory can hold all the temporary results?)
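To see whether the engine gave up instead of simply finding no match, something like the following sketch may help. It assumes PHP 5.2 or later, where preg_last_error() is available. The patterns, the helper describe_preg_error() and the test string are hypothetical and are not taken from WordPress or from the attached diff. Whether the lazy pattern actually fails depends on the PHP version and on the pcre.backtrack_limit and pcre.recursion_limit settings.

```php
<?php
// Hypothetical helper: map a preg_last_error() code to a readable label.
function describe_preg_error($code) {
    switch ($code) {
        case PREG_NO_ERROR:              return 'no error';
        case PREG_BACKTRACK_LIMIT_ERROR: return 'backtrack limit exhausted';
        case PREG_RECURSION_LIMIT_ERROR: return 'recursion limit exhausted';
        default:                         return 'other PCRE error';
    }
}

// Made-up stand-in for a large post: one very long bracketed tag.
$subject = '[tag ' . str_repeat('a', 200000) . ']';

// Hypothetical patterns: lazy dot versus a negated character class
// that cannot run past the closing bracket.
$patterns = array(
    'lazy dot'      => '/\[(\w+)(.*?)\]/s',
    'negated class' => '/\[(\w+)([^\]]*)\]/',
);

foreach ($patterns as $label => $pattern) {
    $result = preg_replace($pattern, '', $subject);
    // NULL means the engine gave up; it does not mean "no match".
    $status = ($result === null) ? 'failed' : 'ok';
    echo $label, ': ', $status, ' (', describe_preg_error(preg_last_error()), ")\n";
}
```

A NULL result together with a backtrack or recursion limit error would point to the limits discussed above rather than to a pattern that does not match.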
Try it. Replace the line as described in ''8553.3.diff'' and you will
see that the example post appears. This fix does not resolve
your problem No. 3, as you wrote (the pattern (.+?) is still in the
expression)! I am using the WordPress plug-in “NextGen Gallery”. This
plug-in uses shortcode tags like “[singlepic id=37 w=150 h=400
float=right]”. Here we don’t have (self-)closing tags…
I attached the diff and the HTML code of the article I wrote. Happy
debugging :-)
--
Ticket URL: <http://core.trac.wordpress.org/ticket/8553#comment:37>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software