[wp-hackers] Shortcode parser

Michael D Adams mda at blogwaffe.com
Thu Nov 10 19:37:24 UTC 2011


2011/11/10 Michał Środek <michal at srodek.info>:
> I've been preparing a fix which will work well with the WP shortcode
> interface, but I need some unit tests to check my solution.
>
> The first PHP test file (not perfect yet) is available here:
> http://srodek.info/online/wp-hackers/2011-11-10.txt
> You can also run it online here:
> http://srodek.info/online/wp-hackers/2011-11-10.php
>
> $regexp = '#(.?)(?:\[('.$tagregexp.')((?:\s|=).*[^\/\]]{1})?\] ( (?:
> (?(R) [^\[]++ | [^\[]*+) | (?R)) *) \[/\\2\] |
> \[('.$this->allowedShortcodes.')((?:\s|=)[^\/\[\]]+?)?\/?\])(.?)#x';

That regex will backtrack a lot for large posts on sites with many
shortcodes defined.  See, for example, the work in
http://core.trac.wordpress.org/ticket/15600.

Using a recursive regex to solve the nested shortcode problem will
probably always fail badly in edge cases.

If we want to fix some of these edge cases, we should use a real
tokenizer/parser, not a regular expression.
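
To make the tokenizer/parser idea concrete, here is a minimal sketch,
not a proposed patch.  It is written in Python rather than PHP only to
keep it short.  The names tokenize, classify and parse, the token
tuples, and the set of known tags are all hypothetical, and the
attribute handling is deliberately simplified (the [tag=value] form
that the quoted regex accepts is not handled).  The scanner makes one
left-to-right pass with plain string searches, and nesting is tracked
with an explicit stack instead of regex recursion.

```python
def classify(inner, known_tags):
    """Turn the text between '[' and ']' into (kind, name, attrs), or None."""
    if inner.startswith("/"):
        name = inner[1:].strip()
        return ("close", name, "") if name in known_tags else None
    self_closing = inner.endswith("/")
    parts = (inner[:-1] if self_closing else inner).split(None, 1)
    if not parts or parts[0] not in known_tags:
        return None  # not a registered shortcode, so it stays plain text
    attrs = parts[1].strip() if len(parts) > 1 else ""
    return ("self" if self_closing else "open", parts[0], attrs)


def tokenize(text, known_tags):
    """Yield (kind, name, attrs, raw) tuples; kind is text/open/close/self."""
    pos = 0  # start of plain text that has not been emitted yet
    i = 0    # current scan position
    while True:
        start = text.find("[", i)
        if start == -1:
            break
        end = text.find("]", start)
        if end == -1:
            break
        # Use the last '[' before this ']' so the candidate has no brackets inside.
        start = text.rfind("[", start, end)
        token = classify(text[start + 1:end], known_tags)
        i = end + 1  # always move past this ']' so the scan never revisits input
        if token is None:
            continue
        if start > pos:
            yield ("text", None, "", text[pos:start])
        yield token + (text[start:end + 1],)
        pos = end + 1
    if pos < len(text):
        yield ("text", None, "", text[pos:])


def parse(text, known_tags):
    """Build a tree of dict nodes; plain text is stored as str children."""
    root = {"tag": None, "attrs": "", "children": []}
    stack = [root]
    for kind, name, attrs, raw in tokenize(text, known_tags):
        if kind == "text":
            stack[-1]["children"].append(raw)
        elif kind == "close":
            if any(node["tag"] == name for node in stack[1:]):
                # Pop up to the matching opener, implicitly closing inner tags.
                while stack.pop()["tag"] != name:
                    pass
            else:
                stack[-1]["children"].append(raw)  # stray closer: keep as text
        else:
            node = {"tag": name, "attrs": attrs, "children": []}
            stack[-1]["children"].append(node)
            if kind == "open":
                stack.append(node)
    return root
```

For example, parse("a [b]x [b]y[/b] z[/b]", {"b"}) puts the inner [b]
node inside the outer one, which is the same-name nesting case.  What
to do with an opener that is never closed is a policy decision; this
sketch simply leaves the node open until the end of the input.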

Mike

