[wp-trac] [WordPress Trac] #10473: Shortcode parser in is flawed
WordPress Trac
wp-trac at lists.automattic.com
Thu Jul 23 23:27:38 UTC 2009
#10473: Shortcode parser in is flawed
--------------------------+-------------------------------------------------
Reporter: godfreykfc | Owner:
Type: defect (bug) | Status: new
Priority: high | Milestone: Unassigned
Component: Shortcodes | Version: 2.8.1
Severity: critical | Keywords: regex, shortcode, parser
--------------------------+-------------------------------------------------
There are several critical flaws in the shortcode parser. I'm using 2.8.2.
This probably deserves multiple tickets but I think grouping them all here
will be easier for the maintainers to follow up (because a single patch
that fixes one of the following might break the others):
Problem 1: It doesn't work properly, at all!
Plugin code:
{{{
function some_handler($atts, $content){
return "**BEGIN** Content: '{$content}' **END**";
}
function my_init() {
add_shortcode('sc', 'some_handler');
}
add_action('init','my_init');
}}}
Post content:
{{{
1[sc]1
2[sc]with content[/sc]2
3[sc /]3
4[sc]with content[/sc]4
5[sc with="attr"][/sc]5
6[sc with="attr"]and content[/sc]6
7[sc]7
8multiple line content8
9[/sc]9
}}}
I apologize if this is a bit hard to read, but the idea is every line
should be a self contained shortcode (except 7-9 which is one big fat
chunk of short code). I'm adding the line number before and after the tag
so that you can see clearly what's going on. (I'm not even testing the
single line shortcode fail as described in #10082)
Expected HTML output:
{{{
1**BEGIN** Content: '' **END**1<br />
2**BEGIN** Content: 'with content' **END**2<br />
3**BEGIN** Content: '' **END**3<br />
4**BEGIN** Content: 'with content' **END**4<br />
5**BEGIN** Content: '' **END**5<br />
4**BEGIN** Content: 'and content' **END**4<br />
7**BEGIN** Content: '7
8multiple line content8
9' **END**9
}}}
Should be fairly straight forward. But the ACTUAL HTML output blew my
mind...
{{{
1**BEGIN** Content: '1<br />
2[sc]with content' **END**2<br />
3**BEGIN** Content: '3<br />
4[sc]with content' **END**4<br />
5**BEGIN** Content: '[/sc]5<br />
6[sc with="attr"]and content' **END**6<br />
7**BEGIN** Content: '7<br />
8multiple line content8<br />
9' **END**9
}}}
Problem 1.1:
New lines inside a shortcode should not be converted to {{{<br />}}} tags,
at least when they are entered in the HTML (non-visual) editor. This is
perhaps more of an enhancement than defect, but it makes it pretty much
impossible to create shortcodes that would take multiple lines content. If
it's a desire behavior for that particular shortcode, it should be up to
the author to call a function like nl2br to convert those new lines.
Problem 1.2:
Using the same shortcode back-to-back would make the parse spill, even if
they are on different lines. For example, the parser picked up the opening
tag from the first line and the closing tag from the second line. What is
even more weird is that it happens even when the shortcode tag is self-
closed (see line 3-4).
----
Problem 2:
Incorrect types of arguments being passed into the handler function.
{{{
function some_handler($atts, $content = null){
var_dump($atts);
var_dump($content);
}
function my_init() {
add_shortcode('sc', 'some_handler');
}
add_action('init','my_init');
}}}
Post content:
{{{
[sc]
}}}
Output:
{{{
string '' (length=0)
string '' (length=0)
}}}
Problem 2.1:
According to the [http://codex.wordpress.org/Shortcode_API documentation],
$content should be null when the tag is self-closed and it can be checked
by is_null($content). In reality, it doesn't matter if you use {{{[sc]}}}
or {{{[sc /]}}} or {{{[sc][/sc]}}}, they would all pass in an empty String
instead of null. If this is the desired behavior, the documentation should
be updated. (But I believe there are benefits when you can actually
distinguish between empty content and no content.)
Problem 2.2:
What makes me stretch my head even more is the behavior when no attributes
are given. Intuitively {{{$atts}}} should be an empty array when no
arguments are passed in, so that you can loop through it using iterators
like {{{foreach()}}} without the fear of it throwing an error. (Or at
least it should be null..) However, when no arguments are there an empty
string is passed in. This does not make a lot of sense, but if for some
reason that is the intended behavior then the documentation should be
updated too.
Some of you may argue that it doesn't matter because you are suppose to
normalize it using {{{shortcode_atts()}}} before you do anything else, and
it seems to handle empty strings just fine. However, this is not always
possible. For instance, according to the API, unnamed attributes are
supported and they will be passed in as items with numeric keys. However,
there aren't any good ways to make {{{shortcode_atts()}}} keep the
numeric-keyed items in the array except naming all of them explicitly in
the default, i.e. {{{shortcode_atts(array( 0=>'', 1=>'',
2=>''....),$atts);}}}. (And even if you're fine with doing that, you can't
really pass that normalized array into {{{extract()}}} because it'd just
skip them. This might be considered as a defect of {{{shortcode_atts()}}},
but I am not going to touch that part. So, in order to extract your
unnamed attributes, you would have to loop through the array and do your
stuff before passing it into {{{shortcode_atts()}}}. So the differences
between an empty array and an empty string DOES matter.
----
Problem 3:
This is by far the most critical flaw of the parser and it does not
require much explanation:
{{{
function sc_handler($atts, $content = null){
return "A [sc] shortcode.";
}
function scalar_handler($atts, $content = null){
return "A [scalar] shortcode.";
}
function sc_extend_handler($atts, $content = null){
return "A [sc-extend] shortcode.";
}
function my_init() {
add_shortcode('sc', 'sc_handler');
add_shortcode('scalar', 'sc_handler');
add_shortcode('sc-extend', 'sc_extend_handler');
}
add_action('init','my_init');
}}}
Post content:
{{{
[sc]
[scalar]
[sc-extend]
}}}
HTML output:
{{{
A [sc] shortcode.
A [scalar] shortcode.
A [sc] shortcode.
}}}
If you make it do a {{{var_dump()}}} on $atts you will have a better idea
of what's going on:
[sc]:
{{{
string '' (length=0)
}}}
[scalar]:
{{{
string '' (length=0)
}}}
[sc-extend]:
{{{
array
0 => string '-extend' (length=7)
}}}
This is obviously very bad. For example, if someone registered a shortcode
'rss', anything that starts with 'rss' and includes a dash ('rss-fetch' or
something) would fail. Throughout my testing, shortcode names with dashes
seems to be causing a lot of issues (but according to the documentation
it's fully supported, it's even used in oen of the examples). I haven't
looked into the code yet, but are we even escaping those dashes before
sticking it into the pattern?
--
Ticket URL: <http://core.trac.wordpress.org/ticket/10473>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software
More information about the wp-trac
mailing list