[wp-trac] [WordPress Trac] #18311: Caption shortcode doesn't support markup in the caption.
WordPress Trac
wp-trac at lists.automattic.com
Fri Oct 21 20:59:16 UTC 2011
#18311: Caption shortcode doesn't support markup in the caption.
-------------------------+-----------------------------
Reporter: prettyboymp | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Future Release
Component: TinyMCE | Version:
Severity: normal | Resolution:
Keywords: needs-patch |
-------------------------+-----------------------------
Changes (by jeremyclarke):
* cc: jer@… (added)
Comment:
The first step of the problem is in {{{image_add_caption()}}}, which is
attached to the {{{image_send_to_editor}}} filter and handles actually
inserting the caption shortcode into post content. Notably, the
{{{image_add_caption()}}} function is completely undocumented ( {@internal
Missing Short Description}}) so it's hard to know what was intended by
it's behavior, though clearly the idea was to simply remove anything that
might be offensive inside a shortcode attribute without consideration for
formatting that might be important to preserve.
Interestingly the upload manager popup correctly saves whatever HTML you
insert in the caption without stripping it or showing an error. The
problem is that when the shortcode is inserted into the post,
{{{image_add_caption()}}} runs the following string replacement, which
completely neutralises any HTML in the caption by replacing it with HTML
entities:
{{{
$caption = str_replace( array( '>', '<', '"', "'" ),
array( '>', '<', '"', ''' ),
$caption
);
}}}
This is despite the fact that re-opening the image in the gallery shows
you that the HTML is still saved correctly and unmolested in the caption
field, which is very confusing. At a minimum the media uploader should not
save any HTML in the caption field that will not ultimately function in
the caption shortcode itself.
If the str_replace is removed then inserted HTML continues to not function
because when the actual $caption content is output, addslashes() is run on
it first( {{{addslashes($caption)}}}) which creates useless broken HTML.
The issue is further complicated by the fact that the caption text is
stored in the {{{caption=""}}} attribute of the shortcode, so any double-
quotes within the caption text truly don't make any sense and absolutely
SHOULD be either escaped/slashed or replaced with single-quotes for
storage inside the shortcode attribute.
'''THE CAPTION SHORTCODE IS FUNDAMENTALLY FLAWED, NUTS'''
Looking at it closely it seems like the architecture of the caption
shortcode is fundamentally flawed because it uses an 'attribute' of the
shortcode (caption="") to store the caption text, and attributes are
rightly blocked from having the kind of formatting a useful caption needs.
Since the shortcode has only one piece of content that can go between the
two tags (i.e. between [caption] and [/caption]) that content should have
been the caption text itself (which could then take advantage of being
regular post content), rather than the image html tag that is currently
there (which could have been broken down into esc_attr()-safe attributes
stored inside the initial [caption] shortcode and parsed into an img tag
during shortcode execution).
Unfortunately re-architecting the caption shortcode now is probably off
the table, and I doubt we can get traction for fully replacing it with a
new more powerful shortcode in core. That is the problem we face though,
the current architecture is just wrong for captions that are any more than
another kind of alt label.
'''WE NEED SOME WAY TO ESCAPE HTML IN THE CAPTION TEXT ON SAVE, THEN
UNESCAPE ON DISPLAY'''
So the trick we need to pull off is getting the system to somehow allow
for at least a limited set of HTML (personally I'd settle for just links)
inside the 'caption' attribute of the [caption] shortcode, then making the
shortcode function {{{img_caption_shortcode()}}} display it correctly. If
that worked, then it should be possible to get the visual editor to stop
messing it up when you switch back and forth.
To me it seems like the best approach within the current shortcode would
be to carefully account for particular HTML tags inside the 'caption'
attribute by escaping them in a particular way while inserting them into
the post, then when executing the shortcode carefully accounting for the
escaped HTML and reverting it into an active state.
'''USING ADDSLASHES()/STRIPSLASHES() TO ESCAPE CAPTION TEXT: FAIL'''
The simplest version of that idea is to just remove the anti-html
str_replace (quoted above) from {{{image_add_caption()}}} and instead
depend on addslashes() to keep the content safe, then, in
{{img_caption_shortcode()}} using stripslashes() to get the content back
to it's original state.
Unfortunately we can't just stripslashes() in the current state of the
codebase because a slash-escaped link tag inside the caption attribute
causes the whole shortcode's $attr array to lose it's lunch and become
broken, here's and example where the caption was "Testing a link" and the
link was an a tag leading to test.com which used double-quotes around the
URL:
{{{
[id] => attachment_74823
[align] => alignnone
[width] => 375
[0] => caption="Testing
[1] => a
[2] => href="http://test.com">link"
}}}
That's the $attr array recieved by the shortcode function
img_caption_shortcode(), so before we even have a chance to
stripslashes($caption) the shortcode fails completely because the $caption
attribute is mandatory for the shortcode and not present in $atts.
The culprit here is {{{shortcode_parse_atts()}}} which uses some intense
regex voodoo I can't even begin to understand but is clearly not ready for
a world where slash-escaped double-quotes are part of a shortcode
attribute.
However, in this case switching to single-quotes actually fixes the
problem, and results in the link showing correctly on the image. Maybe a
final solution will involve enforcing single-quotes -only within caption
text HTML, though that seems awkward, since it will be hard to distinguish
between a double-quote used as part of a link and one used as part of
regular text, and we don't want to replace text-quotes with a single-quote
as that would be confusing for authors.
'''USING URLENCODE()/URLDECODE() TO ESCAPE CAPTION TEXT: UGLY'''
An alternate idea would be to use another escaping system entirely, to
completely neutralize the caption text before sending it to the editor,
then re-activate it when parsing the shortcode on display. I tried
replacing addslashes/stripshashes with urlencode/urldecode and the results
were very promising. In {{{image_add_caption()}}} I use
urlencode($caption) before sending it to the editor, which sends things
like this to the post:
{{{
caption="TEsting+a+%3Ca+href%3D%22http%3A%2F%2Ftest.com%22%3Elink%3C%2Fa%3E+with+also+some+%3Cstrong%3Ebold%3C%2Fstrong%3E+text.+"
}}}
Then in {{{img_caption_shortcode()}}} I run $caption =
urldecode($caption), which reverts it back to it's original state and
shows perfectly fine on the post preview.
This even avoids having the visual editor break the content when switching
back and forth, but unfortunately it also looks AWFUL in the visual
editor, because the urlencode() version of the caption is shown, which
would probably confuse even users savvy enough to be adding HTML links in
the caption field in the first place. This same awfulness applies to
trying to read or edit the caption inside the HTML editor, which is
infinitely more difficult then when the caption text is shown as plain
text. Yet another place where the ugly urlencoded version shows is in the
visual editor when you edit image settings.
'''WHAT OTHER FORMAT MIGHT WORK?'''
It seems like a big problem here is that any attempt to escape the HTML
will need us to account for every single place where it is shown to users
and unescape it for display in that space. Otherwise the escaping pattern
would have to be simple/obvious enough that looking at the escaped version
still made it clear what the HTML was for.
Maybe the solution is some kind of non-HTML code similar to shortcodes
that could be used inside the caption attribute to identify links etc.
This is not ideal as it creates yet another layer of syntax to learn and I
would avoid that if at all possible, but it may be the only way to get
around the problematic architecture that the caption shortcode currently
suffers from. Maybe something that emulates mediawiki markup like this
could work?
{{{
[http://test.com/ test]
}}}
'''OOF! THERE IS A REASON THIS HAS GONE SO LONG WITHOUT A FIX'''
Okay enough from me for today. Just wanted to dig into the code and do
some research about the problem. I think that this probably needs more
than just a patch, it needs thinking and decisions because so far there's
still no solution that seems appropriate despite the obviousness of the
need.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/18311#comment:9>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software
More information about the wp-trac
mailing list