[wp-hackers] RSS/Atom excerpt and filters
Michel Fortin
michel.fortin at michelf.com
Sun Jul 4 15:02:15 UTC 2004
On 3 Jul 2004, at 17:23, Stephen O'Connor wrote:
> What happens when the author includes escaped html code in the entry, as
> many authors on this list do. This could make things a whole lot worse.
> (I can't stand working with character encoding... ew)
This is not a problem. Let's say I have this: `&eacute;<br />`, which gets
encoded this way: `&amp;eacute;&lt;br /&gt;`. When the RSS reader reads
it, it first unescapes it to get the HTML and obtains `&eacute;<br />`,
which is valid HTML that will display as `é` with a line break. Of
course it's easier to enclose it in a CDATA section
(`<![CDATA[&eacute;<br />]]>`), and we get the same result.
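The round trip described above can be sketched in a few lines of PHP. This is only an illustration: the sample string is made up, and `htmlspecialchars()` stands in for whatever escaping a feed template actually applies.

```php
<?php
// Hypothetical sample excerpt: HTML containing an entity and a tag.
$html = '&eacute;<br />';

// What a feed template would write inside <description>:
// the HTML escaped one more level for XML.
$escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

// What a feed reader does first: undo the XML-level escaping
// to recover the original HTML, which it then renders.
$recovered = htmlspecialchars_decode($escaped, ENT_QUOTES);

echo $escaped, "\n";
echo $recovered, "\n";
```

The reader ends up with exactly the HTML the author wrote, so a second level of escaping loses nothing.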
> Perhaps a "best-practice" would be to parse $wp_filter for the existence
> of htmlentities. It would only work if everyone agreed on it, but it's a
> solution you can use today.
Wrong. If I want the summary to be correct in the Atom feed, the
summary tag needs the type and mode attributes set to "text/html" and
"escaped". Of course the feed validates if I do not set them, but the
HTML will be displayed as text by a viewer that follows the
specification.
This means I can't include tags in the excerpt in a correct manner
without a modification to the wp-atom.php template. Saying "If you use
Markdown (which will put tags in the excerpt), please replace the
wp-atom.php file with this one" is not very user-friendly for a plugin
you can activate and deactivate from the web interface.
I believe someone made the assumption that the excerpt in WP would
always contain plain text. This is why stripping the tags is the only
compatible solution for WordPress 1.2.
If WP 1.3 allows HTML in the excerpt/description/summary, I want to
allow PHP Markdown to emit tags. This is why there was a version check
when I wrote this:
    add_filter('the_excerpt_rss', 'Markdown', 6);
    if ($wp_version == 1.2)
        add_filter('the_excerpt_rss', 'strip_tags', 100);
This way, when you load the plugin in version 1.3 of WP, tags are not
stripped and WordPress can write them out correctly (escaped or
enclosed in a CDATA section), assuming WordPress 1.3 can deal with tags
itself by removing or encoding them.
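A somewhat more robust form of the version check might look like the sketch below. It assumes `$wp_version` holds a dotted version string; the helper name `excerpt_needs_strip_tags` is hypothetical and not part of WordPress or PHP Markdown.

```php
<?php
// Hypothetical helper: decide whether tags must be stripped from the
// feed excerpt, i.e. whether we are running on a release before 1.3.
function excerpt_needs_strip_tags($wp_version) {
    // version_compare() handles dotted point releases such as "1.2.1",
    // which a loose comparison against the number 1.2 may not.
    return version_compare($wp_version, '1.3', '<');
}

// Only register filters when running inside WordPress, so that this
// file can also be loaded on its own.
if (function_exists('add_filter') && isset($wp_version)) {
    add_filter('the_excerpt_rss', 'Markdown', 6);
    if (excerpt_needs_strip_tags($wp_version)) {
        add_filter('the_excerpt_rss', 'strip_tags', 100);
    }
}
```

With this variant a point release of 1.2 would still get the `strip_tags` filter, while 1.3 and later would not.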
## Summary ##
If a tag or a tag-delimiter character is found in an entry excerpt, it
will create an invalid RSS or Atom feed, *plugins or not*. This is what
should be corrected. My solution is to add a 'strip_tags' filter to my
plugin so that it does not put any tags in the excerpt while running in
WP 1.2. I hope this is only a temporary solution. I think WordPress
itself should handle tags in an excerpt without invalidating the feeds.
A way to correct this in WordPress could be to always assume the text
is in HTML format and write the excerpt in the feeds like this (RSS):
    <description><![CDATA[ ... ]]></description>
And like this (Atom):
    <summary type="text/html" mode="escaped">
    <![CDATA[ ... ]]>
    </summary>
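As a rough sketch of what a feed template could do, the functions below wrap an HTML excerpt using the standard `<![CDATA[ ... ]]>` delimiters. All three function names are hypothetical and not taken from WordPress. The one subtlety is that a literal `]]>` inside the excerpt would end the section early, so it is split across two sections.

```php
<?php
// Hypothetical helper: wrap HTML in a CDATA section.
function feed_cdata($html) {
    // A literal "]]>" in the content would terminate the section, so
    // close the section after "]]" and reopen a new one before ">".
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $html) . ']]>';
}

// Hypothetical helper: RSS <description> element holding HTML.
function rss_description($html) {
    return '<description>' . feed_cdata($html) . '</description>';
}

// Hypothetical helper: Atom <summary> element holding HTML, with the
// type and mode attributes discussed earlier in this message.
function atom_summary($html) {
    return '<summary type="text/html" mode="escaped">'
         . feed_cdata($html)
         . '</summary>';
}

echo rss_description('&eacute;<br />'), "\n";
echo atom_summary('&eacute;<br />'), "\n";
```

Because the excerpt is always treated as HTML here, plain-text excerpts containing `<` or `&` would need to be HTML-escaped before being passed in.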
I hope things are clearer now.
Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/