[wp-hackers] RSS/Atom excerpt and filters

Michel Fortin michel.fortin at michelf.com
Sun Jul 4 15:02:15 UTC 2004


On 3 Jul 2004, at 17:23, Stephen O'Connor wrote:

> What happens when the author includes escaped html code in the entry,
> as many authors on this list do. This could make things a whole lot
> worse. (I can't stand working with character encoding... ew)

This is not a problem. Say I have `&eacute;<br />`, which gets encoded 
as `&amp;eacute;&lt;br /&gt;`. When the RSS reader reads it, it first 
unescapes it to recover the HTML, `&eacute;<br />`, which is valid HTML 
and displays as `é` followed by a line break. Of course, it's easier to 
enclose it in a CDATA section (`<![CDATA[&eacute;<br />]]>`), which 
gives the same result.
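
The round trip described above can be checked with a minimal PHP sketch. It assumes PHP's built-in `htmlspecialchars()` stands in for whatever escaping the feed template does, and `htmlspecialchars_decode()` for the reader's unescaping step; the variable names are only illustrative.

```php
<?php
// HTML as the author wrote it in the excerpt.
$html = '&eacute;<br />';

// Escaping for the feed: & < > become entities, so the existing
// entity &eacute; turns into &amp;eacute;.
$escaped = htmlspecialchars($html);

// What a feed reader does first: undo one level of escaping.
$recovered = htmlspecialchars_decode($escaped);

echo $escaped, "\n";
echo $recovered, "\n";
```

Only one level of escaping is removed, so the reader ends up with the original HTML, entity included.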

> Perhaps a "best-practice" would be to parse $wp_filter for the
> existence of htmlentities. It would only work if everyone agreed on it,
> but it's a solution you can use today.

Wrong. If I want the summary to be correct in the Atom feed, the 
summary tag needs its type and mode attributes set to "text/html" and 
"escaped". Of course the feed still validates if I do not set them, but 
a viewer that follows the specification will display the HTML as 
literal text.

This means I can't include tags in the excerpt in a correct manner 
without modifying the wp-atom.php template. Saying "If you use 
Markdown (which will put tags in the excerpt), please replace the 
wp-atom.php file with this one" is not very user-friendly for a plugin 
you can activate and deactivate from the web interface.

I believe someone made the assumption that the excerpt in WP would 
always contain plain text. This is why stripping the tags is the only 
compatible solution for WordPress 1.2.

If WP 1.3 allows HTML in the excerpt/description/summary, I want to 
allow PHP Markdown to emit tags. This is why there was a version check 
when I wrote this:

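	// Convert Markdown to HTML early (priority 6).
	add_filter('the_excerpt_rss', 'Markdown', 6);
	// On WP 1.2 only, strip the resulting tags late (priority 100).
	if ($wp_version == 1.2)
		add_filter('the_excerpt_rss', 'strip_tags', 100);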

This way, when the plugin is loaded in version 1.3 of WP, tags are not 
stripped and WordPress can output them correctly (escaped or enclosed 
in a CDATA section). This assumes WordPress 1.3 can deal with tags 
itself, either by removing them or by encoding them.
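
The check above compares `$wp_version` to the number 1.2, so how a hypothetical point release such as "1.2.1" is treated depends on PHP's loose comparison rules. A slightly more defensive sketch uses PHP's `version_compare()`. It assumes version strings of the usual dotted form and that 1.3 really is the first release to handle tags itself, neither of which is confirmed here. The helper name is hypothetical.

```php
<?php
// Hypothetical helper: true when the running WordPress predates 1.3,
// i.e. when the feed templates are assumed to expect a plain-text excerpt.
function excerpt_needs_strip_tags($wp_version) {
    return version_compare($wp_version, '1.3', '<');
}

var_dump(excerpt_needs_strip_tags('1.2'));
var_dump(excerpt_needs_strip_tags('1.2.1'));
var_dump(excerpt_needs_strip_tags('1.3'));
```

In the plugin, a call to this helper would take the place of the `$wp_version == 1.2` test in front of the `strip_tags` filter.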

## Summary ##

If a tag or a tag-delimiter character is found in an entry excerpt, it 
will create an invalid RSS or Atom feed, *plugins or not*. This is what 
should be corrected. My solution is to add a 'strip_tags' filter to my 
plugin so that it does not put any tags in the excerpt while running in 
WP 1.2. I hope this is only a temporary solution. I think WordPress 
itself should handle tags in an excerpt without invalidating the feeds.

A way to correct this in WordPress could be to always assume the text 
is in HTML format and write the excerpt in the feeds like this (RSS):

	<description><![CDATA[ ... ]]></description>

And like this (Atom):

	<summary type="text/html" mode="escaped">
		<![CDATA[ ... ]]>
	</summary>
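
One detail to watch when wrapping arbitrary excerpt text in a CDATA section: the text itself must not contain `]]>`, which would end the section early. A minimal sketch of a wrapper splits that sequence across two sections. The function name is hypothetical and the code is not taken from any WordPress template.

```php
<?php
// Hypothetical helper: wrap text in a CDATA section, splitting any
// "]]>" in the text so it cannot close the section prematurely.
function cdata_wrap($text) {
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

echo '<description>', cdata_wrap('&eacute;<br />'), '</description>', "\n";
```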

I hope things are clearer now.


Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/



