[wp-hackers] RSS/Atom excerpt and filters

Michel Fortin michel.fortin at michelf.com
Sat Jul 3 17:16:53 UTC 2004


Hi, I'm new to this list, and I came because of something that may 
require some developer discussion. I do not use WordPress much myself, 
but I maintain [PHP Markdown][1], a text formatting tool. The Markdown 
plugin made by Matt and bundled with WordPress 1.2 is based on my work, 
which did not natively include a plugin interface at the time.

[1]: http://www.michelf.com/projects/php-markdown/

Now to the main subject...
It's easy to filter excerpts in RSS and Atom feeds using the 
"the_excerpt_rss" hook, but I believe there is a problem with the way 
it currently works. While I'm going to talk about Markdown, everything 
applies to Textile as well.

WordPress automatically creates an excerpt from a post when the excerpt 
field is left empty. This is used in RSS and Atom feeds for the 
"description" and "summary" tags. But what happens if the excerpt 
contains HTML? In this case, the HTML needs to be encoded (changed into 
entities), and with Atom the type and mode attributes need to be set on 
the summary tag like this:

	<summary type="text/html" mode="escaped">
		&lt;b&gt;Hello&lt;/b&gt; world!
	</summary>
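
For illustration, here is a minimal sketch of the escaping step in PHP. 
The helper name `atom_escaped_summary` is hypothetical and is not part 
of WordPress; it only shows how an HTML excerpt could end up in the 
form above.

```php
<?php
// Hypothetical helper, not a WordPress function: builds an Atom summary
// element whose HTML content has been entity-escaped.
function atom_escaped_summary($html) {
    // htmlspecialchars() turns &, <, > and quotes into entities.
    $escaped = htmlspecialchars($html);
    return '<summary type="text/html" mode="escaped">' . $escaped . '</summary>';
}
```

Calling `atom_escaped_summary('<b>Hello</b> world!')` should give the 
same summary element as the example above, on a single line.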

So if I want to use Markdown as the filter for the excerpt made 
automatically from my Markdown-formatted posts, I can do this:

	add_filter('the_excerpt_rss', 'Markdown', 6);
	add_filter('the_excerpt_rss', 'htmlentities', 100);

First point: if another plugin does the same thing after mine, the 
excerpt will be escaped twice, and feed readers will likely display raw 
tags and entities instead of formatted text. Wouldn't it be better if 
WordPress applied "htmlentities" itself?
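
To make the double-escaping problem concrete, here is a small sketch 
that applies "htmlentities" twice by hand, as two independent plugins 
would if each registered it on the same hook. The variable names are 
only for illustration.

```php
<?php
// Simulates two plugins that each add their own escaping filter.
$html  = '<b>Hello</b> world!';
$once  = htmlentities($html);  // escaped once, as intended
$twice = htmlentities($once);  // escaped again by a second plugin

// $once  is: &lt;b&gt;Hello&lt;/b&gt; world!
// $twice is: &amp;lt;b&amp;gt;Hello&amp;lt;/b&amp;gt; world!
```

After the second pass every "&" has itself become "&amp;", so a reader 
that unescapes the content once gets entities back instead of HTML.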

Second point: this may work, provided that no other plugin does the 
same and that I change the Atom template to add the required 
attributes. So how should I distribute a plugin that does this in a 
user-friendly manner?

The answer to the second question is simple: in the current 
implementation of WordPress, there is only one way to be sure a filter 
will not prevent the RSS or Atom feed from validating: remove the HTML 
tags, like this:

	add_filter('the_excerpt_rss', 'Markdown', 6);
	add_filter('the_excerpt_rss', 'strip_tags', 100);

Or, if I am more concerned about forward compatibility:

	add_filter('the_excerpt_rss', 'Markdown', 6);
	if ($wp_version == 1.2)
		add_filter('the_excerpt_rss', 'strip_tags', 100);

This last solution assumes that the problem will be solved in some way 
in the next release of WordPress.

Instead of escaping the text with entities, we could use a CDATA 
section. This does not help much, since we still have to add CDATA 
delimiters around the text (instead of escaping it) and add the same 
attributes to the summary tag in Atom. It may still be a little better, 
since it would only require modifying the templates.
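
As a sketch of what the CDATA approach could look like, assuming it 
were done in PHP rather than directly in the template: the helper name 
`cdata_wrap` is hypothetical, not something WordPress provides.

```php
<?php
// Hypothetical helper: wraps text in a CDATA section instead of
// escaping it with entities.
function cdata_wrap($text) {
    // A literal "]]>" inside the text would close the section early,
    // so split it across two adjacent CDATA sections.
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}
```

With this, `cdata_wrap('<b>Hello</b> world!')` leaves the markup 
readable in the feed source, but the summary tag still needs its type 
and mode attributes set to match.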

Any ideas?

Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/



