[wp-hackers] RSS/Atom excerpt and filters
Michel Fortin
michel.fortin at michelf.com
Sat Jul 3 17:16:53 UTC 2004
Hi, I'm new on this list and I came because of something that may
require some developer discussion. I do not use WordPress much myself,
but I maintain [PHP Markdown][1], a text formatter tool. The Markdown
plugin made by Matt and bundled with WordPress 1.2 is based on my work,
which did not natively include a plugin interface at the time.
[1]: http://www.michelf.com/projects/php-markdown/
Now to the main subject...
It's easy to filter excerpts in RSS and Atom feeds using the
"the_excerpt_rss" hook, but I believe there is a problem with the way
it works currently. While I'm going to talk about Markdown, everything
is also valid for Textile.
WordPress automatically creates an excerpt from a post when the excerpt
field is left empty. This is used in RSS and Atom feeds for the
"description" and "summary" tags. But what happens if the excerpt
contains HTML? In this case, the HTML needs to be encoded (changed into
entities) and with Atom the type and mode attributes need to be set on
the summary tag like this:
<summary type="text/html" mode="escaped">
&lt;b&gt;Hello&lt;/b&gt; world!
</summary>
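As a minimal sketch of how such a summary element could be produced,
assuming plain PHP outside WordPress: the helper name
atom_escaped_summary is hypothetical and not part of any WordPress
API.

```php
<?php
// Hypothetical helper (not a WordPress function): wraps an HTML excerpt
// in an Atom summary element declared as escaped HTML.
function atom_escaped_summary($html) {
    // htmlspecialchars() converts &, <, > and " into entities.
    $escaped = htmlspecialchars($html);
    return '<summary type="text/html" mode="escaped">' . $escaped . '</summary>';
}

echo atom_escaped_summary('<b>Hello</b> world!'), "\n";
```

This sketch uses htmlspecialchars rather than htmlentities because the
named entities that htmlentities can emit for accented characters are
not predefined in XML.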
So if I want to use Markdown as the filter for the excerpt made
automatically from my Markdown-formatted posts, I can do this:
add_filter('the_excerpt_rss', 'Markdown', 6);
add_filter('the_excerpt_rss', 'htmlentities', 100);
First point: if another plugin does the same thing after mine, the
excerpt will be escaped twice and feed readers will display raw
entities instead of formatted text. Wouldn't it be better if WordPress
applied "htmlentities" itself?
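The double-escaping problem can be reproduced in plain PHP, without
WordPress loaded. This is only an illustration of two plugins each
applying htmlentities in turn; the variable names are made up for the
example.

```php
<?php
$excerpt = '<b>Hello</b> world!';

// First plugin escapes the HTML, as intended.
$once = htmlentities($excerpt);

// A second plugin escapes again: the ampersands of the first pass
// are themselves turned into &amp;.
$twice = htmlentities($once);

echo $once, "\n";   // &lt;b&gt;Hello&lt;/b&gt; world!
echo $twice, "\n";  // &amp;lt;b&amp;gt;Hello&amp;lt;/b&amp;gt; world!
```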
Second point: this may work as long as I make sure that no other
plugin does the same and I change the Atom template to add the
required attributes. But how could I distribute a plugin that does
this in a user-friendly manner?
The answer to the second question is simple: in the current
implementation of WordPress, there is only one way to be sure a filter
will not prevent the RSS or Atom feed from validating: remove the HTML
tags, like this:
add_filter('the_excerpt_rss', 'Markdown', 6);
add_filter('the_excerpt_rss', 'strip_tags', 100);
Or, if I am more concerned about forward compatibility:
add_filter('the_excerpt_rss', 'Markdown', 6);
if ($wp_version == 1.2)
    add_filter('the_excerpt_rss', 'strip_tags', 100);
This last solution assumes that the problem will be solved in some
way in the next release of WordPress.
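A slightly more robust version check could use PHP's version_compare
instead of a numeric comparison, so that a point release such as
"1.2.1" is also covered. This is only a sketch: the helper name
needs_strip_tags is hypothetical, and "1.3" is an assumed placeholder
for whichever release fixes the problem.

```php
<?php
// Hypothetical helper: true for WordPress 1.2 and 1.2.x, false for
// earlier releases and for the assumed fixed release ("1.3") and later.
function needs_strip_tags($wp_version) {
    return version_compare($wp_version, '1.2', '>=')
        && version_compare($wp_version, '1.3', '<');
}

// Inside a plugin, with WordPress loaded, this could be used as:
// if (needs_strip_tags($wp_version))
//     add_filter('the_excerpt_rss', 'strip_tags', 100);
```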
Instead of escaping the text with entities, we could use a CDATA
section. This does not help much since we still have to add CDATA block
delimiters around the text (instead of escaping) and add the same
attributes to the summary tag in Atom. It may be a little better still
since it would only require modification to the templates.
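For illustration, wrapping text in a CDATA section could look like the
sketch below. The helper name cdata_wrap is hypothetical; the one
detail to watch is that the text itself must not contain the "]]>"
terminator, which is handled here by splitting it across two sections.

```php
<?php
// Hypothetical helper (not a WordPress function): wraps text in a
// CDATA section so that markup inside it needs no entity escaping.
function cdata_wrap($text) {
    // "]]>" would close the section early, so end the section after
    // "]]" and reopen a new one before ">".
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

echo cdata_wrap('<b>Hello</b> world!'), "\n";
```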
Any ideas?
Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/