[wp-trac] [WordPress Trac] #19368: UTF-8 characters truncated mid-byte sequence in excerpt in RSS2 feed
WordPress Trac
wp-trac at lists.automattic.com
Sun Nov 27 05:54:33 UTC 2011
#19368: UTF-8 characters truncated mid-byte sequence in excerpt in RSS2 feed
--------------------------+-----------------------------
Reporter: kurtmckee | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version:
Severity: normal | Keywords:
--------------------------+-----------------------------
I received [https://code.google.com/p/feedparser/issues/detail?id=306 a
bug report for a project I maintain] and discovered what appears to be a
bug in WordPress 3.2.1.
The trouble is that the `description` element is being truncated in the
middle of a multibyte UTF-8 character, which leaves an invalid byte
sequence in the feed. An example can be found at:
http://www.arnaudmontebourg.fr/?feed=rss2
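A minimal sketch of the failure mode, written in Python rather than WordPress's PHP. The sample string and the cut position are illustrative assumptions, not data taken from the feed above; the point is only that cutting a byte string one byte into a two-byte character leaves bytes that no longer decode as UTF-8.

```python
# Illustrative only: the sample text and cut position are assumptions,
# not content from the affected feed.
text = "Un résumé"
raw = text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "é" is encoded as two bytes (0xC3 0xA9). Cutting after the first of
# them leaves a dangling lead byte at the end of the string.
cut = raw.index("é".encode("utf-8")) + 1
truncated = raw[:cut]

print(is_valid_utf8(raw))        # the full string is valid
print(is_valid_utf8(truncated))  # the byte-truncated string is not
```

A strict XML parser may reject a feed that contains such a sequence, which is why a truncation of this kind matters more in a feed than in rendered HTML.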
I downloaded
[http://themocracy.com/2009/07/alibi3col-free-wordpress-theme/ the site's theme]
but found nothing that would affect `post_excerpt` or `the_excerpt_rss`.
I then downloaded WordPress trunk and tried to locate the problem, but
I'm unfamiliar with the WordPress source and couldn't find anything
after tracing through multiple files using grep.
I did discover that `trackback_url_list()` in `wp-includes/post.php`
appears to be using a simple `substr()` call that might cause problems
with multibyte characters. However, I'm more concerned with the potential
for malformed feeds.
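For comparison, here is a hedged sketch of a byte-length truncation that never splits a character. It is Python, not WordPress code, and `truncate_utf8` is a hypothetical helper name; it assumes the input is already valid UTF-8 and that `max_bytes` is non-negative. In PHP, `mb_substr()` is the character-aware counterpart to `substr()`, though whether it is the right fix for the code path involved here is not something this ticket establishes.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Truncate valid UTF-8 bytes to at most max_bytes without splitting
    a character. Hypothetical helper for illustration only."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # Continuation bytes have the bit pattern 0b10xxxxxx. If the byte at
    # the cut position is one, the cut would fall inside a character, so
    # step back until the cut position is the start of a character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


sample = "Un résumé".encode("utf-8")  # illustrative input, an assumption

# A 5-byte limit would fall inside the first "é", so the whole
# character is dropped instead of leaving half of it behind.
safe = truncate_utf8(sample, 5)
print(safe.decode("utf-8"))
```

The result can be shorter than `max_bytes`, which is the trade-off for always ending on a character boundary.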
I've attached a copy of the feed XML in question in case the live feed
changes.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/19368>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software