[wp-trac] [WordPress Trac] #19368: UTF-8 characters truncated mid-byte sequence in excerpt in RSS2 feed

WordPress Trac wp-trac at lists.automattic.com
Sun Nov 27 05:54:33 UTC 2011


#19368: UTF-8 characters truncated mid-byte sequence in excerpt in RSS2 feed
--------------------------+-----------------------------
 Reporter:  kurtmckee     |      Owner:
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  General       |    Version:
 Severity:  normal        |   Keywords:
--------------------------+-----------------------------
 I received [https://code.google.com/p/feedparser/issues/detail?id=306 a
 bug report at a project I maintain] and discovered what appears to be a
 bug in WordPress 3.2.1.

 The trouble is that the `description` element is being truncated in the
 middle of a UTF-8 multibyte character, which leaves an incomplete, invalid
 byte sequence in the feed. An example can be found at:

 http://www.arnaudmontebourg.fr/?feed=rss2

 I downloaded
 [http://themocracy.com/2009/07/alibi3col-free-wordpress-theme/ the site's theme]
 but found nothing that would affect `post_excerpt` or `the_excerpt_rss`.
 I then downloaded WordPress trunk and tried to track down the problem, but
 I'm unfamiliar with the WordPress source and couldn't find anything after
 tracing through multiple files using grep.

 I did discover that `trackback_url_list()` in `wp-includes/post.php`
 appears to be using a simple `substr()` call that might cause problems
 with multibyte characters. However, I'm more concerned with the potential
 for malformed feeds.
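
 To illustrate the failure mode, here is a minimal Python sketch (not
 WordPress code). PHP's `substr()` counts bytes, so the byte slice below
 stands in for it. `truncate_utf8` is a hypothetical helper name, and the
 sample string is made up. It assumes the input is valid UTF-8 and that
 `max_bytes` is non-negative.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut data to at most max_bytes without splitting a UTF-8 sequence."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte (0b10xxxxxx), the cut falls inside a character,
    # so step back until the cut sits on a character boundary.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


data = "café au lait".encode("utf-8")  # "é" is the two bytes C3 A9

# A plain byte slice keeps only the first byte of "é".
naive = data[:4]
try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False

# The boundary-aware cut drops the partial character instead.
safe = truncate_utf8(data, 4)
```

 Here `naive` ends in a lone `0xC3` byte and fails strict decoding, while
 `safe` is `b"caf"` and decodes cleanly. A character-aware truncation (for
 example, PHP's `mb_substr()` with an explicit encoding) would avoid the
 problem in a similar way.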

 I've included a copy of the feed XML in question in case the live feed
 changes.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/19368>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software