[wp-trac] Re: [WordPress Trac] #7563: html_entity_decode at RSS Feed import doesn't respect charset of Blog
WordPress Trac
wp-trac at lists.automattic.com
Sun Sep 14 20:09:33 GMT 2008
#7563: html_entity_decode at RSS Feed import doesn't respect charset of Blog
------------------------------------------+---------------------------------
Reporter: codestyling | Owner: anonymous
Type: defect | Status: new
Priority: high | Milestone: 2.7
Component: General | Version: 2.5.1
Severity: critical | Resolution:
Keywords: rss bug feed encoding damage |
------------------------------------------+---------------------------------
Changes (by codestyling):
* keywords: => rss bug feed encoding damage
* version: => 2.5.1
Comment:
I have created a patch for the MagpieRSS class so that it handles imported
feeds correctly.
The patch targets both PHP4, which does not detect the feed's encoding
(UTF-8 feeds are handled as ISO-8859-1 feeds), and PHP5, which does detect
it. In both cases it ensures that ISO-8859-1 based HTML entities are
converted correctly into the UTF-8 target charset.
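The following is a minimal Python illustration, not the PHP patch itself, of why the target charset matters. It decodes the same entity once into ISO-8859-1 and once into UTF-8, which are the two behaviours described above. The helper `decode_entities` and the sample title are hypothetical. The code loosely mimics decoding HTML entities into a given charset, the way PHP's html_entity_decode does with its charset argument.

```python
import html


def decode_entities(text: str, target_charset: str) -> bytes:
    """Hypothetical helper: resolve HTML entities, then encode the
    result in the charset the output is assumed to be stored in."""
    return html.unescape(text).encode(target_charset)


title = "M&auml;rkische Allgemeine"

# Decoding into ISO-8859-1 yields the single byte 0xE4 for "ä" ...
latin1_bytes = decode_entities(title, "iso-8859-1")
# ... while a UTF-8 blog expects the two-byte sequence 0xC3 0xA4.
utf8_bytes = decode_entities(title, "utf-8")

print(latin1_bytes)  # b'M\xe4rkische Allgemeine'
print(utf8_bytes)    # b'M\xc3\xa4rkische Allgemeine'

# Reading the ISO-8859-1 bytes as UTF-8 fails, because 0xE4 is not
# followed by valid UTF-8 continuation bytes: the text is "damaged".
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("damaged:", exc.reason)
```

If the ISO-8859-1 result is stored in a blog that declares UTF-8, the byte 0xE4 is invalid UTF-8. That produces the kind of damaged text this ticket describes.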
Here are two feeds that get damaged when added to the dashboard:
-> ISO-8859-1 feed
{{{
http://www.maerkischeallgemeine.de/cms/list/6947650?style_only=J&cms_encoding=iso
}}}
-> UTF-8 feed with ISO-8859-1 HTML entities (e.g. &auml; for ä)
{{{
http://blog.wordpress-deutschland.org/feed
}}}
The patch has been tested on PHP4 and PHP5 with both example feeds, which
are now displayed correctly. The database also no longer stores damaged
option values (with the original rss.php the serialized data was sometimes
broken, depending on the feed content).
The input encoding is detected by applying a regular expression to the raw
feed data, and the output encoding is set to the blog's charset, taken from
the corresponding option value.
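The following is a rough Python sketch of the two steps just described. First, it reads the input encoding from the XML declaration of the raw feed with a regular expression. Second, it converts a text field into the blog's charset. This message does not show the patch's actual PHP regular expression or function names. The pattern `XML_ENCODING_RE` and the helpers `detect_feed_encoding` and `convert_field` are hypothetical. The `blog_charset` parameter stands in for the blog option the patch reads. The UTF-8 fallback follows the XML 1.0 default for documents without an encoding declaration.

```python
import html
import re

# Hypothetical pattern: captures the encoding pseudo-attribute of an XML
# declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>
XML_ENCODING_RE = re.compile(
    rb"""^\s*<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""",
    re.IGNORECASE,
)


def detect_feed_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding declared in the feed's XML prolog, or `default`
    (XML treats undeclared documents without a BOM as UTF-8)."""
    match = XML_ENCODING_RE.match(raw)
    return match.group(1).decode("ascii").lower() if match else default


def convert_field(field: bytes, source_encoding: str, blog_charset: str) -> bytes:
    """Decode one extracted text field (e.g. an item title) from the feed's
    encoding, resolve HTML entities such as &auml;, and re-encode it in the
    blog's charset."""
    text = html.unescape(field.decode(source_encoding))
    # Characters the blog charset cannot represent become numeric character
    # references instead of invalid bytes.
    return text.encode(blog_charset, errors="xmlcharrefreplace")
```

Usage, assuming a UTF-8 blog: `convert_field(b"M\xe4rkische", detect_feed_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><rss/>'), "utf-8")` returns the UTF-8 bytes for "Märkische". Entity decoding is applied per field rather than to the whole document, so markup escapes such as `&lt;` inside the raw XML are not disturbed.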
--
Ticket URL: <http://trac.wordpress.org/ticket/7563#comment:2>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software