[wp-trac] [WordPress Trac] #19998: Feeds can contain characters that are not valid XML

WordPress Trac wp-trac at lists.automattic.com
Fri Feb 10 06:57:45 UTC 2012


#19998: Feeds can contain characters that are not valid XML
--------------------------+------------------------------
 Reporter:  westi         |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Feeds         |     Version:  3.3.1
 Severity:  normal        |  Resolution:
 Keywords:  needs-patch   |
--------------------------+------------------------------

Comment (by solarissmoke):

 One approach could be to filter using the set of valid characters from the
 [http://www.w3.org/TR/REC-xml/#charsets spec]:

 {{{
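 // Replace each run of characters outside the XML 1.0 Char production with a space.
 function strip_for_xml( $utf8 ) {
   return preg_replace(
     '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
     ' ',
     $utf8
   );
 }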
 }}}

 This assumes that the feed is served as UTF-8. I've no idea what it would
 do to XML in other charsets.
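
 For feeds in other charsets, one possible way around the UTF-8 assumption is
 to convert to UTF-8, strip, and convert back. The following is only a rough
 sketch: the function name strip_for_xml_in_charset() and its $charset
 parameter are hypothetical (not part of WordPress), it assumes the mbstring
 extension is available, and returning an empty string for input that is not
 valid UTF-8 is just one possible policy. The pattern also allows the
 supplementary range U+10000 to U+10FFFF from the spec.

```php
<?php
// Hypothetical helper, not part of WordPress core: strip characters that
// XML 1.0 forbids from text held in an arbitrary source charset.
function strip_for_xml_in_charset( $text, $charset = 'UTF-8' ) {
	$is_utf8 = ( 'UTF-8' === strtoupper( $charset ) );

	// Work in UTF-8, because the /u pattern below only understands UTF-8.
	// Assumption: the mbstring extension is loaded.
	$utf8 = $is_utf8 ? $text : mb_convert_encoding( $text, 'UTF-8', $charset );

	// Characters allowed by the XML 1.0 Char production; each run of
	// anything else is replaced by a single space.
	$clean = preg_replace(
		'/[^\x{0009}\x{000A}\x{000D}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
		' ',
		$utf8
	);

	// preg_replace() returns null when the subject is not valid UTF-8.
	// Assumption: dropping the text entirely is acceptable in that case.
	if ( null === $clean ) {
		return '';
	}

	// Hand the result back in the charset the caller started with.
	return $is_utf8 ? $clean : mb_convert_encoding( $clean, $charset, 'UTF-8' );
}
```

 A call such as strip_for_xml_in_charset( $title, 'ISO-8859-1' ) would then
 return the cleaned text still encoded as ISO-8859-1, so the feed's declared
 charset stays accurate.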

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/19998#comment:1>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

