[wp-trac] [WordPress Trac] #19998: Feeds can contain characters that are not valid XML
WordPress Trac
noreply at wordpress.org
Sun Feb 22 11:50:23 UTC 2015
#19998: Feeds can contain characters that are not valid XML
--------------------------+------------------------------
Reporter: westi | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Feeds | Version: 3.3.1
Severity: normal | Resolution:
Keywords: has-patch | Focuses:
--------------------------+------------------------------
Comment (by mdgl):
See #3670 for some initial ideas about a broader `esc_xml()` abstraction
that would allow us to more easily clean up problems such as this one.
Many characters are invalid within XML (see http://www.w3.org/TR/REC-xml/)
and these need to be removed whether they occur directly or as part of
(numeric) character references (e.g. an explicit `&#x0b;` in the text).
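As a rough illustration of the two cases above, here is a minimal sketch that removes both literal characters and numeric character references falling outside the `Char` production of the XML 1.0 specification. The function names are hypothetical (nothing here exists in core), and the sketch assumes the input is valid UTF-8 and that PCRE was built with UTF-8 support.

```php
<?php
// Hypothetical helpers for illustration only; not part of WordPress core.

// True if the code point is allowed by the XML 1.0 "Char" production.
function example_is_valid_xml_codepoint( $cp ) {
	return 0x9 === $cp || 0xA === $cp || 0xD === $cp
		|| ( $cp >= 0x20 && $cp <= 0xD7FF )
		|| ( $cp >= 0xE000 && $cp <= 0xFFFD )
		|| ( $cp >= 0x10000 && $cp <= 0x10FFFF );
}

function example_strip_invalid_xml_chars( $text ) {
	// 1. Remove literal characters that XML 1.0 does not allow.
	$clean = preg_replace(
		'/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
		'',
		$text
	);
	if ( null === $clean ) {
		// Invalid UTF-8 or no /u support: this sketch simply gives up.
		return '';
	}

	// 2. Remove numeric character references to disallowed code points.
	return preg_replace_callback(
		'/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/',
		function ( $m ) {
			// Group 1 is the hex form, group 2 the decimal form.
			$cp = '' !== $m[1] ? hexdec( $m[1] ) : (int) $m[2];
			return example_is_valid_xml_codepoint( $cp ) ? $m[0] : '';
		},
		$clean
	);
}
```

With this sketch, a literal vertical tab and an explicit `&#x0b;` are both dropped, while a harmless reference such as `&#65;` is left untouched.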
Note that the function `wp_kses_normalize_entities()` already deals with
invalid (numeric) character references by escaping them where necessary.
Unfortunately, this function also allows a hard-wired list of HTML named
entities, many of which are not valid in XML. Some simple refactoring
could allow this function to support both HTML and XML.
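To make the refactoring idea more concrete, here is a minimal sketch of a normalizer that takes a context argument and picks its list of allowed named entities accordingly. This is not how `wp_kses_normalize_entities()` is written today. The function name, the `$context` parameter and the shortened HTML list are all assumptions made for illustration. The XML list is the five entities predefined by XML 1.0.

```php
<?php
// Hypothetical sketch; the name, the $context parameter and the HTML list
// are illustrative assumptions, not existing WordPress API.
function example_normalize_named_entities( $text, $context = 'html' ) {
	// The only named entities predefined by XML 1.0.
	$xml_entities = array( 'amp', 'lt', 'gt', 'quot', 'apos' );

	// Deliberately tiny subset standing in for the real HTML list.
	$html_entities = array( 'amp', 'lt', 'gt', 'quot', 'nbsp', 'copy', 'pound' );

	$allowed = ( 'xml' === $context ) ? $xml_entities : $html_entities;

	return preg_replace_callback(
		'/&([A-Za-z][A-Za-z0-9]*);/',
		function ( $m ) use ( $allowed ) {
			// Keep allowed entities; escape the ampersand of all others.
			return in_array( $m[1], $allowed, true )
				? $m[0]
				: '&amp;' . $m[1] . ';';
		},
		$text
	);
}
```

Called with `'xml'`, the input `&pound;5 &amp; more` becomes `&amp;pound;5 &amp; more`, whereas the default HTML context leaves it alone. The sketch only looks at named references; bare ampersands and numeric references would still need separate handling.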
The situation is slightly complicated for XML fields that will
subsequently be processed as HTML, as occurs in RSS and Atom. Here, if we
encode as CDATA we could leave such invalid named entities alone for later
processing.
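A minimal sketch of the CDATA approach follows, assuming a hypothetical helper name. Inside a CDATA section the parser does not interpret `&nbsp;` and similar references, so they survive for the later HTML stage. The only sequence that needs care is `]]>`, which is split across two sections here. Characters that are invalid in XML are still invalid inside CDATA, so they would have to be removed separately.

```php
<?php
// Hypothetical helper for illustration; not an existing WordPress function.
function example_cdata_wrap( $html ) {
	// "]]>" would end the section early, so split it across two sections.
	$safe = str_replace( ']]>', ']]]]><![CDATA[>', $html );
	return '<![CDATA[' . $safe . ']]>';
}
```

For example, `example_cdata_wrap( 'a &nbsp; b' )` yields `<![CDATA[a &nbsp; b]]>`, leaving the HTML entity alone for the feed reader to process.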
I notice also that we recently added some clean-up of directly-occurring
control characters to `wp_kses_no_null()` as part of #28506. However, this
strips rather than escapes the characters, and it does not deal with many
of the other UTF-8 characters that are not valid in XML. That might be
slightly harder to address, given there appears to be some confusion over
exactly what character encoding is being used in all situations (is using
`blog_charset` sufficient?) as well as the availability of PHP support for
UTF-8.
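One possible way to cope with the encoding uncertainty is sketched below, under stated assumptions: the caller passes in the charset (for instance the value of the `blog_charset` option), the full check is used only when that charset is UTF-8 and PCRE accepts the `/u` modifier, and otherwise the code falls back to a byte-level strip of the C0 control characters. The function name is hypothetical, and the fallback is only safe for ASCII-compatible encodings (not, say, UTF-16).

```php
<?php
// Hypothetical sketch; the name and the fallback policy are assumptions.
function example_strip_xml_controls( $text, $charset = 'UTF-8' ) {
	$is_utf8 = in_array( strtolower( $charset ), array( 'utf-8', 'utf8' ), true );

	// Probe for UTF-8 support in PCRE before relying on the /u modifier.
	if ( $is_utf8 && 1 === @preg_match( '/^./u', 'a' ) ) {
		$clean = preg_replace(
			'/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
			'',
			$text
		);
		if ( null !== $clean ) {
			return $clean;
		}
		// preg_replace() returned null (e.g. invalid UTF-8): fall through.
	}

	// Byte-level fallback: strip C0 controls except tab, LF and CR.
	return preg_replace( '/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text );
}
```

The fallback catches far less than the UTF-8 path, but it avoids corrupting text when the encoding or the PCRE build cannot be trusted.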
--
Ticket URL: <https://core.trac.wordpress.org/ticket/19998#comment:7>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform