[wp-trac] [WordPress Trac] #19998: Feeds can contain characters that are not valid XML

WordPress Trac noreply at wordpress.org
Sat Mar 28 15:53:18 UTC 2015


#19998: Feeds can contain characters that are not valid XML
--------------------------+------------------------------
 Reporter:  westi         |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Feeds         |     Version:  3.3.1
 Severity:  normal        |  Resolution:
 Keywords:  has-patch     |     Focuses:
--------------------------+------------------------------

Comment (by mdgl):

 Replying to [comment:8 stevenkword]:
 > @mdgl I've created a project on GitHub
 > (https://github.com/stevenkword/Finely-Tuned-Feeds) where I am taking a
 > look at the bigger XML escaping problem.

 Sorry for the delay, but I've been busy for a while and have had only
 limited time to look at your plugin. It appears to be at the beginning of
 its development, and I'm afraid I can't quite see where you are headed
 with your XML escaping abstraction.

 It may be old-fashioned to do so, but I think we first need to identify
 the use cases and requirements for an `esc_xml()` function. For example,
 there appear to be four main situations where we may need to perform XML
 escaping:
 1. Source data is plain text and the XML element supports only plain text.
 2. Source data is HTML but the XML element supports only plain text.
 3. Source data is plain text but the XML element supports embedded HTML.
 4. Source data is HTML and the XML element supports embedded HTML.
 I think cases (1), (3) and (4) might be a relatively straightforward
 matter of just escaping the XML special characters. Case (2) is more
 interesting, as here we would first need to strip HTML tags and replace
 any HTML entities that are not also valid in XML before escaping the other
 XML special characters. Perhaps this is best left to two separate
 functions, but that makes it harder to avoid double escaping. In each
 case, we also need to deal with any invalidly encoded characters (whether
 the encoding is UTF-8 or otherwise). The choice between wrapping a value
 in a CDATA section and escaping the XML special characters individually
 should be configurable through a filter, perhaps with a default set on a
 field-by-field basis.
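
 To make the four cases concrete, here is a minimal sketch in Python (not
 WordPress code). Everything in it is an assumption made for illustration:
 the function names, the `source_is_html`, `target_allows_html` and
 `use_cdata` parameters, and the naive regex-based tag stripping. In
 WordPress the CDATA choice would presumably come from a filter rather than
 an argument. The sketch escapes for element content only, so quotes are
 left alone, and it drops characters outside the XML 1.0 `Char` range
 rather than replacing them.

```python
import html
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Naive tag matcher, good enough for a sketch; real HTML needs a parser.
_HTML_TAG = re.compile(r"<[^>]*>")


def strip_invalid_xml_chars(text):
    """Drop characters that may not appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)


def html_to_text(markup):
    """Case (2) helper: remove tags, then decode HTML entities to characters."""
    return html.unescape(_HTML_TAG.sub("", markup))


def esc_xml(text, source_is_html=False, target_allows_html=False,
            use_cdata=False):
    """Hypothetical esc_xml() covering the four cases for element content."""
    # Only case (2) needs the HTML reduced to plain text first.
    if source_is_html and not target_allows_html:
        text = html_to_text(text)

    text = strip_invalid_xml_chars(text)

    if use_cdata:
        # "]]>" would end the section early, so split it across two sections.
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

    # "&" must be replaced first so the other replacements are not re-escaped.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))


if __name__ == "__main__":
    print(esc_xml("a < b & c"))                                      # case (1)
    print(esc_xml("<b>Fish &amp; chips</b>", source_is_html=True))   # case (2)
    print(esc_xml("<p>x&nbsp;y</p>", source_is_html=True,
                  target_allows_html=True))                          # case (4)
    print(esc_xml("a]]>b", use_cdata=True))
```

 Decoding entities once in case (2) and escaping exactly once at the end is
 one way to keep both steps in a single function and so sidestep the double
 escaping concern, at the cost of a wider signature.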

 Anyway, I'm away for the next couple of weeks but will try to find time to
 take another look at this sometime next month.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/19998#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

