[wp-trac] [WordPress Trac] #22279: WordPress Export/Import deletes carriage returns (was: Wordpress Export/Import deletes carriage returns)

WordPress Trac noreply at wordpress.org
Sat Feb 22 09:40:44 UTC 2014


#22279: WordPress Export/Import deletes carriage returns
--------------------------+------------------------------
 Reporter:  mykle         |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Export        |     Version:  3.4.2
 Severity:  major         |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------
Description changed by ocean90:

Old description:

> Wordpress export does not translate or escape bare CR characters in a
> CR/LF pair.  They show up unfiltered in the WXR export file.  I see this
> both in post_content and in strings that were serialized into a post_meta
> field.  The CR characters are in the WXR file, unfiltered.
>
> Then, Wordpress import loses these CR characters.  They are simply
> erased.  It may be because SimpleXMLParser can't or won't open the XML
> file in binary mode, so line ending translation can & does happen.
> That's just a theory, but if it's true then this behavior might *not*
> happen on all platforms or with all PHP versions.  (I'm seeing this on OS
> X 10.6.8, PHP 5.4.4.)
>
> In the worse case -- mine -- the munged string is a small component of a
> complex datastructure that is serialized in a postmeta record.  In this
> case, the entire meta_value field is deleted on import, because the data
> won't unserialize, because its length has changed.
>
> It seems to me that WP Export should escape any character that might be
> threatened in transit.  I'm no XML lawyer, but some sources claim that
> unescaped CR characters are invalid XML.
>
> To reproduce:
>
> * store a carriage return in a post.
> * export it to a WXR file.
> * examine the WXR file for the raw carriage return (^M).
> * import that file.
> * search for the carriage return.

New description:

 WordPress export does not translate or escape bare CR characters in a
 CR/LF pair.  They show up unfiltered in the WXR export file.  I see this
 both in post_content and in strings that were serialized into a post_meta
 field.  The CR characters are in the WXR file, unfiltered.

 Then, WordPress import loses these CR characters.  They are simply erased.
 It may be because SimpleXMLParser can't or won't open the XML file in
 binary mode, so line ending translation can & does happen.  That's just a
 theory, but if it's true then this behavior might *not* happen on all
 platforms or with all PHP versions.  (I'm seeing this on OS X 10.6.8, PHP
 5.4.4.)
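
 The import-side loss may not need a file-mode explanation.  XML 1.0
 (section 2.11, end-of-line handling) requires a parser to normalize each
 CR/LF pair and each lone CR to a single LF before passing text to the
 application, so literal CRs would be lost on any platform.  A minimal
 sketch of that behavior, using Python's Expat-based standard-library
 parser rather than the importer's PHP code; the element name `content` is
 made up for the illustration:

```python
import xml.etree.ElementTree as ET

def parse_text(xml_bytes: bytes) -> str:
    """Parse a tiny XML document and return the root element's text."""
    return ET.fromstring(xml_bytes).text

# Literal CR/LF inside ordinary element content: the CR is normalized away.
crlf = parse_text(b"<content>line one\r\nline two</content>")

# A lone CR is translated to LF as well.
lone_cr = parse_text(b"<content>line one\rline two</content>")

# CDATA sections get the same treatment.
cdata = parse_text(b"<content><![CDATA[line one\r\nline two]]></content>")

print(repr(crlf))     # 'line one\nline two'
print(repr(lone_cr))  # 'line one\nline two'
print(repr(cdata))    # 'line one\nline two'
```

 Bytes are passed in directly, so no text-mode file translation is involved
 in this sketch; the normalization comes from the parser itself.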

 In the worst case -- mine -- the munged string is a small component of a
 complex data structure that is serialized in a postmeta record.  In this
 case, the entire meta_value field is deleted on import, because the data
 won't unserialize, because its length has changed.
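
 The length dependency can be illustrated outside PHP.  PHP's serialized
 string format is `s:<byte length>:"<bytes>";`, so removing one byte from
 the payload leaves the declared length stale.  The two helpers below are
 hypothetical and only model that single token type; they are not PHP's
 unserialize() implementation:

```python
import re

def php_serialize_string(value: bytes) -> bytes:
    """Build a PHP-style serialized string token: s:<len>:"<bytes>";"""
    return b's:%d:"%s";' % (len(value), value)

def serialized_string_is_consistent(token: bytes) -> bool:
    """Check that the declared length lands exactly on the closing '";'."""
    match = re.match(rb's:(\d+):"', token)
    if match is None:
        return False
    declared = int(match.group(1))
    start = match.end()
    return token[start + declared:start + declared + 2] == b'";'

original = php_serialize_string(b"first\r\nsecond")  # declares 13 bytes
damaged = original.replace(b"\r\n", b"\n")           # payload is now 12 bytes

print(serialized_string_is_consistent(original))  # True
print(serialized_string_is_consistent(damaged))   # False
```

 One stale length is enough to make the whole serialized value unreadable,
 which matches the complete loss of the meta_value described above.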

 It seems to me that WP Export should escape any character that might be
 threatened in transit.  I'm no XML lawyer, but some sources claim that
 unescaped CR characters are invalid XML.
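
 For reference, XML 1.0 does allow CR (#xD) as a character; the issue is
 that a literal CR is normalized on parsing, while one written as the
 character reference `&#13;` is not.  A sketch of export-side escaping
 under that assumption, again in Python rather than the exporter's PHP;
 `escape_for_wxr` is a hypothetical name, not a WordPress function:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def escape_for_wxr(text: str) -> str:
    """Escape &, < and >, and write each CR as the reference &#13;."""
    return escape(text, {"\r": "&#13;"})

original = "line one\r\nline two"
xml_doc = "<content>" + escape_for_wxr(original) + "</content>"

# Parse the document back; the character reference yields a real CR.
round_tripped = ET.fromstring(xml_doc.encode("utf-8")).text

print(xml_doc)                    # the CR appears only as &#13;
print(round_tripped == original)  # True
```

 Character references are not expanded inside CDATA sections, so this only
 helps for fields written as ordinary escaped text.  As far as I know the
 exporter wraps most content in CDATA, so those fields would need to be
 emitted differently for this approach to apply.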

 To reproduce:

 * store a carriage return in a post.
 * export it to a WXR file.
 * examine the WXR file for the raw carriage return (`^M`).
 * import that file.
 * search for the carriage return.
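
 The third step can also be done without an editor that displays `^M`.  A
 small sketch that counts raw CR bytes in an export file; the file must be
 opened in binary mode so that no line-ending translation hides them.  The
 function names and the file name in the usage comment are placeholders:

```python
def count_raw_cr(data: bytes) -> int:
    """Count literal carriage-return bytes (0x0D) in a byte string."""
    return data.count(b"\r")

def count_raw_cr_in_file(path: str) -> int:
    """Read a file in binary mode and count its raw CR bytes."""
    with open(path, "rb") as handle:
        return count_raw_cr(handle.read())

# Usage (placeholder file name):
#   print(count_raw_cr_in_file("export.xml"))
```

 A non-zero count on the exported file, combined with no CRs in the
 re-imported post, would confirm the behavior described in this ticket.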

--
Ticket URL: <https://core.trac.wordpress.org/ticket/22279#comment:1>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

