[wp-trac] [WordPress Trac] #12137: Wordpress import module does not correctly parse XML
WordPress Trac
wp-trac at lists.automattic.com
Fri Feb 5 10:56:37 UTC 2010
#12137: Wordpress import module does not correctly parse XML
--------------------------+-------------------------------------------------
Reporter: greggman | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Unassigned
Component: Import | Version: 2.9.1
Severity: normal | Keywords:
--------------------------+-------------------------------------------------
I'm not sure I can explain this well. Basically, the WordPress import module
claims to read a modified form of RSS, which is based on XML. But the
import module is not actually parsing XML; it is just matching text against
hardcoded rules. This means you can give it a perfectly valid XML file and it
will fail.
Examples: in XML, the following 2 lines represent exactly the same data:
<content:encoded>hello world</content:encoded>
<content:encoded><![CDATA[hello world]]></content:encoded>
Yet the WordPress importer is hardcoded to require the second form.
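
To make the equivalence concrete, here is a minimal sketch that feeds both forms to a real XML parser. It uses Python's standard xml.etree.ElementTree purely as an illustration (the importer itself is PHP); the helper name encoded_text and the wrapping <item> element are assumptions added so the content: prefix is declared and the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS "content" module.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

def encoded_text(fragment):
    """Parse one <content:encoded> fragment and return its character data."""
    # Hypothetical wrapper element: it only exists to declare the prefix.
    wrapped = '<item xmlns:content="%s">%s</item>' % (CONTENT_NS, fragment)
    item = ET.fromstring(wrapped)
    return item.find("{%s}encoded" % CONTENT_NS).text

plain = encoded_text("<content:encoded>hello world</content:encoded>")
cdata = encoded_text("<content:encoded><![CDATA[hello world]]></content:encoded>")

# A conforming parser reports the same text for both spellings.
print(plain == cdata)
```

A CDATA section is only an escaping convenience, so the parser hands back the same string either way.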
Another example: these 2 forms represent exactly the same data in XML:
--example 1--
<wp:category><wp:cat_name>news</wp:cat_name></wp:category>
--example 2--
<wp:category>
<wp:cat_name>news</wp:cat_name>
</wp:category>
Yet the WordPress importer is hardcoded to only accept the first form.
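
As a rough check of the whitespace case, the sketch below parses both layouts and collects the category names. It is again Python's xml.etree.ElementTree standing in for a real parser. The helper category_names and the wrapping <channel> element are hypothetical, and the wp: namespace URI is an assumption about what the export file declares.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the wp: prefix; check the export file's root element.
WP_NS = "http://wordpress.org/export/1.0/"

def category_names(fragment):
    """Return the text of every <wp:cat_name> found in the fragment."""
    # Hypothetical wrapper element so that the wp: prefix is declared.
    wrapped = '<channel xmlns:wp="%s">%s</channel>' % (WP_NS, fragment)
    channel = ET.fromstring(wrapped)
    return [el.text for el in channel.iter("{%s}cat_name" % WP_NS)]

one_line = "<wp:category><wp:cat_name>news</wp:cat_name></wp:category>"
multi_line = """<wp:category>
<wp:cat_name>news</wp:cat_name>
</wp:category>"""

# Line breaks between elements do not change which elements exist
# or what text <wp:cat_name> holds.
print(category_names(one_line) == category_names(multi_line))
```

Because the lookup is by element name rather than by position in the text, the line breaks between tags have no effect on the result.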
There are many other examples.
The suggestion is to use PHP's built-in XML libraries to read the files
and then get the data from the parsed result. They will correctly parse XML
data regardless of whitespace, entity or CDATA differences.
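
The parse-then-extract approach can be sketched as follows. This is not the importer's code and not PHP. It is a small Python illustration with a made-up sample feed (SAMPLE) and a hypothetical helper (read_posts), in which two items spell the same title and body differently (entity versus CDATA, compact versus indented).

```python
import xml.etree.ElementTree as ET

# Prefix map used for lookups; the URI is that of the RSS "content" module.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

# Made-up sample feed: both items carry the same data, written differently.
SAMPLE = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item><title>Q&amp;A</title><content:encoded>hello world</content:encoded></item>
    <item>
      <title><![CDATA[Q&A]]></title>
      <content:encoded><![CDATA[hello world]]></content:encoded>
    </item>
  </channel>
</rss>"""

def read_posts(xml_text):
    """Parse the feed, then pull fields out of the element tree."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title"),
            "content": item.findtext("content:encoded", namespaces=NS),
        })
    return posts

posts = read_posts(SAMPLE)
# Entity and CDATA spellings, and the different indentation, yield equal records.
print(posts[0] == posts[1])
```

In PHP, extensions such as SimpleXML, DOM or the expat-based XML Parser could play the same role, since the parser normalizes these spelling differences before the importer sees the data.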
--
Ticket URL: <http://core.trac.wordpress.org/ticket/12137>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software