[wp-trac] [WordPress Trac] #15197: WXR export/import umbrella ticket

WordPress Trac wp-trac at lists.automattic.com
Sat Oct 23 12:56:22 UTC 2010

#15197: WXR export/import umbrella ticket
 Reporter:  duck_         |       Owner:                       
     Type:  defect (bug)  |      Status:  new                  
 Priority:  normal        |   Milestone:  Awaiting Review      
Component:  Export        |     Version:                       
 Severity:  normal        |    Keywords:  has-patch ux-feedback
 Umbrella ticket for a number of upgrades to the WXR export/import process.

 == Export ==
  * Bump WXR version to 1.1
  * Removed filtering ''for now'' (see explanation below)
  * Removed `wxr_missing_parents` (local function), seems to be a remnant
 from pre-`get_categories`
  * Added author information to export (for better import UX) - #11118
  * Greater usage of slug-like identifiers, e.g. login instead of name in
  * Don't export auto-drafts
  * Filled in docs
  * Ignore _edit_lock and _edit_last meta keys
  * Only use the 'forward compatible' term tags, `<category domain="foo"
 nicename="bar">`, within post items

 == Import ==
  * Use an XML parser (where available). 3 parser options:
 [http://www.php.net/manual/en/book.simplexml.php SimpleXML] (yay!),
 [http://www.php.net/manual/en/book.xml.php XML Parser] (yay!), or regular
 expressions (boo!)
  * Proper import support for nav menus - #14750
    * Menu items for missing content will be skipped; there ''should'' be
 no problems when an associated object is further down the import file than
 the menu item
    * Orphaned menu items (e.g. their parent was skipped due to above
 point) will become top-level
  * Greater usage of slug-like identifiers, e.g. use `<category
 domain="..." nicename="...">` tags to fix a bunch of category issues
  * Either import author as is (i.e. from information stored in WXR file,
 this allows us to create a user with more data by default) or map to an
 existing user - #10319
  * Less direct feedback (ignoring errors, currently none :( !), as it is
 unwieldy for a large import.
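
 A rough sketch of the two nav menu rules above, not the code from the patch. The function name `wxr_resolve_menu_items()` and the array keys are hypothetical. It assumes menu items are resolved only after all other content has been processed, which is one way to make the position of the associated object in the file irrelevant. Remapping parent IDs to new menu item IDs is left out.

```php
<?php
// Sketch only: wxr_resolve_menu_items() and the array keys are hypothetical.
// $menu_items: old menu item ID => array( 'object_id' => old ID, 'parent' => old menu item ID or 0 )
// $processed:  old object ID => new object ID, filled in while importing posts and terms
function wxr_resolve_menu_items( array $menu_items, array $processed ) {
	$kept = array();

	// Pass 1: skip menu items whose associated object was never imported.
	foreach ( $menu_items as $id => $item ) {
		if ( ! isset( $processed[ $item['object_id'] ] ) ) {
			continue;
		}
		$item['object_id'] = $processed[ $item['object_id'] ];
		$kept[ $id ]       = $item;
	}

	// Pass 2: items whose parent was skipped become top-level.
	foreach ( array_keys( $kept ) as $id ) {
		$parent = $kept[ $id ]['parent'];
		if ( 0 !== $parent && ! isset( $kept[ $parent ] ) ) {
			$kept[ $id ]['parent'] = 0;
		}
	}

	return $kept;
}

$menu_items = array(
	10 => array( 'object_id' => 1, 'parent' => 0 ),
	11 => array( 'object_id' => 2, 'parent' => 0 ),  // object 2 is missing
	12 => array( 'object_id' => 3, 'parent' => 11 ), // child of the skipped item
);
$processed = array( 1 => 101, 3 => 103 );
$resolved  = wxr_resolve_menu_items( $menu_items, $processed );
```

 In this example item 11 is dropped because object 2 was never imported, and item 12 loses its parent and moves to the top level.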

 All accompanied by a number of smaller changes and anything I forgot to
 write down.

 == Further work ==

 === Backwards Compatibility ===
 The main problem for now is ensuring backwards compatibility with WXR 1.0
 files. That said, no major faults ''should'' occur when importing a 1.0
 file. Leaving aside the problems you will already come across in an
 export/import in 3.0.1, the known issues are:
  * No author import (the current importer takes author data from each
 post)[[BR]]'''Possible solution:''' if we get an empty author array then
 loop through posts grabbing unique authors and offering to map them (but
 not to import)
  * I think (not tested properly yet) that all term menu items will be
 skipped due to missing term_id XML tags so no way of mapping old ID to
 new[[BR]]'''Possible solution:''' off the top of my head, maybe slugs
 instead of IDs for processed_terms mapping (?)
  * Probably some indexes and vars which need to be checked with isset and
 fallback provided (for when the XML tag doesn't exist in 1.0 files)
  * ... and possibly more with further testing
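
 A minimal sketch of the first possible solution above, combined with the `isset` fallback idea. The function name `wxr_collect_authors()` and the `post_author`, `author_login` and `import` keys are made up for illustration; the real parser output may be shaped differently.

```php
<?php
// Sketch only: function name and array keys are hypothetical.
// $authors: author array as read from the WXR file (empty for 1.0 files)
// $posts:   parsed post items, each optionally carrying a 'post_author' string
function wxr_collect_authors( array $authors, array $posts ) {
	if ( ! empty( $authors ) ) {
		return $authors; // file already carries author information
	}

	$found = array();
	foreach ( $posts as $post ) {
		// The key may not exist for every item, so fall back to an empty string.
		$login = isset( $post['post_author'] ) ? trim( $post['post_author'] ) : '';
		if ( '' === $login || isset( $found[ $login ] ) ) {
			continue;
		}
		// Offer these authors for mapping only, not for import.
		$found[ $login ] = array( 'author_login' => $login, 'import' => false );
	}

	return $found;
}

$posts = array(
	array( 'post_title' => 'One', 'post_author' => 'alice' ),
	array( 'post_title' => 'Two', 'post_author' => 'bob' ),
	array( 'post_title' => 'Three', 'post_author' => 'alice' ),
	array( 'post_title' => 'Four' ), // no author data at all
);
$mappable = wxr_collect_authors( array(), $posts );
```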

 How far should this go back?
 Example: 3 years ago [6375] introduced forwards compatible category tags
 including the slug and taxonomy. These are the only category tags the
 parsers currently read. Is it worth checking the really old style XML tags
 if no terms are found for a post? That should be easy for SimpleXML and
 regular expressions, but I think it will be harder for XML Parser.
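
 To show roughly what that fallback could look like with SimpleXML, here is a sketch. It assumes the old style is a bare `<category>` element with no attributes, which needs checking against real old export files. `wxr_item_terms()` is a made-up name, and defaulting a missing `domain` to `category` is also an assumption.

```php
<?php
// Sketch only: wxr_item_terms() is hypothetical, and "old style" is assumed
// to mean a bare <category> element without domain/nicename attributes.
function wxr_item_terms( SimpleXMLElement $item ) {
	$terms = array(); // forward compatible tags
	$plain = array(); // assumed old style tags

	foreach ( $item->category as $c ) {
		$name = trim( (string) $c );
		if ( isset( $c['nicename'] ) ) {
			$terms[] = array(
				'name'   => $name,
				'slug'   => (string) $c['nicename'],
				'domain' => isset( $c['domain'] ) ? (string) $c['domain'] : 'category',
			);
		} elseif ( '' !== $name ) {
			$plain[] = array( 'name' => $name, 'slug' => '', 'domain' => 'category' );
		}
	}

	// Only fall back to the old style tags if no forward compatible ones were found.
	return $terms ? $terms : $plain;
}

$xml  = '<item><category>PHP</category>'
	. '<category domain="post_tag" nicename="php"><![CDATA[PHP]]></category></item>';
$item = simplexml_load_string( $xml, 'SimpleXMLElement', LIBXML_NOCDATA );
$terms = wxr_item_terms( $item );
```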

 === The problem of filtering ===
  * Potential to export a pretty useless file, e.g. choose Category:
 Uncategorized and Content Type: Pages
  * Makes reliable importing of nav menus harder (worse UX when importer is
 creating half made menus)

 Moving forward I am currently imagining some sort of grid of post types
 selectable by checkbox. Each post type lists its taxonomies below it; these
 are only activated/recognised if the post type is selected. But which
 filters to include and how to show them are probably for another ticket.

 === Other ===

 The feedback from the importer needs to be completed (see above). I was
 thinking of listing errors (hidden by default, shown with JS?) and a table
 of results showing the number of successes and failures for each of
 authors, posts, terms, ...
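
 One way the results table could be fed, as a sketch only. `wxr_tally()` and the log format (a list of type/outcome pairs) are hypothetical, not something the importer currently produces.

```php
<?php
// Sketch only: wxr_tally() and the log format are hypothetical.
// $log: list of array( object type, bool success ) entries recorded during import
function wxr_tally( array $log ) {
	$table = array();
	foreach ( $log as $entry ) {
		list( $type, $ok ) = $entry;
		if ( ! isset( $table[ $type ] ) ) {
			$table[ $type ] = array( 'success' => 0, 'failure' => 0 );
		}
		$table[ $type ][ $ok ? 'success' : 'failure' ]++;
	}
	return $table;
}

$log = array(
	array( 'authors', true ),
	array( 'posts', true ),
	array( 'posts', false ),
	array( 'terms', true ),
);
$results = wxr_tally( $log );

// One row per object type, in the order the types were first seen.
foreach ( $results as $type => $counts ) {
	printf( "%-8s %d ok, %d failed\n", $type, $counts['success'], $counts['failure'] );
}
```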

 The `can_export` property of a post type only enables it to show up in the
 Content Types dropdown for export filtering, but if "All Content" is
 selected then all post types are exported, including those with
 `can_export` set to false. A fix based on the export patch here could be
 something like:
 // only export public post types that have not opted out via can_export
 $post_types = get_post_types( array( 'public' => true, 'can_export' => true ) );
 $where = "post_type IN ('" . implode( "','", $post_types ) . "') AND post_status != 'auto-draft'";
 // grab a snapshot of post IDs, just in case it changes during the export
 $post_ids = $wpdb->get_col( "SELECT ID FROM $wpdb->posts WHERE $where ORDER BY post_date_gmt ASC" );
 (NB: would need to look into exactly which builtin post types are and
 should be can_export => false)

 Docs in the importer.

 Currently I have unit tests for the parsers and hopefully coming soon will
 be more for the whole process (need to think up a full checklist of tests
 for edge and problem cases)


 This is still partly a work in progress so feedback and a lot of testing
 please. Thank you.

 This ticket aims to fix the following:
 #5447 #5460 #7400 #7973 #8471 #9237 #10319 #11118 #11144 #11354 #11574
 #12685 #13364 #13394 #13453 #13454 #13627 #14306 #14442 #14524 #14750
 #15055 #15091 #15108

Ticket URL: <http://core.trac.wordpress.org/ticket/15197>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software
