[wp-trac] [WordPress Trac] #8460: Imports produce duplicate posts in some cases

WordPress Trac wp-trac at lists.automattic.com
Tue Dec 2 14:21:56 GMT 2008

#8460: Imports produce duplicate posts in some cases
 Reporter:  tott    |       Owner:  tott                                       
     Type:  defect  |      Status:  new                                        
 Priority:  normal  |   Milestone:  2.8                                        
Component:  Import  |     Version:                                             
 Severity:  normal  |    Keywords:  import, duplicate, needs-testing, has-patch
 * When an import is re-run, posts without a title are created again as
 duplicates.
 * Comments by authors whose names contain an apostrophe, such as O'Tool,
 are created again as duplicate comments.

 From what I can see, this is because the post_exists / comment_exists
 functions apply different sanitization rules than the import path
 (wp_insert_post / wp_insert_comment). This was reported for the Movable
 Type importer, but as far as I can see the problem also exists in other
 importers, and I was able to reproduce it with the WordPress importer as
 well.

 post_exists needs to check its values using sanitize_post_field(). The
 simple stripslashes currently used in post_exists does not give the
 correct result for escaped data.

 comment_exists seems to be used only within importers, and there all the
 comment_authors are pre-escaped before being passed to comment_exists.
 Running them through $wpdb->prepare then causes comment_exists to fail
 for escaped data.
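
 The mismatch described above can be sketched with a toy model. This is
 not WordPress code: ToyCommentStore, escape, unescape and run_import are
 hypothetical names, and the assumption is simply that the insert path
 unescapes its input before storing it while the lookup compares the
 still-escaped value literally.

```python
def escape(value):
    """Rough stand-in for slash-escaping backslashes and single quotes."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def unescape(value):
    """Reverse of escape(): drop each backslash and keep the next character."""
    out = []
    i = 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            out.append(value[i + 1])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


class ToyCommentStore:
    """Hypothetical in-memory table of (author, date) rows."""

    def __init__(self):
        self.rows = []

    def insert(self, escaped_author, date):
        # Assumption: the insert path expects escaped input and stores it raw.
        self.rows.append((unescape(escaped_author), date))

    def exists_literal(self, author, date):
        # Compares the value exactly as passed in, so an escaped
        # author never matches the raw value that was stored.
        return (author, date) in self.rows

    def exists_normalized(self, author, date):
        # Normalizes the lookup value the same way the insert path does.
        return (unescape(author), date) in self.rows


def run_import(store, comments, exists):
    """Import comments, skipping those the given exists() check reports."""
    for author, date in comments:
        escaped = escape(author)  # importers pre-escape the author
        if not exists(escaped, date):
            store.insert(escaped, date)


if __name__ == "__main__":
    comments = [("O'Tool", "2008-12-02 14:21:56"), ("Smith", "2008-12-02 14:25:00")]
    store = ToyCommentStore()
    run_import(store, comments, store.exists_literal)
    run_import(store, comments, store.exists_literal)
    print(store.rows)
```

 In this model, running the import twice with exists_literal stores the
 O'Tool comment twice, while exists_normalized keeps a single copy.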

 There are two possible ways to fix this problem:

  * Fix the *_exists functions so that they match correctly, which might
 cause trouble in other places and needs to be tested really well.
  * Fix the import functions so that the sanitization/conversion is
 applied to the values before they are passed to the *_exists functions.

 I included a patch that applies to the post_exists and comment_exists
 functions, so no further changes in importers and other functions should
 be necessary.

 The patch also makes sure that titles/content only need to be unique per
 date, not across the whole blog. In addition, until now only one of the
 submitted values (content or title) was checked in combination with the
 date; combinations of title and content were not handled correctly.
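
 As a rough illustration of the combined check, here is a minimal sketch
 in which every supplied value (title, content, date) has to match the
 same post. It is not the attached patch: find_existing_post and the
 dict-based post list are hypothetical and only model the matching rule.

```python
def find_existing_post(posts, title="", content="", date=""):
    """Return the id of the first post matching all non-empty criteria, else 0."""
    supplied = (("title", title), ("content", content), ("date", date))
    criteria = {key: value for key, value in supplied if value}
    if not criteria:
        # Nothing to match on, so report "no existing post".
        return 0
    for post in posts:
        # Every supplied field must match the same post.
        if all(post[key] == value for key, value in criteria.items()):
            return post["id"]
    return 0


if __name__ == "__main__":
    posts = [
        {"id": 1, "title": "", "content": "Hello", "date": "2008-12-01 10:00:00"},
        {"id": 2, "title": "News", "content": "Hello", "date": "2008-12-02 10:00:00"},
    ]
    # An untitled post is identified by its content together with its date.
    print(find_existing_post(posts, content="Hello", date="2008-12-01 10:00:00"))
```

 Because the date is part of the criteria, the same title or content on a
 different date is treated as a different post in this model.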

 I tested this with a small WXR export, which I attach to this ticket. I
 also did some manual testing, but the patch still needs to be tested
 against other importers.

Ticket URL: <http://trac.wordpress.org/ticket/8460>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software
