[wp-trac] [WordPress Trac] #8460: Imports produce duplicate posts
in some cases
WordPress Trac
wp-trac at lists.automattic.com
Tue Dec 2 14:21:56 GMT 2008
#8460: Imports produce duplicate posts in some cases
--------------------+-------------------------------------------------------
Reporter: tott | Owner: tott
Type: defect | Status: new
Priority: normal | Milestone: 2.8
Component: Import | Version:
Severity: normal | Keywords: import, duplicate, needs-testing, has-patch
--------------------+-------------------------------------------------------
* When importing multiple times, posts without a title create duplicates
each time the import is re-run.
* Comments by authors with an apostrophe in their name, like O'Tool,
result in duplicate comments.
From what I see, this is due to the post/comment_exists functions using
different sanitization rules than the import path (wp_insert_post /
wp_insert_comment). This was reported for the Movable Type importer; as
far as I can see the problem also exists in other importers, and I was
able to reproduce it with the WordPress importer as well.
post_exists needs to check its values with sanitize_post_field(). The
simple stripslashes currently used in post_exists does not produce the
correct result for escaped data.
comment_exists seems to be used only within importers, and there all
comment_authors are pre-escaped before being passed to comment_exists.
Running them through wpdb->prepare escapes them a second time, which
causes comment_exists to fail for escaped data.
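The mismatch described above can be illustrated with a small sketch. It is
written in Python with sqlite3 purely as a stand-in; it is not the WordPress
code. The helpers add_slashes and strip_slashes are rough, hypothetical
approximations of PHP's addslashes/stripslashes, and the comment_exists
function, table layout, and unslash flag are assumptions made for the
illustration. The parameterized query plays the role of wpdb->prepare: it
treats the value it receives literally, so a pre-escaped author name no
longer matches the stored one.

```python
import re
import sqlite3


def add_slashes(value):
    # Rough stand-in for PHP addslashes(): backslash-escape \, ' and ".
    return value.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')


def strip_slashes(value):
    # Rough stand-in for PHP stripslashes(): drop one level of escaping.
    return re.sub(r"\\(.)", r"\1", value, flags=re.S)


def comment_exists(conn, author, date, unslash=False):
    # Hypothetical lookup; the bound parameter is compared literally,
    # much like a value that has been passed through a prepare step.
    if unslash:
        author = strip_slashes(author)
    row = conn.execute(
        "SELECT id FROM comments WHERE author = ? AND date = ?",
        (author, date),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, date TEXT)"
)
date = "2008-12-02 14:21:56"
# The stored value is the plain name, as it would be after a first import.
conn.execute("INSERT INTO comments (author, date) VALUES (?, ?)", ("O'Tool", date))

# The importer hands over a pre-escaped name on the second run.
escaped_author = add_slashes("O'Tool")

missed = comment_exists(conn, escaped_author, date)
found = comment_exists(conn, escaped_author, date, unslash=True)
print(missed, found)
```

In this sketch the first lookup misses the existing comment, which is the
situation that leads to a duplicate insert, while the lookup that removes
the extra escaping level first finds it.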
There are two possible ways to fix this problem:
* Fix the *_exists functions to produce correct matching, which might
cause trouble in other places and needs to be tested really well.
* Fix the import functions so that the same sanitization/conversion is
applied to the values that are passed to the *_exists functions.
I included a patch that changes the post_exists and comment_exists
functions, so no further changes in importers and other functions should be
needed.
The patch also makes sure that titles/content need to be unique per date
rather than within the whole blog. Also, so far only one of the submitted
values (content or title) was checked in combination with the date.
Combinations of title and content were not handled correctly.
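As a rough illustration of matching on the combination of values, here is
a minimal Python sketch. It is not the attached patch: the in-memory list
of posts, the field names, and this post_exists signature are assumptions
for the example. Every value that is supplied has to match, and None means
"not supplied", so an empty title is still compared as a real value.

```python
def post_exists(posts, title=None, content=None, date=None):
    # Build the criteria from every value that was actually supplied.
    supplied = {"title": title, "content": content, "date": date}
    criteria = {key: val for key, val in supplied.items() if val is not None}
    if not criteria:
        return 0
    for post in posts:
        # All supplied fields must match, not just one of them.
        if all(post.get(key) == val for key, val in criteria.items()):
            return post["id"]
    return 0


posts = [
    {"id": 1, "title": "", "content": "First untitled entry",
     "date": "2008-12-01 10:00:00"},
    {"id": 2, "title": "", "content": "Second untitled entry",
     "date": "2008-12-02 09:00:00"},
]

# Re-running the import with the same untitled post finds the existing one.
rerun = post_exists(posts, title="", content="First untitled entry",
                    date="2008-12-01 10:00:00")
# Same (empty) title and date but different content is a different post.
other = post_exists(posts, title="", content="Something else",
                    date="2008-12-01 10:00:00")
print(rerun, other)
```

With this kind of check, two untitled posts on different dates stay
distinct, and an untitled post is only reported as existing when its
content and date match as well.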
I tested this with a small WXR export, which I attach to this ticket. I
also did some manual testing, but it will need testing against other
importers as well.
--
Ticket URL: <http://trac.wordpress.org/ticket/8460>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software