[wp-trac] [WordPress Trac] #44349: Posts show up multiple times in backend, when imported with Import WordPress tool

WordPress Trac noreply at wordpress.org
Mon Jun 11 18:14:05 UTC 2018


#44349: Posts show up multiple times in backend, when imported with Import
WordPress tool
-------------------------------------------+------------------------------
 Reporter:  wzshop                         |       Owner:  (none)
     Type:  defect (bug)                   |      Status:  new
 Priority:  normal                         |   Milestone:  Awaiting Review
Component:  Query                          |     Version:  4.9.6
 Severity:  normal                         |  Resolution:
 Keywords:  reporter-feedback 2nd-opinion  |     Focuses:
-------------------------------------------+------------------------------
Changes (by pbiron):

 * keywords:  reporter-feedback => reporter-feedback 2nd-opinion
 * component:  Import => Query


Comment:

 Thanks for the video!  Now that I understand what you are reporting, I can
 confirm that I'm seeing what you are seeing.  At first sight this appears
 to be a bug in the importer, but it is not.  The explanation is a little
 complicated, but I'll do my best.

 First, you can confirm that "Post 129" was only imported once by searching
 for "Post 129" and seeing that it only appears once in the search results.

 The reason it appears on page 1 & page 2 is the result of 2 factors:

 1. several posts in the WXR file being imported share the same value
 in the `<wp:post_date>` element, so several of the imported posts
 share the same value in the `post_date` field.
 1. the pagination on `/wp-admin/edit.php` is accomplished via `LIMIT` and
 `ORDER BY` clauses in the SQL request.  MySQL performs certain
 optimizations on queries that include a `LIMIT` and those optimizations
 are what is producing the seemingly buggy behavior you're seeing.

 As explained in MySQL's
 [[https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html|LIMIT Query Optimization]],

 > If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as
 soon as it has found the first row_count rows of the sorted result, rather
 than sorting the entire result. If ordering is done by using an index,
 this is very fast. If a filesort must be done, all rows that match the
 query without the LIMIT clause are selected, and most or all of them are
 sorted, before the first row_count are found. After the initial rows have
 been found, MySQL does not sort any remainder of the result set.
 >
 > One manifestation of this behavior is that an ORDER BY query with and
 without LIMIT may return rows in different order, as described later in
 this section.

 and more importantly

 > If multiple rows have identical values in the ORDER BY columns, the
 server is free to return those rows in any order, and may do so
 differently depending on the overall execution plan. In other words, the
 sort order of those rows is nondeterministic with respect to the
 nonordered columns.
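
 To make the effect of that last quote concrete, here is a minimal Python
 sketch (not MySQL, and not WordPress code).  It assumes a hypothetical set
 of 40 posts spread over three timestamps, and it models "free to return
 those rows in any order" by letting the query for page 1 return ties in
 ascending ID order while the query for page 2 returns them in descending
 ID order.  The post IDs, the group sizes and the `fetch_page` helper are
 all invented for illustration.

```python
# Hypothetical data: (post_id, post_date). Several posts share a timestamp,
# which is the situation described for the imported WXR file.
POSTS = (
    [(i, "2018-06-11 15:51:54") for i in range(1, 16)]     # IDs 1-15
    + [(i, "2018-06-11 15:51:53") for i in range(16, 31)]  # IDs 16-30
    + [(i, "2018-06-11 15:51:52") for i in range(31, 41)]  # IDs 31-40
)


def fetch_page(posts, page, per_page, ties_descending):
    """Mimic ORDER BY post_date DESC LIMIT offset, per_page.

    Only post_date is a sort key. The order among rows with equal dates is
    whatever order the rows happened to be in before the stable sort, and
    ties_descending picks one of two such arbitrary orders.
    """
    rows = sorted(posts, key=lambda r: r[0], reverse=ties_descending)
    rows.sort(key=lambda r: r[1], reverse=True)  # stable: ties keep their order
    offset = (page - 1) * per_page
    return [post_id for post_id, _ in rows[offset:offset + per_page]]


# Each page is a separate query, so each may break ties differently.
page1 = fetch_page(POSTS, page=1, per_page=20, ties_descending=False)
page2 = fetch_page(POSTS, page=2, per_page=20, ties_descending=True)

all_ids = {post_id for post_id, _ in POSTS}
shown_twice = sorted(set(page1) & set(page2))
never_shown = sorted(all_ids - set(page1) - set(page2))

print("on both pages:", shown_twice)
print("on neither page:", never_shown)
```

 In this sketch the posts listed twice and the posts listed on neither page
 all come from the middle timestamp group, the one that straddles the page
 boundary.  Every post still exists exactly once in the data.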

 So, with the posts imported from `post export demo.xml`, the 1st page of
 posts when `Number of items per page = 20` (in `Screen Options`) contains
 some posts with `post_date = 2018-06-11 15:51:54` and some with `post_date
 = 2018-06-11 15:51:53` (the latter including "Post 129") with the order of
 posts within each "group" being "random".  The 2nd page contains some
 posts with `post_date = 2018-06-11 15:51:53` (including "Post 129") and
 some with `post_date = 2018-06-11 15:51:52`.

 If you change to `Number of items per page = 30`, all of the posts with
 `post_date = 2018-06-11 15:51:53` will appear on the 1st page (again in a
 "random" order) and thus "Post 129" will only appear on that 1st page.

 Alternatively, if the posts in the site you exported (to produce `post
 export demo.xml`) all had unique `post_date` values, then when you import them
 into another site you will not see this behavior.
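
 Both observations (a page size that keeps a whole group of equal dates on
 one page, and unique dates) can be checked with a small Python sketch.  As
 before, this is a model and not MySQL itself: the 40 posts, their
 timestamps, and the `fetch_page` and `overlap` helpers are hypothetical,
 and the two page queries are simply assumed to break ties in opposite
 orders.

```python
from datetime import datetime, timedelta


def fetch_page(rows, page, per_page, ties_descending):
    """Mimic ORDER BY post_date DESC LIMIT offset, per_page; ties are arbitrary."""
    ordered = sorted(rows, key=lambda r: r[0], reverse=ties_descending)
    ordered.sort(key=lambda r: r[1], reverse=True)  # stable sort on the date only
    offset = (page - 1) * per_page
    return [post_id for post_id, _ in ordered[offset:offset + per_page]]


def overlap(rows, per_page):
    """IDs listed on both page 1 and page 2 when the queries break ties differently."""
    page1 = fetch_page(rows, 1, per_page, ties_descending=False)
    page2 = fetch_page(rows, 2, per_page, ties_descending=True)
    return sorted(set(page1) & set(page2))


start = datetime(2018, 6, 11, 15, 51, 54)

# Hypothetical import where groups of posts share a timestamp.
shared_dates = (
    [(i, start) for i in range(1, 16)]
    + [(i, start - timedelta(seconds=1)) for i in range(16, 31)]
    + [(i, start - timedelta(seconds=2)) for i in range(31, 41)]
)

# Hypothetical import where every post has its own timestamp.
unique_dates = [(i, start - timedelta(seconds=i)) for i in range(1, 41)]

overlap_shared_20 = overlap(shared_dates, per_page=20)  # tie group straddles pages
overlap_shared_30 = overlap(shared_dates, per_page=30)  # tie group fits on page 1
overlap_unique_20 = overlap(unique_dates, per_page=20)  # no ties at all

print(overlap_shared_20, overlap_shared_30, overlap_unique_20)
```

 Only the first case, where a group of equal dates is split across the page
 boundary, yields posts that show up on both pages.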

 Yes, this behavior is confusing, but it is a result of MySQL's "Limit
 Query Optimization" and is not a bug in the Importer nor in `WP_Query`.

 Thus, I believe this ticket should be closed as "invalid", but I will
 leave it to one of the SQL gurus on the team to confirm my analysis and
 close it accordingly.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/44349#comment:5>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

