[wp-trac] [WordPress Trac] #44349: Posts show up multiple times in backend, when imported with Import WordPress tool
WordPress Trac
noreply at wordpress.org
Mon Jun 11 18:14:05 UTC 2018
#44349: Posts show up multiple times in backend, when imported with Import
WordPress tool
-------------------------------------------+------------------------------
Reporter: wzshop | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Query | Version: 4.9.6
Severity: normal | Resolution:
Keywords: reporter-feedback 2nd-opinion | Focuses:
-------------------------------------------+------------------------------
Changes (by pbiron):
* keywords: reporter-feedback => reporter-feedback 2nd-opinion
* component: Import => Query
Comment:
Thanks for the video! Now that I understand what you are reporting, I can
confirm that I'm seeing what you are seeing. At first sight this appears
to be a bug in the importer, but it is not. The explanation is a little
complicated, but I'll do my best.
First, you can confirm that "Post 129" was only imported once by searching
for "Post 129" and seeing that it only appears once in the search results.
The reason it appears on page 1 & page 2 is the result of 2 factors:
1. more than one post in the WXR file being imported has the same value
in the `<wp:post_date>` element, thus more than one of the imported posts
has the same value in the `post_date` field.
2. the pagination on `/wp-admin/edit.php` is accomplished via `LIMIT` and
`ORDER BY` clauses in the SQL request. MySQL performs certain
optimizations on queries that include a `LIMIT` and those optimizations
are what is producing the seemingly buggy behavior you're seeing.
As explained in MySQL's [[https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html|LIMIT Query Optimization]],
> If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as
soon as it has found the first row_count rows of the sorted result, rather
than sorting the entire result. If ordering is done by using an index,
this is very fast. If a filesort must be done, all rows that match the
query without the LIMIT clause are selected, and most or all of them are
sorted, before the first row_count are found. After the initial rows have
been found, MySQL does not sort any remainder of the result set.
>
> One manifestation of this behavior is that an ORDER BY query with and
without LIMIT may return rows in different order, as described later in
this section.
and more importantly
> If multiple rows have identical values in the ORDER BY columns, the
server is free to return those rows in any order, and may do so
differently depending on the overall execution plan. In other words, the
sort order of those rows is nondeterministic with respect to the
nonordered columns.
So, with the posts imported from `post export demo.xml`, the 1st page of
posts when `Number of items per page = 20` (in `Screen Options`) contains
some posts with `post_date = 2018-06-11 15:51:54` and some with `post_date
= 2018-06-11 15:51:53` (the latter including "Post 129") with the order of
posts within each "group" being "random". The 2nd page contains some
posts with `post_date = 2018-06-11 15:51:53` (including "Post 129") and
some with `post_date = 2018-06-11 15:51:52`.
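To make the page 1 / page 2 overlap concrete, here is a small Python model of it. This is only a sketch: the post IDs and the number of posts per timestamp are invented (the ticket does not give them), and the `ties_ascending` flag is a hypothetical stand-in for "whatever order the execution plan happens to produce" among rows with equal `post_date`. It is not how MySQL chooses the order internally.

```python
# Hypothetical data: (post_id, post_date) pairs. IDs and group sizes are invented.
POSTS = (
    [(i, "2018-06-11 15:51:54") for i in range(142, 150)]    # 8 posts
    + [(i, "2018-06-11 15:51:53") for i in range(122, 142)]  # 20 posts, includes ID 129
    + [(i, "2018-06-11 15:51:52") for i in range(110, 122)]  # 12 posts
)


def fetch_page(posts, page, per_page, ties_ascending):
    """Model `ORDER BY post_date DESC LIMIT offset, per_page`.

    `ties_ascending` decides how rows with equal post_date are arranged,
    which the server is free to vary from one query to the next.
    """
    by_id = sorted(posts, key=lambda p: p[0], reverse=not ties_ascending)
    # sorted() is stable, so rows with the same post_date keep the order above.
    ordered = sorted(by_id, key=lambda p: p[1], reverse=True)
    offset = (page - 1) * per_page
    return [post_id for post_id, _ in ordered[offset:offset + per_page]]


# The two page requests are separate queries, so the tie order may differ.
page1 = fetch_page(POSTS, 1, 20, ties_ascending=True)
page2 = fetch_page(POSTS, 2, 20, ties_ascending=False)

on_both_pages = sorted(set(page1) & set(page2))
print(on_both_pages)  # [122, 123, 124, 125, 126, 127, 128, 129]
```

In this model post 129 exists once in the data but is listed on both pages, because the group of posts sharing `15:51:53` straddles the page boundary and is arranged differently by each query. For the same reason, some other posts in that group end up on neither page.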
If you change to `Number of items per page = 30`, all of the posts with
`post_date = 2018-06-11 15:51:53` will appear on the 1st page (again in a
"random" order) and thus "Post 129" will only appear on that 1st page.
Alternatively, if the posts in the site you exported (to produce `post
export demo.xml`) all had unique `post_date` values, then when you import them
into another site you will not see this behavior.
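The reason unique `post_date` values avoid the problem is that a unique sort key leaves the server no ties to arrange. The Python sketch below illustrates that with a hypothetical composite key of `(post_date, ID)`; MySQL's documentation likewise notes that adding columns to `ORDER BY` makes the order deterministic. The rows, the helper names and the shuffle (a stand-in for whatever order the storage engine yields) are all assumptions for illustration, and this is not what `edit.php` currently does.

```python
import random

# Hypothetical rows: (post_id, post_date); three posts share each timestamp.
ROWS = [(i, f"2018-06-11 15:51:{50 + i // 3:02d}") for i in range(12)]


def page(rows, number, per_page, key):
    """Sort newest-first by `key` and return the IDs on the requested page."""
    ordered = sorted(rows, key=key, reverse=True)
    start = (number - 1) * per_page
    return [post_id for post_id, _ in ordered[start:start + per_page]]


def date_and_id(row):
    """Composite sort key; it is unique because post_id is unique."""
    post_id, post_date = row
    return (post_date, post_id)


rng = random.Random(0)
for _ in range(5):
    shuffled = ROWS[:]
    rng.shuffle(shuffled)  # arbitrary input order, as between two queries
    # With a unique key, every query produces the same pages.
    assert page(shuffled, 1, 4, date_and_id) == page(ROWS, 1, 4, date_and_id)
    assert page(shuffled, 2, 4, date_and_id) == page(ROWS, 2, 4, date_and_id)

print(page(ROWS, 1, 4, date_and_id))  # [11, 10, 9, 8]
print(page(ROWS, 2, 4, date_and_id))  # [7, 6, 5, 4]
```

No ID appears on more than one page here, regardless of the order in which the rows arrive.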
Yes, this behavior is confusing, but it is a result of MySQL's "LIMIT
Query Optimization" and is not a bug in the Importer nor in `WP_Query`.
Thus, I believe this ticket should be closed as "invalid", but I will
leave it to one of the SQL gurus on the team to confirm my analysis and
close it accordingly.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/44349#comment:5>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform