[wp-trac] [WordPress Trac] #38062: SELECT DISTINCT ... ORDER BY ...

WordPress Trac noreply at wordpress.org
Tue Apr 9 18:51:23 UTC 2019


#38062: SELECT DISTINCT ... ORDER BY ...
-------------------------------------+------------------------------
 Reporter:  yscrappy                 |       Owner:  (none)
     Type:  defect (bug)             |      Status:  new
 Priority:  normal                   |   Milestone:  Awaiting Review
Component:  Query                    |     Version:  4.6
 Severity:  normal                   |  Resolution:
 Keywords:  has-patch needs-refresh  |     Focuses:
-------------------------------------+------------------------------

Old description:

> I'm not 100% certain of how best to change this in the code, but I've
> attached a patch that I believe is correct and fixes it ...
>
> In '''./wp-includes/class-wp-query.php:2810''' (trunk), there is a query
> that looks like:
>

> {{{
> SELECT $found_rows $distinct {$this->db->posts}.ID
>   FROM {$this->db->posts} $join
>  WHERE 1=1 $where $groupby $orderby $limits
> }}}
> When run, in some cases at least, the query turns into:
>

> {{{
> SELECT SQL_CALC_FOUND_ROWS distinct wp_posts.ID
>   FROM wp_posts
>  ...
> ORDER BY wp_posts.post_date ...
> }}}
>

> The problem with the query is that it could potentially give
> unpredictable results ... the only reason it doesn't is because
> wp_posts.ID happens to be a Primary Key, and therefore, is already
> guaranteed that the post_date+ID will always be unique.
>
> Now, I understand why the 'distinct' is used, as there are JOINs involved
> in the expanded query that could result in multiple rows being returned
> for each wp_post.ID, ie:
>
> ID     post_date
> 1        2016-09-01
> 2        2016-08-01
> 3        2016-07-01
> 4        2016-06-01
> 1        2016-09-01
> 4        2016-06-01
>
> , but if that weren't the case, then the  unpredictable results could
> come from a case like:
>
> ID     post_date
> 1        2016-09-01
> 2        2016-08-01
> 3        2016-07-01
> 4        2016-06-01
> 1        2016-05-01
> 4        2016-04-01
>
> For the above query, which 1,4 date is to be used?   If the first, then
> the order would be 1,2,3,4 ... if the second, the order would be 2,3,1,4
> ...
>
> If the query was changed to:
>

> {{{
> SELECT SQL_CALC_FOUND_ROWS distinct wp_posts.ID, wp_posts.post_date
> }}}
>

> ... then the query becomes SQL compliant, and it is no longer possible to
> get unpredictable results, since in both my 'bad example' above, and in
> the case of the actually database table as defined for Wordpress, the
> result would end up being:
>
> 1,2,3,4
>
> I attached a patch that fixes the query in such a way that makes the
> query SQL Compliant and removes the potential unpredicatability of the
> results that right now is protected against by ensuring that the field
> itself is always UNIQUE in the first place ...

New description:

 I'm not 100% certain of how best to change this in the code, but I've
 attached a patch that I believe is correct and fixes it ...

 In '''./wp-includes/class-wp-query.php:2810''' (trunk), there is a query
 that looks like:


 {{{
 SELECT $found_rows $distinct {$this->db->posts}.ID
   FROM {$this->db->posts} $join
  WHERE 1=1 $where $groupby $orderby $limits
 }}}
 When run, in some cases at least, the query turns into:


 {{{
 SELECT SQL_CALC_FOUND_ROWS distinct wp_posts.ID
   FROM wp_posts
  ...
 ORDER BY wp_posts.post_date ...
 }}}


 The problem with the query is that it could potentially give unpredictable
 results ... the only reason it doesn't is because wp_posts.ID happens to
 be a Primary Key, and therefore, is already guaranteed that the
 post_date+ID will always be unique.

 Now, I understand why the 'distinct' is used, as there are JOINs involved
 in the expanded query that could result in multiple rows being returned
 for each wp_post.ID, ie:

 ID     post_date
 1        2016-09-01
 2        2016-08-01
 3        2016-07-01
 4        2016-06-01
 1        2016-09-01
 4        2016-06-01

 , but if that weren't the case, then the  unpredictable results could come
 from a case like:

 ID     post_date
 1        2016-09-01
 2        2016-08-01
 3        2016-07-01
 4        2016-06-01
 1        2016-05-01
 4        2016-04-01

 For the above query, which 1,4 date is to be used?   If the first, then
 the order would be 1,2,3,4 ... if the second, the order would be 2,3,1,4
 ...

 If the query was changed to:


 {{{
 SELECT SQL_CALC_FOUND_ROWS distinct wp_posts.ID, wp_posts.post_date
 }}}


 ... then the query becomes SQL compliant, and it is no longer possible to
 get unpredictable results, since in both my 'bad example' above, and in
 the case of the actually database table as defined for WordPress, the
 result would end up being:

 1,2,3,4

 I attached a patch that fixes the query in such a way that makes the query
 SQL Compliant and removes the potential unpredicatability of the results
 that right now is protected against by ensuring that the field itself is
 always UNIQUE in the first place ...

--

Comment (by desrosj):

 @yscrappy are you able to refresh your patch to apply cleanly to `trunk`?

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/38062#comment:4>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list