[wp-trac] [WordPress Trac] #52559: When doing JOINs for meta queries the meta_key should be in the ON clause whenever possible.

WordPress Trac noreply at wordpress.org
Wed Feb 17 17:17:35 UTC 2021


#52559: When doing JOINs for meta queries the meta_key should be in the ON clause
whenever possible.
--------------------------+-----------------------------
 Reporter:  herregroen    |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  General       |    Version:
 Severity:  normal        |   Keywords:
  Focuses:  performance   |
--------------------------+-----------------------------
 Currently when doing `JOIN`s for meta queries the meta_key is only
 included in the `ON` clause if the meta compare is `NOT EXISTS`. In all
 other cases only the `post_id` is included.

 This means that the meta key is only filtered after the JOIN is done.

 Case in point, a meta_query that with 5 different keys generates the
 following query:
 {{{#!sql
 SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts
 LEFT JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )
 LEFT JOIN wp_postmeta AS mt1 ON ( wp_posts.ID = mt1.post_id )
 LEFT JOIN wp_postmeta AS mt2 ON ( wp_posts.ID = mt2.post_id )
 LEFT JOIN wp_postmeta AS mt3 ON ( wp_posts.ID = mt3.post_id )
 LEFT JOIN wp_postmeta AS mt4 ON (wp_posts.ID = mt4.post_id AND
 mt4.meta_key = 'events_time_frame_end' )
 WHERE 1=1  AND (
   ( wp_postmeta.meta_key = 'events_date_till' AND wp_postmeta.meta_value
 >= '20210217' )
   AND
   (
     ( mt1.meta_key = 'events_date_till' AND mt1.meta_value > '20210217' )
     OR
     (
       ( mt2.meta_key = 'events_date_till' AND mt2.meta_value = '20210217'
 )
       AND
       ( mt3.meta_key = 'events_time_frame_end' AND mt3.meta_value >=
 '14:59:19' )
     )
     OR
     mt4.post_id IS NULL
   )
 ) AND wp_posts.post_type = 'events' AND (wp_posts.post_status = 'publish'
 OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'dp-
 rewrite-republish') GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value
 ASC LIMIT 0, 10;
 }}}

 This query takes 60 seconds on very well provisioned database with posts
 that each have 50 postmeta rows in a database with 50 million total
 postmeta rows.

 In contrast the following query:
 {{{#!sql
 SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts
 LEFT JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id AND
 wp_postmeta.meta_key = 'events_date_till' )
 LEFT JOIN wp_postmeta AS mt1 ON ( wp_posts.ID = mt1.post_id AND
 mt1.meta_key = 'events_date_till' )
 LEFT JOIN wp_postmeta AS mt2 ON ( wp_posts.ID = mt2.post_id AND
 mt2.meta_key = 'events_date_till' )
 LEFT JOIN wp_postmeta AS mt3 ON ( wp_posts.ID = mt3.post_id AND
 mt3.meta_key = 'events_time_frame_end' )
 LEFT JOIN wp_postmeta AS mt4 ON (wp_posts.ID = mt4.post_id AND
 mt4.meta_key = 'events_time_frame_end' )
 WHERE 1=1  AND (
   ( wp_postmeta.meta_value >= '20210217' )
   AND
   (
     ( mt1.meta_value > '20210217' )
     OR
     (
       ( mt2.meta_value = '20210217' )
       AND
       ( mt3.meta_value >= '14:59:19' )
     )
     OR
     mt4.post_id IS NULL
   )
 ) AND wp_posts.post_type = 'events' AND (wp_posts.post_status = 'publish'
 OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'dp-
 rewrite-republish') GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value
 ASC LIMIT 0, 10;
 }}}

 Takes 400ms and returns identical results.

 I believe the cause here is that because in the first query filtering is
 only done after the join and each event has 50 postmeta fields MySQL is
 first creating a virtual table with 50^4^ ( 4 because the last JOIN does
 include the meta_key ) rows joined resulting in 60.000.000 total that are
 then filtered back down.

 Adding the meta_key condition to the `ON` clause means we're only joining
 1 ( or a small number in the case of the meta_key existing in multiple
 rows ) rows for each JOIN meaning the virtual table being created is
 significantly smaller thus having much better query performance.

 At the very least when there is any LEFT JOIN occurring then ALL
 `meta_key` conditions should be moved in the `ON` clause. Although it may
 be simpler to always have the `meta_key` condition in the `ON` clause.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/52559>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list