[wp-trac] [WordPress Trac] #16706: Queries using "category__and" are slow on large databases

Sat Apr 30 05:10:44 UTC 2011

#16706: Queries using "category__and" are slow on large databases
-------------------------+-----------------------
 Reporter:  tigertech    |       Owner:  scribu
     Type:  enhancement  |      Status:  accepted
 Priority:  normal       |   Milestone:  3.2
Component:  Performance  |     Version:  3.1
 Severity:  normal       |  Resolution:
 Keywords:  has-patch    |
-------------------------+-----------------------

Comment (by Denis-de-Bernardy):

 I'd advise adding an extra test-case for this one.

 As I read the initial query plan with the group by/having clause, MySQL is
 merge joining two large sets, and the resulting list of posts gets sorted
 in the best possible way is available.

 The suggested optimization, by contrast, results in a nested loop over the
 posts' index, with a subquery check for each row. This is faster with your
 test data, but not always. For instance, if you've a reasonably large
 posts table (say 100k rows) each sprinkled with a term or two, and the
 final result is no match at all or one very far away in the index, you may
 very well end up visiting the whole index on the posts table (which both
 plans do anyway), along with a subquery for each row (which is then
 considerably slower than merge joining the mess).

 Point is, the merge join plan is appropriate if there are very few
 resulting rows.

 Out of curiosity, has the following plan been tried at any point while
 testing or in prior implementations?

 {{{
 SELECT SQL_CALC_FOUND_ROWS
 wp_posts.*
 FROM wp_posts
 JOIN wp_term_relationships
 ON wp_posts.ID = wp_term_relationships.object_id
 AND term_taxonomy_id IN (1,461)
 WHERE 1=1
 AND wp_posts.post_type = 'post'
 AND (wp_posts.post_status = 'publish')
 GROUP BY wp_posts.ID
 HAVING COUNT(DISTINCT wp_term_relationships.term_taxonomy_id) = 2
 ORDER BY wp_posts.post_date DESC
 LIMIT 0, 6;
 }}}

 The COUNT(DISTINCT wp_term_relationships.term_taxonomy_id) rather than
 COUNT(*) should ensure we don't get bogus rows when additional joins kick
 in. Intuitively, the MySQL planner should do a nested loop from whichever
 of posts or terms will yield the least results.

 If the above introduces two additional sorts, maybe try this one too (so
 as to aggregate over the pre-sorted result set):

 {{{
 SELECT SQL_CALC_FOUND_ROWS
 wp_posts.*
 FROM wp_posts
 JOIN wp_term_relationships
 ON wp_posts.ID = wp_term_relationships.object_id
 AND term_taxonomy_id IN (1,461)
 WHERE 1=1
 AND wp_posts.post_type = 'post'
 AND (wp_posts.post_status = 'publish')
 GROUP BY wp_posts.post_date DESC, wp_posts.ID
 HAVING COUNT(DISTINCT wp_term_relationships.term_taxonomy_id) = 2
 ORDER BY wp_posts.post_date DESC
 LIMIT 0, 6;
 }}}

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/16706#comment:29>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software