[wp-trac] [WordPress Trac] #16706: Queries using "category__and" are slow on large databases
WordPress Trac
wp-trac at lists.automattic.com
Sat Apr 30 05:10:44 UTC 2011
#16706: Queries using "category__and" are slow on large databases
-------------------------+-----------------------
Reporter: tigertech | Owner: scribu
Type: enhancement | Status: accepted
Priority: normal | Milestone: 3.2
Component: Performance | Version: 3.1
Severity: normal | Resolution:
Keywords: has-patch |
-------------------------+-----------------------
Comment (by Denis-de-Bernardy):
I'd advise adding an extra test-case for this one.
As I read the initial query plan with the group by/having clause, MySQL is
merge joining two large sets, and the resulting list of posts gets sorted
in the best possible way is available.
The suggested optimization, by contrast, results in a nested loop over the
posts' index, with a subquery check for each row. This is faster with your
test data, but not always. For instance, if you've a reasonably large
posts table (say 100k rows) each sprinkled with a term or two, and the
final result is no match at all or one very far away in the index, you may
very well end up visiting the whole index on the posts table (which both
plans do anyway), along with a subquery for each row (which is then
considerably slower than merge joining the mess).
Point is, the merge join plan is appropriate if there are very few
resulting rows.
Out of curiosity, has the following plan been tried at any point while
testing or in prior implementations?
{{{
SELECT SQL_CALC_FOUND_ROWS
wp_posts.*
FROM wp_posts
JOIN wp_term_relationships
ON wp_posts.ID = wp_term_relationships.object_id
AND term_taxonomy_id IN (1,461)
WHERE 1=1
AND wp_posts.post_type = 'post'
AND (wp_posts.post_status = 'publish')
GROUP BY wp_posts.ID
HAVING COUNT(DISTINCT wp_term_relationships.term_taxonomy_id) = 2
ORDER BY wp_posts.post_date DESC
LIMIT 0, 6;
}}}
The COUNT(DISTINCT wp_term_relationships.term_taxonomy_id) rather than
COUNT(*) should ensure we don't get bogus rows when additional joins kick
in. Intuitively, the MySQL planner should do a nested loop from whichever
of posts or terms will yield the least results.
If the above introduces two additional sorts, maybe try this one too (so
as to aggregate over the pre-sorted result set):
{{{
SELECT SQL_CALC_FOUND_ROWS
wp_posts.*
FROM wp_posts
JOIN wp_term_relationships
ON wp_posts.ID = wp_term_relationships.object_id
AND term_taxonomy_id IN (1,461)
WHERE 1=1
AND wp_posts.post_type = 'post'
AND (wp_posts.post_status = 'publish')
GROUP BY wp_posts.post_date DESC, wp_posts.ID
HAVING COUNT(DISTINCT wp_term_relationships.term_taxonomy_id) = 2
ORDER BY wp_posts.post_date DESC
LIMIT 0, 6;
}}}
--
Ticket URL: <http://core.trac.wordpress.org/ticket/16706#comment:29>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software
More information about the wp-trac
mailing list