[wp-trac] [WordPress Trac] #8351: Poorly optimized queries in wp_get_archives() and wp_get_calendar()

WordPress Trac wp-trac at lists.automattic.com
Wed Nov 26 09:14:08 GMT 2008


#8351: Poorly optimized queries in wp_get_archives() and wp_get_calendar()
---------------------------------+------------------------------------------
 Reporter:  vladimir_kolesnikov  |       Owner:  anonymous
     Type:  defect               |      Status:  new      
 Priority:  normal               |   Milestone:  2.7      
Component:  General              |     Version:           
 Severity:  normal               |    Keywords:  has-patch
---------------------------------+------------------------------------------
 Performance of the queries in is too bad wp_get_archives() and
 wp_get_calendar() if you have a lot of records in wp_posts table (so bad
 that hosting company suspends the account).

 '''1. wp_get_archives():'''


 {{{
 EXPLAIN SELECT DISTINCT YEAR(post_date) AS `year`, MONTH(post_date) AS
 `month`, count(ID) as posts FROM wp_posts WHERE post_type = 'post' AND
 post_status = 'publish' GROUP BY YEAR(post_date), MONTH(post_date) ORDER
 BY post_date DESC

 +----+-------------+----------+------+------------------+------------------+---------+-------------+------+-----------------------------------------------------------+
 | id | select_type | table    | type | possible_keys    | key
 | key_len | ref         | rows | Extra
 |
 +----+-------------+----------+------+------------------+------------------+---------+-------------+------+-----------------------------------------------------------+
 |  1 | SIMPLE      | wp_posts | ref  | type_status_date | type_status_date
 | 124     | const,const | 3594 | Using where; Using index; Using
 temporary; Using filesort |
 +----+-------------+----------+------+------------------+------------------+---------+-------------+------+-----------------------------------------------------------+
 }}}

 Nothing you can do with this query, as grouping by expression cannot use
 indices by definition. The only thing is to remove DISTINCT from the
 query, as grouping by year and month guarantees that every (year, month)
 pair in the result set will be unique. Same for yearly and daily archives.

 '''2. wp_get_calendar():'''

 Although we can do almost nothing to wp_get_archives() (due to DB design
 deficiencies), we can significantly boost wp_get_calendar().

 First, wp_get_calendar() checks if we have any posts at all:

 {{{
 SELECT ID from wp_posts WHERE post_type = 'post' AND post_status =
 'publish' ORDER BY post_date DESC LIMIT 1
 }}}

 Since we only need to check the existence of the record, we don't have any
 reason to use ORDER BY (although it does nothing harmful as MySQL is able
 to use index).

 If the query succeeded, WP starts searching for the posts in past months:

 {{{
 SELECT DISTINCT MONTH(post_date) AS month, YEAR(post_date) AS year FROM
 wp_posts WHERE post_date < '2008-11-01' AND post_type = 'post' AND
 post_status = 'publish' ORDER BY post_date DESC LIMIT 1
 }}}

 What is wrong: since we need only ONE record (see LIMIT 1), it will be
 unique in the result set anyway, DISTINCT is not needed. Here's the
 EXPLAIN of the query:

 {{{
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 | id | select_type | table    | type  | possible_keys              | key
 | key_len | ref  | rows | Extra                                     |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 |  1 | SIMPLE      | wp_posts | range | type_status_date,post_type |
 type_status_date | 132     | NULL | 3238 | Using where; Using index; Using
 temporary |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 }}}

 As you can see, MySQL needs a temporary table (it doesn't matter that it
 will have only one row, it will be created). If we remove DISTINCT, we can
 get rid of the temporary table:

 {{{
 EXPLAIN SELECT MONTH(post_date) AS month, YEAR(post_date) AS year FROM
 wp_posts WHERE post_date < '2008-11-01' AND post_type = 'post' AND
 post_status = 'publish' ORDER BY post_date DESC LIMIT 1

 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
 | id | select_type | table    | type  | possible_keys              | key
 | key_len | ref  | rows | Extra                    |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
 |  1 | SIMPLE      | wp_posts | range | type_status_date,post_type |
 type_status_date | 132     | NULL | 3238 | Using where; Using index |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
 }}}

 Then WP looks for the posts written in the next months:

 {{{
 SELECT DISTINCT MONTH(post_date) AS month, YEAR(post_date) AS year FROM
 wp_posts
 WHERE post_date > '2008-11-01' AND MONTH( post_date ) != MONTH(
 '2008-11-01' )
 AND post_type = 'post' AND post_status = 'publish' ORDER BY post_date ASC
 LIMIT 1

 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 | id | select_type | table    | type  | possible_keys              | key
 | key_len | ref  | rows | Extra                                     |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 |  1 | SIMPLE      | wp_posts | range | type_status_date,post_type |
 type_status_date | 132     | NULL |  356 | Using where; Using index; Using
 temporary |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 }}}

 Same as above: DISTINCT is not needed. But we still can rewrite the query
 in a better way:

 {{{
 EXPLAIN  SELECT MONTH(post_date) AS month, YEAR(post_date) AS year FROM
 wp_posts WHERE post_date > '2008-11-31 23:59:59' AND post_type = 'post'
 AND post_status = 'publish' ORDER BY post_date ASC LIMIT
 1;+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
 | id | select_type | table    | type  | possible_keys              | key
 | key_len | ref  | rows | Extra                    |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
 |  1 | SIMPLE      | wp_posts | range | type_status_date,post_type |
 type_status_date | 132     | NULL |    1 | Using where; Using index |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
 }}}

 We got rid of Using temporary AND we were able to make MySQL scan much
 less rows!

 Next, WP looks for the days in the month when at least one post was
 published:

 {{{
 SELECT DISTINCT DAYOFMONTH(post_date)
 FROM wp_posts WHERE MONTH(post_date) = '11'
 AND YEAR(post_date) = '2008'
 AND post_type = 'post' AND post_status = 'publish'
 AND post_date < '2008-11-24 18:30:04'
 }}}

 Only range optimization can be used here. But, as I have already
 mentioned, MySQL is unable to use indices for expressions. Thus, our range
 search will only be limited from the top:

 {{{
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 | id | select_type | table    | type  | possible_keys              | key
 | key_len | ref  | rows | Extra                                     |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 |  1 | SIMPLE      | wp_posts | range | type_status_date,post_type |
 type_status_date | 132     | NULL | 3594 | Using where; Using index; Using
 temporary |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 }}}

 It is very easy to rewrite the query:

 {{{
 SELECT DISTINCT DAYOFMONTH(post_date)
 FROM wp_posts
 WHERE post_date >= '2008-11-01 00:00:00'
 AND post_type = 'post' AND post_status = 'publish'
 AND post_date < '2008-11-24 18:30:04'

 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 | id | select_type | table    | type  | possible_keys              | key
 | key_len | ref  | rows | Extra                                     |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 |  1 | SIMPLE      | wp_posts | range | type_status_date,post_type |
 type_status_date | 132     | NULL |  267 | Using where; Using index; Using
 temporary |
 +----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
 }}}

 As you can see, MySQL needed to scan only 267 rows instead 3594, and the
 more records you have in your wp_posts table, the more performance is.

 Finally, WP fetches the days and post titles:

 {{{
 SELECT post_title, DAYOFMONTH(post_date) as dom
 FROM wp_posts
 WHERE YEAR(post_date) = '2008'
 AND MONTH(post_date) = '11'
 AND post_date < '2008-11-24 18:30:04'
 AND post_type = 'post'
 AND post_status = 'publish'
 }}}

 The above optimization can be applied here.

-- 
Ticket URL: <http://trac.wordpress.org/ticket/8351>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software


More information about the wp-trac mailing list