[wp-trac] [WordPress Trac] #34239: Split queries in `get_terms()` and `wp_get_object_terms()`

WordPress Trac noreply at wordpress.org
Sat Oct 10 04:00:58 UTC 2015


#34239: Split queries in `get_terms()` and `wp_get_object_terms()`
----------------------------+----------------------------
 Reporter:  boonebgorges    |      Owner:
     Type:  task (blessed)  |     Status:  new
 Priority:  normal          |  Milestone:  Future Release
Component:  Taxonomy        |    Version:
 Severity:  normal          |   Keywords:  needs-patch
  Focuses:  performance     |
----------------------------+----------------------------
 The cache strategy in `get_terms()` looks like this:

 1. Query for full rows from the term/term_taxonomy tables.
 2. Update the cache for the individual terms found.
 3. Store the results of the query in a single cache key.

 `wp_get_object_terms()` does 1 and 2, but doesn't cache its query results
 at all.

 This is a wasteful strategy.

 * In the case of `get_terms()`, term objects can be stored dozens of times
 in the cache. Each combination of arguments passed to `get_terms()`
 results in a separate cache key, and each of these keys holds a full set
 of term objects. We don't have problems with invalidation because we're
 using the sledgehammer of a 'last_changed' incrementor, but we are
 certainly taking up more space in persistent cache storage than we need
 to.
 * `$wpdb->get_results( "SELECT * FROM ..." )` is significantly more
 expensive than `$wpdb->get_col( "SELECT term_id FROM ..." )`, both in
 terms of the memory used to store results and the index strategy that
 MySQL can use. Selecting full results was probably necessary when terms
 could be shared between taxonomies - `term_id` wouldn't have been a unique
 identifier - but we shouldn't have that problem anymore.
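
 To make the storage cost in the first point concrete, here is a rough
 sketch in plain PHP. It does not touch WordPress at all: the term rows,
 the cache key format, and the `variant` argument are all made up for
 illustration, and the byte counts depend entirely on the fake data. It
 only compares what ten cache entries cost when each holds full rows
 versus when each holds a list of IDs.

```php
<?php
// Hypothetical data: 50 fake term rows shaped loosely like term objects.
// None of the names below are WordPress core APIs.
$term_rows = array();
for ( $i = 1; $i <= 50; $i++ ) {
	$term_rows[] = (object) array(
		'term_id'     => $i,
		'name'        => "Term $i",
		'slug'        => "term-$i",
		'taxonomy'    => 'category',
		'description' => str_repeat( 'x', 100 ),
		'parent'      => 0,
		'count'       => 0,
	);
}

// IDs only, as a split query would cache them.
$term_ids = array_map(
	function ( $row ) {
		return $row->term_id;
	},
	$term_rows
);

// Ten argument combinations that happen to match the same 50 terms.
// Each combination gets its own cache key, so each entry is stored
// (and serialized) separately.
$full_bytes = 0;
$id_bytes   = 0;
$keys       = array();
for ( $n = 0; $n < 10; $n++ ) {
	$args   = array( 'number' => 50, 'variant' => $n ); // 'variant' is made up.
	$keys[] = 'get_terms:' . md5( serialize( $args ) );

	$full_bytes += strlen( serialize( $term_rows ) );
	$id_bytes   += strlen( serialize( $term_ids ) );
}

printf( "full rows: %d bytes, IDs only: %d bytes\n", $full_bytes, $id_bytes );
```

 With IDs only, the full term data would live once per term in the
 per-term cache rather than once per argument combination.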

 Let's split the query in both of these functions, so that we do the
 following:

 * The primary SQL query will fetch term IDs only:
 `$wpdb->get_col( "SELECT term_id FROM ..." )`.
 * We'll cache the results of that query using the existing 'last_changed'
 technique. The cached value will be an array of integers, instead of an
 array of `stdClass` objects with a bunch of data in them. (For now, we
 can limit ourselves to caching only `get_terms()` queries. It'd be great
 to do `wp_get_object_terms()` too, but this will probably take some more
 research.)
 * Do a `SELECT * FROM ... WHERE term_id IN (...)` query to get data
 corresponding to uncached terms, then prime the cache for those terms.
 * Fill the term objects from the cache before returning from the function.
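
 The steps above could be sketched roughly as follows. This is plain PHP
 with in-memory stand-ins, not a patch: `FakeTermStore`,
 `split_get_terms()`, and the cache key format are hypothetical names
 invented for the sketch, the cache is an ordinary array, and the two
 store methods only imitate what `$wpdb->get_col()` and
 `$wpdb->get_results()` would do against the real tables.

```php
<?php
// Hypothetical stand-in for the database; logs each simulated query.
class FakeTermStore {
	public $rows    = array(); // term_id => row array.
	public $queries = array();

	// Imitates $wpdb->get_col( "SELECT term_id FROM ..." ).
	public function get_col_ids( $taxonomy ) {
		$this->queries[] = 'ids';
		$ids             = array();
		foreach ( $this->rows as $row ) {
			if ( $row['taxonomy'] === $taxonomy ) {
				$ids[] = $row['term_id'];
			}
		}
		return $ids;
	}

	// Imitates "SELECT * FROM ... WHERE term_id IN (...)".
	public function get_rows_in( array $ids ) {
		$this->queries[] = 'rows';
		return array_values( array_intersect_key( $this->rows, array_flip( $ids ) ) );
	}
}

// $cache is a plain array standing in for the object cache.
function split_get_terms( $taxonomy, FakeTermStore $db, array &$cache ) {
	// Query-level key includes 'last_changed', so bumping it orphans old keys.
	$key = 'get_terms:' . md5( $taxonomy ) . ':' . $cache['last_changed'];

	// Fetch IDs only, and cache that array of integers.
	if ( ! isset( $cache[ $key ] ) ) {
		$cache[ $key ] = $db->get_col_ids( $taxonomy );
	}
	$ids = $cache[ $key ];

	// Fetch full rows only for terms missing from the per-term cache.
	$uncached = array();
	foreach ( $ids as $id ) {
		if ( ! isset( $cache[ "term:$id" ] ) ) {
			$uncached[] = $id;
		}
	}
	if ( $uncached ) {
		foreach ( $db->get_rows_in( $uncached ) as $row ) {
			$cache[ 'term:' . $row['term_id'] ] = $row;
		}
	}

	// Fill the result from the per-term cache, preserving ID order.
	$terms = array();
	foreach ( $ids as $id ) {
		$terms[] = $cache[ "term:$id" ];
	}
	return $terms;
}
```

 In this sketch a repeated call runs no queries at all, and a call after
 'last_changed' is bumped re-runs only the cheap ID query as long as the
 individual terms are still in the per-term cache.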

 Both `get_terms()` and `wp_get_object_terms()` support a 'fields'
 parameter, which allows the selection of only a subset of fields, as an
 alternative to 'all' (or 'all_with_object_id'). At some point down the
 line, it'd be great to convert as many of these as possible to use the
 persistent cache described above. But for now, it probably makes sense to
 start with the 'all' queries only.
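
 As a loose illustration of where the 'fields' variants could eventually
 go, a reduced result might be derived from term data that has already
 been filled from the cache. The helper below is hypothetical
 (`pluck_term_fields()` is not a core function), it works on plain arrays
 rather than real term objects, and it only covers the 'ids' and 'names'
 values as examples.

```php
<?php
// Hypothetical helper: derive a 'fields' subset from already-cached term data.
// $terms is a list of associative arrays with at least 'term_id' and 'name'.
function pluck_term_fields( array $terms, $fields ) {
	switch ( $fields ) {
		case 'ids':
			return array_map( 'intval', array_column( $terms, 'term_id' ) );
		case 'names':
			return array_column( $terms, 'name' );
		default:
			// 'all' and anything unhandled: return the full term data.
			return $terms;
	}
}
```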

--
Ticket URL: <https://core.trac.wordpress.org/ticket/34239>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform