[wp-trac] [WordPress Trac] #37189: In wp_term_query on cache ids

Tue Jun 20 17:50:06 UTC 2017

#37189: In wp_term_query on cache ids
--------------------------+--------------------------
 Reporter:  spacedmonkey  |       Owner:
     Type:  enhancement   |      Status:  new
 Priority:  normal        |   Milestone:  4.9
Component:  Taxonomy      |     Version:  4.6
 Severity:  normal        |  Resolution:
 Keywords:  has-patch     |     Focuses:  performance
--------------------------+--------------------------
Changes (by boonebgorges):

 * keywords:  has-patch needs-unit-tests => has-patch
 * milestone:  Future Release => 4.9

Comment:

 Thanks for continuing to work on this, @spacedmonkey.

 Your patch generated a couple of PHPUnit failures for me (outside of the
 taxonomy component). As I started digging into why, it became clear to me
 that our efforts would be better spent doing the right thing and simply
 splitting the query.

 Specifically: The fact that you switched to `SELECT t.*, t.**` for most
 queries got me concerned that we'd cause memory issues for people using
 the `fields` parameter for bulk operations. Yet, because your patch worked
 within the existing parameters of term-query caching, there was no easy
 way to work around this problem - no way to actually utilize the single-
 term cache during `get_terms()` queries. The possibility of additional
 overhead in the main `SELECT` query only seems worth the trade-off if we
 also offer a more efficient caching layer for single terms. The way we
 handle this elsewhere is by doing a `SELECT {id}` query and then filling
 the objects from the cache.

 [attachment:37189.3.patch] is a first pass at making this happen. It
 allows us to simplify the internals of `WP_Term_Query::query()`, but
 requires some other fixes. Some notes:

 * Your technique for building keys is a bit redundant. Once we've built
 the `request`, we have a key that's guaranteed to be unique. So let's use
 it, and do away with the `default_args` parsing.
 * I've written the patch in such a way that `fields=ids` queries generally
 do not require the second query to populate the term objects. The
 exception is in the case of hierarchical queries, where the tree must be
 descended after the initial query takes place. So this is a fairly big
 performance win for most `fields=ids` queries.
 * The patch uncovered a pretty deep bug (actually, two related bugs) in
 `_split_shared_term()` that was agonizing to figure out. Briefly, the way
 that taxonomy hierarchy caches were being regenerated
 (`{$taxonomy}_children`) was not consistent: certain child terms were not
 properly triggering taxonomy cache clears, and the calls to
 `clean_term_cache()` in the context of a `foreach()` loop was causing race
 conditions in cases where `get_terms()` descends the hierarchy tree. The
 fixes in `_split_shared_term()` should go in, no matter what happens with
 the rest of the patch.
 * As part of these fixes, I broke the taxonomy-cache busting functionality
 into a new `clean_taxonomy_cache()` function.
 * On an installation without a persistent cache (or with an empty cache),
 term queries now generate two queries instead of one. This requires a
 reworking of certain unit tests that count `num_queries`. I've refactored
 most of them so that they're not counting the queries specifically, but
 instead check for data invalidation (which is the important part anyway).
 A few tests - those that rely on examining a `last_query` that no longer
 points to the right thing - had to be removed.

 I'd welcome a careful review from @spacedmonkey, @flixos90 or anyone else
 with some experience in how term caching has historically differed from
 caching for posts, comments, etc.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/37189#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform