[wp-trac] [WordPress Trac] #42252: Use more granular caching for common queries in `WP_Site_Query`

WordPress Trac noreply at wordpress.org
Tue Oct 17 22:08:03 UTC 2017


#42252: Use more granular caching for common queries in `WP_Site_Query`
------------------------------------------+------------------------------
 Reporter:  flixos90                      |       Owner:
     Type:  enhancement                   |      Status:  new
 Priority:  normal                        |   Milestone:  Awaiting Review
Component:  Networks and Sites            |     Version:
 Severity:  normal                        |  Resolution:
 Keywords:  needs-patch needs-unit-tests  |     Focuses:  multisite
------------------------------------------+------------------------------

Comment (by flixos90):

 [attachment:42252.diff] implements several caching improvements, some
 major, some minor. All of them are part of a new
 `WP_Site_Query::get_cache_key( $query_args )` method. The goal of these
 changes is to bring the same granular handling that old caches such as
 `blog-lookup` use to `WP_Site_Query` in general and thus bring this solid
 foundation to any code that uses `get_sites()`.

 The most significant improvement and also kind of new concept introduces
 in the patch is that there is no longer just one `last_changed` key in the
 `sites` group. Instead there are several more `last_changed` keys,
 prefixed with md5-generated values from specific query arguments. There
 are currently four sets of arguments that are handled in a special way:

 * `domain`, `path` and `network_id`
 * `domain` and `path`
 * `domain`
 * `path`

 If a query contains all arguments for one of the above bullet point
 without including any other "database field related" argument, a special
 `last_changed` key will be used which is generated by passing the
 concatenated values for the respective arguments (''only'' the arguments
 of the respective bullet point above) through md5. The term "database
 field related" here means query arguments that deal with a specific
 database field, such as `domain`, `path`, `ID`, `site__in`, `lang__in`,
 `public`, to give only a few examples. Contrary, arguments that are not
 database field related would be for example `number`, `offset`, `fields`,
 `no_found_rows`, `orderby`. The latter of course do affect the outcome of
 a query - however they do ''not'' affect when the cache for a query needs
 to be invalidated (via a `last_changed` cache value).

 While the logic is rather complex (and we may surely be able to improve
 it!), this change brings a significant improvement to these specific
 queries which are rather common. Notably, the technique of using more
 granular `last_changed` values provides much more flexibility than
 something like the existing `blog-lookup` cache, which only stores one
 result for a specific domain and path.

 For example, both of the following sets would use `md5( 'example.com' ) .
 '_last_changed'` for their last changed key:

 '''Set 1:'''
     * `domain`: `example.com`
     * `number`: `1`

 '''Set 2:'''
     * `domain`: `example.com`
     * `number`: `100`
     * `orderby`: `path`
     * `order`: `DESC`

 Another example, both of the following sets would use `md5(
 'example.com/foo/3' ) . '_last_changed'` for their last changed key:

 '''Set 1:'''
     * `network_id`: `3`
     * `domain`: `example.com`
     * `path`: `/foo/`
     * `number`: `1`

 '''Set 2:'''
     * `network_id`: `3`
     * `domain`: `example.com`
     * `path`: `/foo/`
     * `number`: `5`
     * `offset`: `5`
     * `no_found_rows`: `false`

 These examples show the flexibility of this approach while not being
 subject to aggressive cache invalidation. To explain a bit further, the
 second example's cache would only be invalidated if a site with either
 network ID `3`, domain `example.com` or path `/foo/` was
 created/updated/deleted (see the change in `clean_blog_cache()` for that
 part specifically).

 Concluding, I think this approach gives us the possibility to for example
 get `get_site_by()` merged as it currently is, as simply by making these
 improvements to `WP_Site_Query` it would have caching that is better than
 that in `get_blog_details()`. Furthermore, this will obviously benefit any
 query using `get_sites()`, as said before.

 The code for this is admittedly rather complex, so don't hesitate to ask
 question or tear it apart. This is a new pattern for caching query
 results, so I hope we can get it to a solid state. Once we get there,
 `WP_Network_Query` (and possibly even non-multisite query classes) could
 be enhanced in the same way. Nicely enough, all existing unit tests passed
 for me with these changes. Of course we'll also need a bunch of tests for
 this behavior specifically, which I'll add later.

 One last thing: Lines 707-724 of `class-wp-site-query.php` in the patch
 implement what we've discussed today during office-hours (plus I ignore
 `update_site_cache` completely, since it doesn't affect the query at all).
 I consider this a minor improvement and wanted to do it at the same time,
 but we could also take this out for now to make the patch better readable
 and focus on the more important things first.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/42252#comment:1>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list