[wp-trac] [WordPress Trac] #59516: Improve cache key generation in query classes

WordPress Trac noreply at wordpress.org
Tue Oct 24 12:53:50 UTC 2023


#59516: Improve cache key generation in query classes
--------------------------------------+-----------------------------
 Reporter:  spacedmonkey              |       Owner:  thekt12
     Type:  defect (bug)              |      Status:  assigned
 Priority:  normal                    |   Milestone:  Future Release
Component:  Query                     |     Version:
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:  performance
--------------------------------------+-----------------------------

Comment (by thekt12):

 = Findings:
 == WP_Query
 Inside WP_Query::get_posts(), cache key is generated using
 `generate_cache_key( $q, $new_request );`; where `$q` is query args and
 `$new_request` is the sql that is formed, (check code
 [https://github.com/WordPress/wordpress-
 develop/blob/781953641607c4d5b0743a6924af0e820fd54871/src/wp-includes
 /class-wp-query.php#L3175 here])

 {{{#!php
 $cache_key   = $this->generate_cache_key( $q, $new_request );
 }}}


 Setting of cache for the $cache_key happens at 3 (
 [https://github.com/WordPress/wordpress-
 develop/blob/781953641607c4d5b0743a6924af0e820fd54871/src/wp-includes
 /class-wp-query.php#L3234 Case1], [https://github.com/WordPress/wordpress-
 develop/blob/781953641607c4d5b0743a6924af0e820fd54871/src/wp-includes
 /class-wp-query.php#L3267 Case2], [https://github.com/WordPress/wordpress-
 develop/blob/781953641607c4d5b0743a6924af0e820fd54871/src/wp-includes
 /class-wp-query.php#L3342 Case3] ) places based on different conditions,
 all of which consumes the same $cache_key generated above and there is no
 modification in between.

 For Case1 and Case2, there is a slight modification to the output obtained
 from the query based on the value set in `$q['fields']` . In current
 scenario `$q['fields']=='ID'` and `$q['fields']=='ids'` can have slightly
 different cache values as former doesn’t go through the following
 condition which has a `array_map( 'intval', $this->posts );` (''this seems
 to be a minor bug'')


 In both **Case1** and **Case2** there is no filter in between key
 generation and cache setting which can modify the output base on any other
 values from **args/$q**.

 For Case3 however there is a possibility that the logic may enter  the
 split_the_query
  condition ( if there are less than 500 posts and we split the query to
 get ids of those 500 posts and fetch them manually and prime them).  The
 split_the_query condition contains a filter 'posts_request_ids' that could
 modify the request post key generation, this is a bug that is addressed in
 ([https://core.trac.wordpress.org/ticket/59661 Ticket#59661])
 [https://github.com/WordPress/wordpress-develop/pull/5513/files#diff-
 1d050668621cbc3028e027d15e68438f4d879dd6e80ba98f9c9f69b9fb5469d0R3181
 PR#5513]. This PR introduces a new arg variable
 `posts_request_ids_filters` which also affects what is cached.

 To add to it we can
 What can be removed?
 After Case3, the variable $cache_key is not in use anywhere indicating
 that the only args that must be used for cache key generation is
 $q['fields'] and  $q['posts_request_ids_filters'](introduced in
 [https://github.com/WordPress/wordpress-develop/pull/5513/files#diff-
 1d050668621cbc3028e027d15e68438f4d879dd6e80ba98f9c9f69b9fb5469d0R3181
 PR#5513]).

 We can thereby remove all the query forming (except fields) and after
 cache decision making (except 'posts_request_ids_filters') ( which is
 pretty much everything mentioned here )


 With respect to Third Parties, $args can only be used to cache the same
 values to different keys (which has no use IMO) or to skip most of the
 logic inside get_posts() like incase of VIP Enterprise Search (link),
 where a new `es` is used to decide if the query must be sent to elastic
 search or not.

 == WP_Term_Query
 For term_queries, `$args['fields']`, `$args['object_ids']`,
 `$args['pad_counts']`, `$args['hide_empty']` are the only fields that can
 affect caching. All other variables can be avoided from cache key
 generation. Like WP_Query there is no filter that third parties can use to
 modify what is being cached after cache key generation.

 == WP_Network_Query
 The key is entirely based on args,  however none of the args affect the
 outcome post query is fired. So we could modify the code to be more like
 WP_Query with a separate function for generate_key based on the request
 and args or just the args. ([https://github.com/WordPress/wordpress-
 develop/blob/781953641607c4d5b0743a6924af0e820fd54871/src/wp-includes
 /class-wp-network-query.php#L249 code])

 == WP_Site_Query
 Same as WP_Network_Query. The code needs to be modified to use final query
 and not args. ([https://github.com/WordPress/wordpress-
 develop/blob/781953641607c4d5b0743a6924af0e820fd54871/src/wp-includes
 /class-wp-site-query.php#L357 code])
 == WP_Comment_Query
 Same like WP_Network_Query and WP_Site_Query
 ([https://github.com/WordPress/wordpress-
 develop/blob/781953641607c4d5b0743a6924af0e820fd54871/src/wp-includes
 /class-wp-comment-query.php#L451 code])

 = Proposal
 **Step 1: Standardize code**
 Standardize WP_Network_Query, WP_Site_Query, WP_Comment_Query to use
 generate_cache_key function the same way WP_Query does.
 Note: There is one performance downgrade here. The way caching is
 implemented in all 3 class above it skips the request generation part
 entirely if cache is present. However, the upside it the query will be
 cached based on the final request making it better at handling similar
 queries generated via different args setup.
 For e.g Caching in WP_Network_Query skips [https://github.com/WordPress
 /wordpress-develop/blob/781953641607c4d5b0743a6924af0e820fd54871/src/wp-
 includes/class-wp-network-query.php#L256  $network_ids =
 $this->get_network_ids();] which is logic intensive.

 **Step 2: Use only those key from**
 We should only pass the args (filter it before) to `generate_cache_key()`
 for cache key generation that effects the way cache is done (Detailed in
 the Findings section above).
 For filtering have a array of filed that needs to be considered in an
 array and only scrap out those args from the main args.
 For e.g `$key_to_consider = array( ‘fields’ )` for WP_Query, we can
 provide a filter which will allow third parties to add more keys that
 should be considered, extra care must be given not to unset key from the
 original `$args`.

 **Step 3: Normalize and save back to main $args or (to the clone)**
 Basic normalization must be done before query execution and saved to
 original args->
 Some examples->
 ‘ids’, ‘ID’, [‘ID’] from $args[ ‘fields’ ] must be converted to ‘ids’.
 ‘post_name’ and other fields that contains that contain single array needs
 to be converted to string.

 **Step 4: Move generate_cache_key to utility function as it will be more
 like a repeated piece of code.**

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/59516#comment:10>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list