[wp-trac] [WordPress Trac] #51372: Race condition causes an autoload option to leak outside of alloptions

WordPress Trac noreply at wordpress.org
Tue Sep 22 07:55:45 UTC 2020


#51372: Race condition causes an autoload option to leak outside of alloptions
--------------------------+-----------------------------
 Reporter:  kovshenin     |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Cache API     |    Version:  2.2
 Severity:  normal        |   Keywords:  dev-feedback
  Focuses:                |
--------------------------+-----------------------------
 There is a problem with `add_option()` which causes an autoloaded item to
 leak outside of the `alloptions` cache key and into its own item under the
 `options` group.

 This becomes pretty broken with persistent object caching, because it
 leads to a state, where the option is stuck under its own cache key in the
 `options` group, and is unable to be deleted or updated, while the
 underlying database value is completely gone.

 I've been able to reproduce this in multiple ways, with persistent object
 caching turned on or off. The backend I used in my testing is
 [https://github.com/Automattic/wp-memcached wp-memcached], but it should
 be reproducable with any other backend.

 The first and easiest way is to just add an `error_log()` to wp-
 includes/options.php in `get_option()` right before `wp_cache_add()` that
 writes to the `options` group:

 {{{
 function get_option() {
 ...
     if ( is_object( $row ) ) {
         ...
         if ( $option == 'foo' ) { error_log( 'leaked' ); }
         wp_cache_add( $option, $value, 'options' );

 }}}

 Now in a simple plugin we can start writing the `foo` option, like this:

 {{{
 add_action( 'init', function() {
     delete_option( 'foo' );
     add_option( 'foo', 'bar' );
     die();
 } );
 }}}

 The `autoload` flag defaults to `true`, so in this scenario the `foo` item
 should always end up in the `alloptions` cache key. However, if you run
 this in multiple parallel threads (with `ab` for example), you'll see the
 `leaked` message in your error log, which means `foo` has been written to
 the `foo` key under the `options` group.

 {{{
 ab -c 100 -n 1000 http://localhost/
 }}}

 Another way to reproduce this is to turn on a persistent object caching
 plugin, visit the site once to trigger a single `add_option()`, then check
 some of site options and cache keys:

 {{{
 wp option get foo # says bar
 wp cache get foo options # error, because foo is autoloaded and stored in
 alloptions
 wp cache get alloptions options # big array with foo => bar at the end
 }}}

 You can also confirm the database value is there:

 In MySQL:

 {{{
 SELECT * FROM wp_options WHERE option_name = 'foo';
 +-----------+-------------+--------------+----------+
 | option_id | option_name | option_value | autoload |
 +-----------+-------------+--------------+----------+
 |      2403 | foo         | bar          | yes      |
 +-----------+-------------+--------------+----------+
 }}}


 Looks good so far.

 Now run the same `ab` test with concurrent requests, and check again.

 {{{
 wp option get foo # says bar
 wp cache get foo options # says bar, because the value leaked from
 alloptions into its own item
 wp cache get alloptions options # big array, but NO foo => bar
 }}}

 And finally, in MySQL shell:

 {{{
 SELECT * FROM wp_options WHERE option_name = 'foo';
 Empty set (0.00 sec)
 }}}

 At this point the value is gone, and only remains as a stale item in
 Memcached under the wrong key. Deleting, adding, or updating the item will
 not work:

 {{{
 $ wp option delete foo
 Warning: Could not delete 'foo' option. Does it exist?
 $ wp option add foo bar
 Error: Could not add option 'foo'. Does it already exist?
 $ wp option update foo baz
 Error: Could not update option 'foo'.
 }}}

 Flushing Memcached (or having the key evicted) will "fix" the stalemate,
 but will cause the data to be lost forever.

 Here's a brief overview of what could happen inside `add_option()` in two
 concurrent threads:

 {{{
 Thread 1: add_option( 'foo', 'bar' );
 Thread 2: add_option( 'foo', 'bar' );

 1: .. get_option( 'foo' ) // false
 1: .. INSERT INTO // true

 2: .. get_option( 'foo' )
 2: .. .. isset( $alloptions[ 'foo' ] ) // false
 2: .. .. ->get_row()
 2: .. .. wp_cache_add( 'foo', 'bar', 'options' ); // LEAKED

 1: $alloptions[] = ...
 1: wp_cache_set( 'alloptions', $alloptions, 'options' );
 }}}

 This is then followed by the next `delete_option()` call, which
 successfully deletes the data from the database and the `alloptions` key,
 so then future calls to `add_option()` will fail, because `get_option()`
 will always return the data from cache.

 I'm not sure about the best way to fix it. Here are a few thoughts:

 * Maybe `add_option()` should not rely so heavily on `get_option()`, and
 do its own checks with cache functions, depending on the `autoload`
 function argument
 * `delete_option()` could clear both the `alloptions` array item as well
 as the `$option` key in the `options` group
 * In `get_option()` if we were unable to retrieve the data from
 `$alloptions` and option cache, maybe query the `autoload` column together
 with `option_value`, before just assuming it's a no:

 {{{
 $row = $wpdb->get_row( $wpdb->prepare( "SELECT option_value, autoload ...
 }}}

 Then handle the cache addition differently, based on that autoload flag.
 This sounds the most reasonable to me, because this is exactly the place
 where the item is being put into the wrong key, which causes the other
 problems.

 I haven't actually tested this with 2.2, but it seems like that's where
 the `alloptions` vs `options` cache keys appeared inside `get_option()`
 right around r4855.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/51372>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list