[wp-trac] [WordPress Trac] #12257: wpdb Scales Badly Due to Unnecessary Copies of All Query Results

Thu Oct 14 22:40:35 UTC 2010

#12257: wpdb Scales Badly Due to Unnecessary Copies of All Query Results
--------------------------+-------------------------------------------------
 Reporter:  miqrogroove   |       Owner:  ryan     
     Type:  defect (bug)  |      Status:  new      
 Priority:  normal        |   Milestone:  3.1      
Component:  Database      |     Version:           
 Severity:  critical      |    Keywords:  has-patch
--------------------------+-------------------------------------------------

Comment(by joelhardi):

 > Doing SELECT * from the user table with 1700 users actually resulted in
 a higher peak memory usage with the patch.  19,881,000-ish with the patch
 vs. 19,869,000-ish without.

 I wouldn't expect to see big differences in peak memory usage (your 2
 tests are +/- 0.06-0.2%) since !WordPress mostly uses full result sets
 copied into arrays, so any query result is going to be copied into an
 array at some point anyway. PHP garbage collection will flush the
 intermediate copies.

 The patch just removes one level of (unnecessary) memory-copying so on
 queries where get_results() is called, the only benefit is slightly
 shorter execution time.

 i.e. when "old" wpdb runs a query:
  1. MySQL result set created (so, 1 copy of result in PHP memory)
  2. result copied into $this->last_result (so, 2 copies in PHP memory ...
 this is the WSOD crash point mentioned in the ticket)
  3. result freed (1 copy)
  4. get_results() called (2 copies)

 and with patch:
  1. same
  2. no extra copy made (still only 1 copy instead of 2)
  3. result not freed (still only 1 copy)
  4. get_results() called (now 2 copies, so peak mem usage is the same as
 above)

 So, the patch only addresses out-of-memory crashes when a large result set
 is returned but get_results() is never called (i.e. the result set is only
 retrieved piecemeal, such as with get_row() or get_col()).

 Here's a short test script:

 {{{
 require_once('wp-includes/wp-db.php');
 $db = new wpdb($dbuser, $dbpassword, $dbname, $dbhost);
 $c = 1000;
 $q = "select * from wp_users;";

 $start = microtime(TRUE);
 for ($i = 0; $i < $c; $i++)
   $array = $db->get_results($q);
 $end = microtime(TRUE);

 echo "$c iterations of $q in ".($end - $start)." seconds\n";
 echo memory_get_peak_usage()." peak memory usage\n";
 }}}

 With a small test set (14 users) over many runs, the result is basically
 what you would expect, both old and patched versions are about the same
 (patched version only 1.27% faster, but with some variance, 1.21% == 1
 standard deviation. This improvement scales with the size of the result
 set). Peak mem usage 0.52% higher with the patched version, 635384 bytes
 versus 632064.

 However, when you change the call to {{{get_results()}}} in the test
 script to {{{query()}}} -- so that only one memory copy of the result
 exists in the patched version, while the old version makes 2 copies --
 execution is 160% faster and memory usage drops by 2.3%.

 With a slightly larger sample set (129 users), execution is 500% faster
 and memory usage drops by 26%.

 In general, I would say that, since !WordPress mostly copies result sets
 into arrays, there will be the minor (1-5%) speed improvement but nothing
 huge. However, in cases where !WordPress is more selective in returning
 values from result sets, there will be a larger benefit. As long as
 !WordPress does nothing stupid with SQL (i.e. returning huge result sets
 it doesn't use, like the happily removed user table SELECT) and only
 queries for rows and cols it needs for a particular view, benefits from
 this patch aren't going to exceed the minor speed improvement.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/12257#comment:12>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software