[wp-trac] [WordPress Trac] #25043: Add a pre_ hook to term_exists() to allow pre-query optimization

WordPress Trac noreply at wordpress.org
Wed Aug 14 20:11:37 UTC 2013


#25043: Add a pre_ hook to term_exists() to allow pre-query optimization
-------------------------+-----------------------------
 Reporter:  dllh         |      Owner:
     Type:  enhancement  |     Status:  new
 Priority:  normal       |  Milestone:  Awaiting Review
Component:  Taxonomy     |    Version:  trunk
 Severity:  normal       |   Keywords:  has-patch
-------------------------+-----------------------------
 In benchmarking imports, I noticed that {{{term_exists()}}} is a very
 expensive operation. In a command line import  of a WXR with 100 posts,
 500 comments, 5 tags, 5 categories (and 5 tags and 5 cats associated with
 each post), term_exists() accounted for about 17% of the run-time (using
 qcachegrind for metrics). It's doing queries every time it runs, and
 sometimes more than one query.

 In day-to-day usage, this probably isn't so awfully expensive that people
 are noticing and griping about it, but in potentially long-running
 processes like imports and bulk edits, it can be very significant.

 The issue could be mitigated with a pre_ filter that can be used by
 plugins to, for example, fetch stored term data from a cache.

 I tested this by applying a simple filter in {{{term_exists()}}} (patch
 forthcoming), adding the filter during import, and having it store/check
 term data in an array. This allowed {{{term_exists()}}} to just look up
 the term in the array vs. the database if we had already fetched it. With
 the filter and the array lookup added, the percentage of run time spent in
 {{{term_exists()}}} dropped to 0.10% and an import that consistently
 otherwise ran in about 2:50 ran consistently in about 2:20 (not really
 significant for a small import, but very significant as orders of
 magnitude climb).

 I also did some benchmarking using wp-cli, to make sure that the fact that
 I was doing costly import stuff wasn't skewing the perceived benefit of
 the filter addition. To test, I did simple term deletion. Predictably, the
 results are less dramatic for simple, short operations than for long-
 running ones, but the performance increases I saw are not insignificant.

 To test, I made a WXR with 3 categories and 50 posts. Each category was
 associated with every post, so my test was to measure the cost of
 {{{term_exists()}}} for a category belonging to 50 posts, both with and
 without the pre_ filter. Results were as follows (it was a very small set
 of tests, I'll grant):

 ||**With pre_ filter**|| ||**Without pre_ filter**|| ||
 ||run time||term_exists %||run time||term_exists %||
 ||= 1.605s||= 0.42%||= 2.278s||= 11.53%||
 ||= 4.790s||= 0.15%||= 1.492s||= 8.64%||
 ||= 1.986s||= 0.22%||= 2.222s||= 7.44%||

 I present the data as tabular, but of course it doesn't make sense to
 assume that the uncached and cached cells correlate across a single row.
 Curiously, the middle run for each set of three tests was something of an
 outlier. Discarding those and averaging the run times, we get 1.795s for
 the operation with the filter and 2.25 for the operation without. The
 moral, then, is that even for short, single-task operations like deletion
 of a term, we stand to see fairly significant improvements in performance
 with the addition of a pre_filter.

 The pre_ filter pattern occurs in other places in core, so this seems to
 me like a pretty common-sense, low-risk, potentially high-gain addition.

--
Ticket URL: <http://core.trac.wordpress.org/ticket/25043>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software


More information about the wp-trac mailing list