[wp-trac] [WordPress Trac] #25043: Add a pre_ hook to term_exists() to allow pre-query optimization
WordPress Trac
noreply at wordpress.org
Wed Aug 14 20:11:37 UTC 2013
#25043: Add a pre_ hook to term_exists() to allow pre-query optimization
-------------------------+-----------------------------
Reporter: dllh | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Taxonomy | Version: trunk
Severity: normal | Keywords: has-patch
-------------------------+-----------------------------
In benchmarking imports, I noticed that {{{term_exists()}}} is a very
expensive operation. In a command-line import of a WXR with 100 posts,
500 comments, 5 tags, and 5 categories (with 5 tags and 5 categories
associated with each post), {{{term_exists()}}} accounted for about 17% of
the run time (measured with qcachegrind). It queries the database every
time it runs, and sometimes more than once.
In day-to-day usage, this probably isn't so awfully expensive that people
are noticing and griping about it, but in potentially long-running
processes like imports and bulk edits, it can be very significant.
The issue could be mitigated with a pre_ filter that can be used by
plugins to, for example, fetch stored term data from a cache.
I tested this by applying a simple filter in {{{term_exists()}}} (patch
forthcoming), adding the filter during import, and having it store and
check term data in an array. This let {{{term_exists()}}} look up a term
in the array instead of the database once the term had been fetched. With
the filter and the array lookup in place, the share of run time spent in
{{{term_exists()}}} dropped to 0.10%, and an import that otherwise
consistently ran in about 2:50 consistently ran in about 2:20. That is not
a big saving for a small import, but it becomes very significant as import
size climbs by orders of magnitude.
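
The short-circuit pattern described above can be sketched in a language-neutral way. The Python below is only an illustration and is not the forthcoming patch or WordPress code: the names {{{pre_filters}}}, {{{query_database}}}, {{{cached_term_lookup}}}, and the fake term table are all hypothetical, and the convention that {{{None}}} means "no short-circuit" while {{{0}}} means "term not found" is an assumption made for this sketch.

```python
# Hypothetical model of a pre_ filter on a term lookup.
# None of these names come from WordPress; they exist only for illustration.

FAKE_TERMS = {("news", "category"): 7, ("php", "post_tag"): 12}

stats = {"queries": 0}   # counts simulated database round trips
pre_filters = []         # callbacks that may short-circuit the lookup
term_cache = {}          # in-memory array used by the caching callback


def query_database(term, taxonomy):
    """Simulate the expensive lookup; returns a term ID, or 0 if not found."""
    stats["queries"] += 1
    return FAKE_TERMS.get((term, taxonomy), 0)


def term_exists(term, taxonomy):
    """Look up a term, letting pre_ callbacks answer before the database does."""
    result = None  # None means "no callback has supplied an answer"
    for callback in pre_filters:
        result = callback(result, term, taxonomy)
    if result is not None:
        return result  # short-circuit: skip the database entirely
    return query_database(term, taxonomy)


def cached_term_lookup(value, term, taxonomy):
    """Caching callback: query once per (term, taxonomy), then reuse the answer."""
    key = (term, taxonomy)
    if key not in term_cache:
        term_cache[key] = query_database(term, taxonomy)
    return term_cache[key]


if __name__ == "__main__":
    for _ in range(3):
        term_exists("news", "category")
    print("queries without filter:", stats["queries"])

    stats["queries"] = 0
    pre_filters.append(cached_term_lookup)
    for _ in range(3):
        term_exists("news", "category")
    print("queries with filter:", stats["queries"])
```

In this sketch, repeated lookups of the same term reach the simulated database only once after the callback is registered. A real cache would also need to be invalidated or updated whenever terms are created, renamed, or deleted during the import.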
I also did some benchmarking using wp-cli, to make sure that the fact that
I was doing costly import stuff wasn't skewing the perceived benefit of
the filter addition. To test, I did simple term deletion. Predictably, the
results are less dramatic for simple, short operations than for long-
running ones, but the performance increases I saw are not insignificant.
To test, I made a WXR with 3 categories and 50 posts. Each category was
associated with every post, so my test was to measure the cost of
{{{term_exists()}}} for a category belonging to 50 posts, both with and
without the pre_ filter. Results were as follows (it was a very small set
of tests, I'll grant):
||= Run time (with pre_ filter) =||= term_exists % (with pre_ filter) =||= Run time (without pre_ filter) =||= term_exists % (without pre_ filter) =||
|| 1.605s || 0.42% || 2.278s || 11.53% ||
|| 4.790s || 0.15% || 1.492s || 8.64% ||
|| 1.986s || 0.22% || 2.222s || 7.44% ||
I present the data as a table, but the runs with and without the filter
were independent, so the two halves of a row do not correspond to each
other. Curiously, the middle run for each set of three tests was something
of an outlier. Discarding those and averaging the remaining run times, we
get about 1.795s for the operation with the filter and 2.250s for the
operation without. The moral, then, is that even for short, single-task
operations like deletion of a term, we stand to see fairly significant
improvements in performance with the addition of a pre_ filter.
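
The averaging above can be reproduced from the tabulated run times. This is a small sketch of the arithmetic only; the variable and function names are made up for illustration, and dropping the second run of each set follows the outlier choice described in the text.

```python
from statistics import mean

# Run times in seconds, copied from the table above.
with_filter = [1.605, 4.790, 1.986]
without_filter = [2.278, 1.492, 2.222]


def mean_excluding_middle_run(times):
    """Average the first and third runs, discarding the middle (outlier) run."""
    return mean([times[0], times[2]])


avg_with = mean_excluding_middle_run(with_filter)
avg_without = mean_excluding_middle_run(without_filter)

# Relative reduction in run time when the filter is active.
reduction = 1 - avg_with / avg_without

print(f"with filter:    {avg_with:.4f}s")
print(f"without filter: {avg_without:.4f}s")
print(f"reduction:      {reduction:.1%}")
```

With these numbers the filtered runs come out roughly 20% faster on average, with the caveat already noted that the sample is very small.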
The pre_ filter pattern occurs in other places in core, so this seems to
me like a pretty common-sense, low-risk, potentially high-gain addition.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/25043>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software