[wp-trac] [WordPress Trac] #28575: Eliminate redundant preg_match() to improve wptexturize() performance

WordPress Trac noreply at wordpress.org
Wed Jun 18 14:59:05 UTC 2014


#28575: Eliminate redundant preg_match() to improve wptexturize() performance
-------------------------+-----------------------------
 Reporter:  dllh         |      Owner:
     Type:  enhancement  |     Status:  new
 Priority:  normal       |  Milestone:  Awaiting Review
Component:  General      |    Version:
 Severity:  normal       |   Keywords:
  Focuses:  performance  |
-------------------------+-----------------------------
 `wptexturize()` performs very poorly on large posts. While investigating
 ways to speed it up, I noticed that when we loop over `$textarr` to do the
 replacements, we run the `preg_match()` that checks for strings like `9x9`
 on every iteration of the loop. It seems to me that we should do this
 check only once: as it stands, we run a `preg_match()` against all of
 `$text` for each item in `$textarr`, and large text produces a lot of
 items.
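
 To illustrate the idea, here is a minimal sketch of the two loop shapes.
 It is not the core code or the attached patch: both function names are
 hypothetical, and the two patterns are assumptions chosen to stand in for
 the `9x9` check and its replacement.

```php
<?php
// Hypothetical sketch, not the actual wptexturize() code or the patch.

// Per-item shape: the check against the whole of $text runs once per item.
function replace_dimensions_per_item( $text, array $textarr ) {
	foreach ( $textarr as &$curl ) {
		// Assumed pattern: scans all of $text on every iteration.
		if ( 1 === preg_match( '/(?<=\d)x\d/', $text ) ) {
			$curl = preg_replace( '/\b(\d+)x(\d+)\b/', '$1&#215;$2', $curl );
		}
	}
	unset( $curl ); // Break the reference left behind by foreach.
	return implode( '', $textarr );
}

// Hoisted shape: the check against $text runs once, before the loop.
function replace_dimensions_hoisted( $text, array $textarr ) {
	$has_dimensions = ( 1 === preg_match( '/(?<=\d)x\d/', $text ) );
	foreach ( $textarr as &$curl ) {
		if ( $has_dimensions ) {
			$curl = preg_replace( '/\b(\d+)x(\d+)\b/', '$1&#215;$2', $curl );
		}
	}
	unset( $curl );
	return implode( '', $textarr );
}
```

 Because the check only ever looks at `$text`, which does not change inside
 the loop, both shapes should produce the same output. The hoisted shape
 scans `$text` once instead of once per item.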

 I did some rough profiling of the core code against the attached patch.
 Methodology:

 * Put a chunk of text approaching 1MB in a file. The text I used came
 from an actual large post where I encountered this problem.
 * Put the wptexturized text in a second file.
 * For 10 iterations, call `wptexturize()` on the contents of the original
 file.
 * Compare the result with the contents of the second file to verify that
 my modifications have not changed the way `wptexturize()` actually
 processes text.
 * Measure the total time (I used the `time` command on my Linux box).
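
 The steps above can be sketched as a small harness. This is not the
 script attached to the ticket: the function name `profile_callable()` is
 hypothetical, and it times the loop with `microtime()` rather than the
 `time` command.

```php
<?php
// Hypothetical harness, not the profiling script attached to the ticket.
// Calls $fn on $input $iterations times, checks every result against
// $expected, and returns the elapsed wall-clock time in seconds.
function profile_callable( callable $fn, $input, $expected, $iterations = 10 ) {
	$start = microtime( true );
	for ( $i = 0; $i < $iterations; $i++ ) {
		$output = $fn( $input );
		if ( $output !== $expected ) {
			// The function under test no longer produces the reference output.
			throw new RuntimeException( "Output mismatch on iteration $i" );
		}
	}
	return microtime( true ) - $start;
}
```

 With WordPress loaded, this could be called as
 `profile_callable( 'wptexturize', $original, $texturized )`, where
 `$original` and `$texturized` hold the contents of the two files.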

 With no modifications, core consistently took about 1m9s to run through my
 little test. With the attached patch, it consistently took about 6s,
 roughly 11 times faster. For just 1 iteration, the difference was more
 like 8s versus 1.4s, so the savings are appreciable even for a single run
 of the function.

 Arguably, users shouldn't be making posts of 1MB, but it happens. On pages
 such as search results, which generate excerpts and can therefore fire
 `wptexturize()` many times, the performance increase stands to be fairly
 significant, provided there are large posts among the results.

 I'm attaching both a proposed patch and the script I used for the rough
 profiling. The post I tested isn't mine, so I can't share it, but a
 comparable test file should be easy enough to generate.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/28575>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

