[wp-trac] [WordPress Trac] #35293: Emoji Regex in wp_encode_emoji() is wildly inaccurate

WordPress Trac noreply at wordpress.org
Wed Aug 2 03:17:33 UTC 2017


#35293: Emoji Regex in wp_encode_emoji() is wildly inaccurate
--------------------------+-----------------------
 Reporter:  pento         |       Owner:  pento
     Type:  defect (bug)  |      Status:  reopened
 Priority:  normal        |   Milestone:  4.9
Component:  Emoji         |     Version:  4.2
 Severity:  normal        |  Resolution:
 Keywords:                |     Focuses:
--------------------------+-----------------------
Changes (by pento):

 * keywords:  has-patch =>


Comment:

 Alright! Thank you to everyone who handled this. I'm going to be doing
 some performance testing.

 The baseline test (comparing the previous behaviour with the current state
 of trunk) is here:
 https://travis-ci.org/pento/test-41501/builds/260016757

 Note: "New" refers to whichever variation of the new code is currently
 being tested. "Old" refers to the old code.

 There are a couple of interesting things to note:
 - Performance for all tests on PHP 5.4-5.6 is fairly similar. New is much
 slower in all but a handful of edge cases.
 - There's a big jump in performance on PHP 7.0, then small improvements in
 both PHP 7.1 and PHP nightly. New is about the same speed as Old, or
 faster as the post length or emoji percentage increases. An interesting
 exception is on the zh_TW posts, with 0% emoji - New is significantly
 faster.

 So, I'm going to be exploring a few different options for improving
 performance on old PHP, while not killing performance on new PHP.

 == TEST 1

 Short circuit the New staticize function, when there are no emoji. Adding
 a fast-but-possibly-matches-non-emoji test may allow 0% en_US tests to run
 faster, with only a minor penalty on other languages, or posts containing
 emoji.

 Add the following code at the start of `wp_staticize_emoji2()`:


 {{{#!php
         if ( ( ( function_exists( 'mb_check_encoding' ) && mb_check_encoding( $text, 'ASCII' ) ) || ! preg_match( '/[^\x00-\x7F]/', $text ) ) && false === strpos( $text, '&#x' ) ) {
                 // The text doesn't contain anything that might be emoji, so we can return early.
                 return $text;
         }
 }}}
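
 For experimenting with the check in isolation, it could be pulled out into
 a standalone helper. This is only a sketch: `might_contain_emoji()` is a
 hypothetical name, not a function in core or in the patch, and the logic is
 intended to mirror the condition above rather than replace it.

 {{{#!php
 <?php
 // Hypothetical helper (not part of the patch): returns true when $text
 // contains anything that could be emoji, i.e. when the slow path is needed.
 function might_contain_emoji( $text ) {
         // Any byte outside 7-bit ASCII could belong to a UTF-8 emoji sequence.
         if ( function_exists( 'mb_check_encoding' ) ) {
                 $is_ascii = mb_check_encoding( $text, 'ASCII' );
         } else {
                 $is_ascii = ! preg_match( '/[^\x00-\x7F]/', $text );
         }

         // Emoji may also already be encoded as hex HTML entities, e.g. &#x1f600;
         $has_hex_entity = false !== strpos( $text, '&#x' );

         return ! $is_ascii || $has_hex_entity;
 }
 }}}

 Under these assumptions, plain ASCII text such as `'Hello world'` would
 take the early return, while text containing `'&#x1f600;'` or raw
 multibyte characters would fall through to the full emoji processing.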

 '''Data''': https://travis-ci.org/pento/test-41501/builds/260021583

 '''Analysis''':
 - Negligible impact on all tests in PHP 7.0+
 - Negligible impact on PHP 5.4-5.6, non en_US languages.
 - Negligible impact on PHP 5.4-5.6, en_US, 1% and 10% emoji.
 - Significant performance improvements on PHP 5.4-5.6, en_US, 0% emoji. On
 Long posts, processing time decreased from 360ms to 0.2ms; on Super Long
 posts, it decreased from 3700ms to 0.9ms.

 '''Conclusion''': Test 1 changes should be included.
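
 A rough way to time the early-return check locally might look like the
 following. This is not the harness used for the Travis builds linked above:
 `time_callback_ms()`, the sample post, and the iteration count are all
 placeholders chosen for illustration, and absolute numbers will differ by
 machine and PHP version.

 {{{#!php
 <?php
 // Hypothetical timing helper: average wall-clock time per call, in ms.
 function time_callback_ms( $callback, $text, $iterations = 100 ) {
         $start = microtime( true );
         for ( $i = 0; $i < $iterations; $i++ ) {
                 call_user_func( $callback, $text );
         }
         return ( microtime( true ) - $start ) * 1000 / $iterations;
 }

 // Placeholder input: a long ASCII-only post containing no emoji.
 $post = str_repeat( 'The quick brown fox jumps over the lazy dog. ', 10000 );

 // Stand-in for the Test 1 check, using only the preg_match() fallback.
 $check = function ( $text ) {
         return ! preg_match( '/[^\x00-\x7F]/', $text ) && false === strpos( $text, '&#x' );
 };

 printf( "%.3f ms per call\n", time_callback_ms( $check, $post ) );
 }}}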

--
Ticket URL: <https://core.trac.wordpress.org/ticket/35293#comment:27>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

