[wp-trac] [WordPress Trac] #35293: Emoji Regex in wp_encode_emoji() is wildly inaccurate
WordPress Trac
noreply at wordpress.org
Wed Aug 2 03:17:33 UTC 2017
#35293: Emoji Regex in wp_encode_emoji() is wildly inaccurate
--------------------------+-----------------------
Reporter: pento | Owner: pento
Type: defect (bug) | Status: reopened
Priority: normal | Milestone: 4.9
Component: Emoji | Version: 4.2
Severity: normal | Resolution:
Keywords: | Focuses:
--------------------------+-----------------------
Changes (by pento):
* keywords: has-patch =>
Comment:
Alright! Thank you to everyone who handled this. I'm now going to do some
performance testing.
The baseline test (comparing the previous behaviour with the current state
of trunk) is here:
https://travis-ci.org/pento/test-41501/builds/260016757
Note: "New" refers to whichever variation of the new code is currently
being tested. "Old" refers to the old code.
There are a couple of interesting things to note:
- Performance for all tests on PHP 5.4-5.6 is fairly similar. New is
always much slower, except for a handful of edge cases.
- There's a big jump in performance on PHP 7.0, then small improvements in
both PHP 7.1 and PHP nightly. New is about the same speed as Old, or
faster as the post length or emoji percentage increases. An interesting
exception is on the zh_TW posts, with 0% emoji - New is significantly
faster.
So, I'm going to be exploring a few different options for improving
performance on old PHP, while not killing performance on new PHP.
== TEST 1
Short circuit the New staticize function, when there are no emoji. Adding
a fast-but-possibly-matches-non-emoji test may allow 0% en_US tests to run
faster, with only a minor penalty on other languages, or posts containing
emoji.
Add the following code at the start of `wp_staticize_emoji2()`:
{{{#!php
if (
	(
		( function_exists( 'mb_check_encoding' ) && mb_check_encoding( $text, 'ASCII' ) )
		|| ! preg_match( '/[^\x00-\x7F]/', $text )
	)
	&& false === strpos( $text, '&#x' )
) {
	// The text doesn't contain anything that might be emoji, so we can return early.
	return $text;
}
}}}
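To make it easier to see which inputs take the early return, here is a rough standalone sketch of the same condition. The function name `text_might_contain_emoji()` is made up for illustration and is not part of WordPress or the patch. It returns `false` exactly when the check above would return early.

{{{#!php
<?php
/**
 * Hypothetical helper, for illustration only (not in core).
 * Returns false when the text is pure ASCII and has no hex entity,
 * which is the case where the early return above would fire.
 */
function text_might_contain_emoji( $text ) {
	// Pure-ASCII test: use mbstring when available, else fall back to a regex.
	if ( function_exists( 'mb_check_encoding' ) ) {
		$is_ascii = mb_check_encoding( $text, 'ASCII' );
	} else {
		$is_ascii = ! preg_match( '/[^\x00-\x7F]/', $text );
	}

	// Already-encoded emoji show up as hex entities such as &#x1f600;
	$has_hex_entity = false !== strpos( $text, '&#x' );

	return ! $is_ascii || $has_hex_entity;
}

var_dump( text_might_contain_emoji( 'Hello world' ) );              // bool(false): early return
var_dump( text_might_contain_emoji( "Hello \xF0\x9F\x98\x80" ) );   // bool(true): raw emoji bytes
var_dump( text_might_contain_emoji( 'Hello &#x1f600;' ) );          // bool(true): hex entity
var_dump( text_might_contain_emoji( "\xE4\xBD\xA0\xE5\xA5\xBD" ) ); // bool(true): non-ASCII text, no emoji
}}}

The last case is the deliberate false positive: any non-ASCII text fails the quick test and falls through to the full emoji matching, so the short circuit is only expected to help posts that are entirely ASCII.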
'''Data''': https://travis-ci.org/pento/test-41501/builds/260021583
'''Analysis''':
- Negligible impact on all tests in PHP 7.0+
- Negligible impact on PHP 5.4-5.6, non en_US languages.
- Negligible impact on PHP 5.4-5.6, en_US, 1% and 10% emoji.
- Significant performance improvements on PHP 5.4-5.6, en_US, 0% emoji. On
Long posts, processing time decreased from 360ms to 0.2ms. On Super Long
posts, it decreased from 3700ms to 0.9ms.
'''Conclusion''': Test 1 changes should be included.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/35293#comment:27>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform