[wp-trac] [WordPress Trac] #35293: Emoji Regex in wp_encode_emoji() is wildly inaccurate

WordPress Trac noreply at wordpress.org
Tue Jul 19 14:07:35 UTC 2016

#35293: Emoji Regex in wp_encode_emoji() is wildly inaccurate
 Reporter:  pento         |       Owner:  pento
     Type:  defect (bug)  |      Status:  assigned
 Priority:  normal        |   Milestone:  Future Release
Component:  Emoji         |     Version:  4.2
 Severity:  normal        |  Resolution:
 Keywords:                |     Focuses:
Changes (by pento):

 * keywords:  emoji =>


 [attachment:35293.2.diff] is the framework for generating the PHP regex
 from the `twemoji.js` regex.

 Proceeding from here is... tricky. The Twemoji regex is expressed in UTF-16
 code units (surrogate pairs), which PHP didn't support until PCRE 8.32
 (PHP 5.4.14). There's no clean way to convert those ranges into a
 PHP-compatible regex.
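 To illustrate the conversion problem, here is a minimal sketch, written in
 Python rather than PHP. It assumes a fragment of the `twemoji.js` regex has
 the shape `\ud83d[\ude00-\ude4f]`: one fixed high surrogate followed by a
 single low-surrogate range. It rebuilds that fragment as a code point range
 in PCRE `\x{...}` syntax, which PHP would need the `/u` modifier to use. The
 function names are hypothetical, and the sketch deliberately rejects every
 other shape in the real regex (ranges spanning several high surrogates, BMP
 characters, ZWJ sequences). Those other shapes are where the difficulty
 actually lies.

```python
import re


def surrogates_to_code_point(hi: int, lo: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


# Hypothetical fragment shape: \uD83D[\uDE00-\uDE4F], i.e. a fixed high
# surrogate followed by a class holding exactly one low-surrogate range.
FRAGMENT = re.compile(
    r"\\u([dD][89abAB][0-9a-fA-F]{2})"
    r"\[\\u([dD][c-fC-F][0-9a-fA-F]{2})-\\u([dD][c-fC-F][0-9a-fA-F]{2})\]"
)


def fragment_to_pcre(fragment: str) -> str:
    """Rewrite one surrogate-pair fragment as a PCRE code point class."""
    m = FRAGMENT.fullmatch(fragment)
    if m is None:
        # Any other shape needs real range-splitting logic; not handled here.
        raise ValueError(f"unsupported fragment: {fragment!r}")
    hi, lo_start, lo_end = (int(g, 16) for g in m.groups())
    start = surrogates_to_code_point(hi, lo_start)
    end = surrogates_to_code_point(hi, lo_end)
    return f"[\\x{{{start:X}}}-\\x{{{end:X}}}]"


print(fragment_to_pcre(r"\ud83d[\ude00-\ude4f]"))  # [\x{1F600}-\x{1F64F}]
```

 Because the high surrogate is fixed, the low-surrogate range maps to one
 contiguous code point range. A range whose high surrogate also varies would
 have to be split at each 0x400 boundary.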

 The main problem with using the method from `twemoji-generator.js` is that
 it requires a local copy of the Twemoji images to check which emoji
 Twemoji supports. It also takes us one step further from the actual regex
 we need to build, which risks inconsistencies between the two.

 I would not be averse to providing an accurate regex for PHP versions
 that support it, and a more approximate fallback for those that don't.
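 The split between an accurate regex and a fallback could be gated on the
 PCRE version. In PHP that check would likely use `version_compare()`
 against the `PCRE_VERSION` constant. The sketch below is written in Python
 and mirrors that idea under stated assumptions. `ACCURATE_PATTERN` and
 `FALLBACK_PATTERN` are hypothetical placeholders, not the ticket's actual
 regexes, and the 8.32 threshold is taken from the comment above. The
 fallback shown is one possible approximation. It matches only the raw UTF-8
 bytes F0 9F xx xx, which covers U+1F000 to U+1FFFF and nothing else.

```python
# Hypothetical placeholders. The accurate pattern stands in for the full
# generated code point regex (PCRE /u mode); the fallback is a byte-level
# approximation of U+1F000..U+1FFFF that needs no /u support.
ACCURATE_PATTERN = r"/[\x{1F600}-\x{1F64F}]/u"
FALLBACK_PATTERN = r"/\xF0\x9F[\x80-\xBF]{2}/"

# Minimum PCRE version, per the comment above.
MIN_PCRE = (8, 32)


def parse_pcre_version(version: str) -> tuple[int, int]:
    """Parse a PCRE_VERSION-style string such as "8.38 2015-11-23"."""
    major, minor = version.split()[0].split(".")[:2]
    return int(major), int(minor)


def choose_emoji_pattern(pcre_version: str) -> str:
    """Return the accurate pattern on new enough PCRE, else the fallback."""
    if parse_pcre_version(pcre_version) >= MIN_PCRE:
        return ACCURATE_PATTERN
    return FALLBACK_PATTERN
```

 The versions are compared as integer tuples rather than as strings, so
 "8.4" would not sort above "8.32" by accident. PCRE2 versions such as
 "10.34" also compare correctly.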

Ticket URL: <https://core.trac.wordpress.org/ticket/35293#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
