[wp-trac] [WordPress Trac] #35293: Emoji Regex in wp_encode_emoji() is wildly inaccurate
WordPress Trac
noreply at wordpress.org
Tue Jul 18 03:47:31 UTC 2017
#35293: Emoji Regex in wp_encode_emoji() is wildly inaccurate
-----------------------------------+---------------------
Reporter: pento | Owner: pento
Type: defect (bug) | Status: closed
Priority: normal | Milestone: 4.8.1
Component: Emoji | Version: 4.2
Severity: normal | Resolution: fixed
Keywords: has-patch fixed-major | Focuses:
-----------------------------------+---------------------
Changes (by pento):
* status: reopened => closed
* resolution: => fixed
Comment:
In [changeset:"41069"]:
{{{
#!CommitTicketReference repository="" revision="41069"
Emoji: Port the Twemoji regex to PHP.
Previously, `wp_encode_emoji()` and `wp_staticize_emoji()` used inaccurate
regular expressions to find emoji and transform them into HTML entities
or `<img>` tags, respectively. As a result, some emoji were not
transformed correctly and, occasionally, non-emoji were transformed by mistake.
This commit adds a new `grunt` task, `grunt precommit:emoji`. It finds
the regex in `twemoji.js`, transforms it into a PHP-friendly version, and
adds it to `formatting.php`. `grunt precommit` also runs this task
automatically when it detects that `twemoji.js` has changed.
The new regex requires features introduced in PCRE 8.32, which has been
bundled with PHP since 5.4.14 and was also backported to later releases
of the PHP 5.3 series. On PHP versions that don't support these features,
the code falls back to an updated version of the loose-matching regex.
For short posts, the performance difference between the old and new regex
is negligible. As the posts get longer, however, the new method is
exponentially faster.
Merges [41043], [41045], and [41046] to the 4.8 branch.
Fixes #35293.
}}}
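To illustrate the kind of transformation the commit message describes (turning the JavaScript regex from `twemoji.js` into a PHP-friendly one), here is a minimal sketch. It is not the code from the WordPress Gruntfile: the function name `jsRegexToPcre` is hypothetical, and the sketch assumes the only work needed is rewriting `\uXXXX` escapes as PCRE `\x{...}` escapes, merging surrogate pairs into single code points along the way. It handles literal surrogate pairs and the simple case of a high surrogate followed by a one-range character class; anything more elaborate would need extra rules.

```javascript
// Hypothetical helper, for illustration only (not the actual grunt task).
// Rewrites JavaScript \uXXXX escapes in a regex source string as PCRE
// \x{...} escapes, merging UTF-16 surrogate pairs into single code points.
function jsRegexToPcre(source) {
  // Combine a high and a low surrogate (as hex strings) into a code point.
  const toCodePoint = (hi, lo) =>
    (parseInt(hi, 16) - 0xd800) * 0x400 + (parseInt(lo, 16) - 0xdc00) + 0x10000;
  const esc = (codePoint) => '\\x{' + codePoint.toString(16) + '}';

  // 1. High surrogate followed by a class holding one low-surrogate range,
  //    e.g. \ud83c[\udffb-\udfff] becomes [\x{1f3fb}-\x{1f3ff}].
  const ranged =
    /\\u(d[89ab][0-9a-f]{2})\[\\u(d[c-f][0-9a-f]{2})-\\u(d[c-f][0-9a-f]{2})\]/gi;
  let out = source.replace(ranged, (match, hi, from, to) =>
    '[' + esc(toCodePoint(hi, from)) + '-' + esc(toCodePoint(hi, to)) + ']');

  // 2. Literal surrogate pairs, e.g. \ud83d\udc68 becomes \x{1f468}.
  const pairs = /\\u(d[89ab][0-9a-f]{2})\\u(d[c-f][0-9a-f]{2})/gi;
  out = out.replace(pairs, (match, hi, lo) => esc(toCodePoint(hi, lo)));

  // 3. Remaining escapes in the Basic Multilingual Plane, e.g. \u200d.
  //    A lone surrogate left over at this point would not be valid in PCRE.
  out = out.replace(/\\u([0-9a-f]{4})/gi, (match, hex) => esc(parseInt(hex, 16)));

  return out;
}

// Usage: the input is regex *source text*, so backslashes are doubled here.
console.log(jsRegexToPcre('\\ud83d\\udc68\\u200d'));
console.log(jsRegexToPcre('\\ud83c[\\udffb-\\udfff]'));
```

Merging surrogate pairs matters because JavaScript regexes of this style match UTF-16 code units, whereas a PCRE pattern in UTF-8 mode matches whole code points and rejects surrogate values.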
--
Ticket URL: <https://core.trac.wordpress.org/ticket/35293#comment:21>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform