[wp-trac] [WordPress Trac] #57301: Emoji feature detection is incorrect

Tue Dec 13 15:00:12 UTC 2022

#57301: Emoji feature detection is incorrect
---------------------------+--------------------------------------
 Reporter:  sergiomdgomes  |       Owner:  (none)
     Type:  defect (bug)   |      Status:  new
 Priority:  normal         |   Milestone:  Awaiting Review
Component:  Emoji          |     Version:  trunk
 Severity:  normal         |  Resolution:
 Keywords:                 |     Focuses:  javascript, performance
---------------------------+--------------------------------------

Comment (by sergiomdgomes):

 Replying to [comment:8 dmsnell]:
 >> any browser that does not support fromCodePoint also does not support
 Emoji 14
 > This is the point I was questioning, actually.
 > [Followed by explanation]

 Thanks for your clarification, the OS seems to have a larger influence
 than I thought.

 Let's put it this way, then: there's a strong correlation between old
 browser versions and old OS versions, which means there's a strong
 correlation between lack of support for `String.fromCodePoint` and lack of
 support for Emoji 14.

 [https://caniuse.com/mdn-javascript_builtins_string_fromcodepoint Global
 browser support for String.fromCodePoint] is currently at 96.4%, which
 means that at most we'll be getting it wrong and unnecessarily serving a
 polyfill to 3.6% of users — and it will never be as high as that because
 of the aforementioned correlation.

 But even if it **were** that high, unnecessarily serving a polyfill to
 3.6% of users is a significant improvement over blindly serving it to
 everyone.

 > On this note, I now wonder what you discovered is wrong about the
 original code, as passing a sequence of code units to fromCharCode still
 properly decodes the string in the same way that fromCodePoint does, and
 this a result of how JS strings are UTF-16.
 > [...]
 > Does this patch materially change anything? I guess I'm still
 fundamentally confused on what's broken and what's fixed.

 This patch definitely changes things, in that it fixes the bug. I
 apologise if my explanation so far has been less than clear (I am not
 well-versed in deep Unicode technicals and thus probably get a lot of
 details wrong), but I did test the fixed script in a number of browsers
 and OSes, and mentioned that in the original submission. The goal here is
 to fix a bug, so it's basic due diligence to ensure that the fix actually
 works.

 Going back to why, the difference lies in the fact that
 `String.fromCodePoint` works for any code points, whereas
 `String.fromCharCode` effectively does not work correctly for code points
 `0x010000` – `0x10FFFF`.

 Your simple test worked for you because you picked the values from a
 correct, older test, instead of one of the new tests that were introduced
 since #47852 and which effectively broke the feature-detection.

 Here's an example where it doesn't work, straight from the current code:

 {{{
 String.fromCodePoint.apply(null, [0x1F3F3, 0xFE0F, 0x200D, 0x26A7,
 0xFE0F]) ===
   String.fromCharCode.apply(null, [0x1F3F3, 0xFE0F, 0x200D, 0x26A7,
 0xFE0F]);

 > false
 }}}

 And here's [https://developer.mozilla.org/en-
 US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCodePoint#compared_to_fromcharcode
 MDN's explanation for it]:

 {{{
 String.fromCharCode() cannot return supplementary characters (i.e. code
 points 0x010000 – 0x10FFFF)
 by specifying their code point. Instead, it requires the UTF-16 surrogate
 pair in order to return
 a supplementary character.
 }}}

 So what this patch does is make code points `0x010000` – `0x10FFFF`
 actually work, since they're being used in the tests since #47852.

 > For the purposes of this patch, I wonder if there's a reason to
 introduce another indirection through fromCodePoint rather than resorting
 to string literals. Seems like this would be less confusing.

 That seems like the logical best solution over the incorrect approach I
 proposed in my last comment, assuming we're not missing anything here.

 However, we'd still need to be defensive regarding feature support.
 Namely, we'd need to determine what the failure mode is for browsers that
 don't support unicode escape sequences, and we'd probably need to feature-
 detect that lack of support and default to polyfilling emoji, similar to
 what my current patch does when it detects a lack of support for
 `String.fromCodePoint`.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/57301#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform