[wp-trac] [WordPress Trac] #57301: Emoji feature detection is incorrect

Wed Dec 14 11:32:47 UTC 2022

#57301: Emoji feature detection is incorrect
---------------------------+--------------------------------------
 Reporter:  sergiomdgomes  |       Owner:  (none)
     Type:  defect (bug)   |      Status:  new
 Priority:  normal         |   Milestone:  Awaiting Review
Component:  Emoji          |     Version:  trunk
 Severity:  normal         |  Resolution:
 Keywords:                 |     Focuses:  javascript, performance
---------------------------+--------------------------------------

Comment (by sergiomdgomes):

 The discussion has turned out pretty long, but I think we're mostly in
 alignment now. Let me try to summarise our options here, so that we can
 pick one and move forward:

 1. The current patch.
   - Small set of changes.
   - Allows for developers to specify sequences using higher code points or
 surrogate pairs.
   - [https://caniuse.com/mdn-javascript_builtins_string_fromcodepoint
 Somewhat reduced browser support] vs
 [https://caniuse.com/mdn-javascript_builtins_string_fromcharcode the
 status quo].

 2. Change all of the existing sequences to surrogate pairs, and enforce it
 going forward. E.g. `[ 0xD83C, 0xDFF3, 0xFE0F, 0x200D, 0x26A7, 0xFE0F ]`.
   - Small set of changes.
   - Forces developers to understand what surrogate pairs are, and to
 actually use surrogate pairs instead of higher code points.
   - Maintains current level of browser support.
   - This approach can be complemented with comments or errors to attempt
 to ensure developers don't make the mistake of using higher code points.

 3. Work around the absence of `fromCodePoint` by effectively
 reimplementing its functionality on top of `fromCharCode`, translating
 from higher code points to surrogate pairs.
   - Larger set of changes.
   - More code, which means a larger script and potentially more
 maintenance.
   - Allows for developers to specify sequences using higher code points or
 surrogate pairs.
   - Maintains current level of browser support.

 4. Use Unicode escape sequences in string literals instead of arrays of
 code points. E.g. `'\uD83C\uDFF3\uFE0F\u200D\u26A7\uFE0F'`.
   - Larger set of changes.
   - Less code than the status quo, which means a smaller script and less
 maintenance / chance of errors.
   - Still easy to see what sequences are being tested; the numbers are all
 there, just in a different format.
   - Forces developers to understand what surrogate pairs are, and to
 actually use surrogate pairs instead of higher code points.
   - Maintains current level of browser support (it's the same as for
 `fromCharCode`).
   - Strings are perfectly clear and consistent, with everything being the
 same type of escape sequence.
   - This approach can be complemented with comments or errors to attempt
 to ensure developers don't make the mistake of using higher code points
 via Unicode code point escape sequences.

 5. Use Unicode code point escape sequences in string literals instead of
 arrays of code points. E.g. `'\u{1F3F3}\uFE0F\u200D\u26A7\uFE0F'` or
 `'\u{1F3F3}\u{FE0F}\u{200D}\u{26A7}\u{FE0F}'`.
   - Larger set of changes.
   - Less code than the status quo, which means a smaller script and less
 maintenance / chance of errors.
   - Still easy to see what sequences are being tested; the numbers are all
 there, just in a different format.
   - Allows for developers to specify escape sequences using higher code
 points or surrogate pairs.
   - Slightly
 [https://caniuse.com/mdn-javascript_builtins_string_unicode_code_point_escapes
 reduced browser support] vs
 [https://caniuse.com/mdn-javascript_builtins_string_fromcharcode the
 status quo].
   - Strings are mostly clear and consistent, with everything being an
 escape sequence of some sort.

 6. Directly use unescaped characters in string literals. E.g. `'🏳️‍⚧️'`.
   - Larger set of changes.
   - Less code than the status quo, which means a smaller script and less
 maintenance / chance of errors.
   - Easiest workflow for future changes: just copy and paste the emoji
 into a string literal.
   - Hardest option to reason about; it's difficult to analyse exactly
 which sequences are being tested.
   - Potential for a contributor's editor to corrupt the sequences if it
 doesn't handle Unicode string literals correctly.
   - Both the browser support and the failure mode for this option are
 unclear; would need testing.
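
 To make the translation in option 3 concrete, here is a minimal sketch of
 turning higher code points into surrogate pairs on top of `fromCharCode`.
 It is an illustration only: `codePointsToString` is a hypothetical name,
 not something from the patch, and it assumes every input is a valid code
 point. It also shows that the array from option 2 and the escaped string
 from option 4 describe the same sequence.

```javascript
// Hypothetical helper (not part of the patch): expands code points above
// 0xFFFF into UTF-16 surrogate pairs, then defers to String.fromCharCode.
// Assumes every entry is a valid code point (0x0 to 0x10FFFF).
function codePointsToString( codePoints ) {
	var units = [];
	for ( var i = 0; i < codePoints.length; i++ ) {
		var cp = codePoints[ i ];
		if ( cp > 0xFFFF ) {
			// Standard UTF-16 encoding: subtract 0x10000, then split the
			// remaining 20 bits into a high and a low surrogate.
			cp -= 0x10000;
			units.push( 0xD800 + ( cp >> 10 ), 0xDC00 + ( cp & 0x3FF ) );
		} else {
			// BMP code points and existing surrogates pass through unchanged.
			units.push( cp );
		}
	}
	return String.fromCharCode.apply( null, units );
}

// Higher code point, as options 1, 3 and 5 would allow.
var fromCodePoints = codePointsToString( [ 0x1F3F3, 0xFE0F, 0x200D, 0x26A7, 0xFE0F ] );
// Surrogate pairs, as in option 2.
var fromSurrogates = codePointsToString( [ 0xD83C, 0xDFF3, 0xFE0F, 0x200D, 0x26A7, 0xFE0F ] );
// Escaped string literal, as in option 4.
var escaped = '\uD83C\uDFF3\uFE0F\u200D\u26A7\uFE0F';
```

 All three variables should hold the same six UTF-16 code units, so the
 choice between these options is about readability and browser support,
 not about which sequence ends up being tested.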

 Note that for all of the above options, reduced browser support should
 mean (pending implementation and testing) that we simply default to
 including the polyfill in browsers that lack the feature, instead of
 anything actually breaking. So the cost of reduced support is an extra
 script load, not broken emoji.
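
 As a rough sketch of that fallback for option 1 only: the wrapper below
 treats a missing `String.fromCodePoint`, or any error thrown while
 testing, as "not supported", which is the answer that would lead to the
 polyfill being included. `supportsSequence` and its `render` callback are
 hypothetical stand-ins for the real detection code, not names from the
 patch. How options 5 and 6 fail in older browsers is a separate question
 that would still need testing.

```javascript
// Hypothetical wrapper (not from the patch). `render` stands in for
// whatever check decides whether the browser draws the sequence correctly.
function supportsSequence( codePoints, render ) {
	try {
		if ( typeof String.fromCodePoint !== 'function' ) {
			// No fromCodePoint: report "unsupported" so the polyfill loads.
			return false;
		}
		return Boolean( render( String.fromCodePoint.apply( null, codePoints ) ) );
	} catch ( e ) {
		// Any unexpected failure also falls back to the polyfill.
		return false;
	}
}
```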

 ----

 So, all things considered, I'm leaning towards 5. It's easy to understand
 the sequences, it allows for contributors to use higher code points or
 surrogate pairs without worrying about things too much, it maintains
 nearly the same level of browser support, and it actually makes the code
 smaller and simpler.

 Which option do you think is best, @dmsnell? Did I miss anything?

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/57301#comment:11>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform