[wp-trac] [WordPress Trac] #57301: Emoji feature detection is incorrect
WordPress Trac
noreply at wordpress.org
Wed Dec 14 11:32:47 UTC 2022
#57301: Emoji feature detection is incorrect
---------------------------+--------------------------------------
Reporter: sergiomdgomes | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Emoji | Version: trunk
Severity: normal | Resolution:
Keywords: | Focuses: javascript, performance
---------------------------+--------------------------------------
Comment (by sergiomdgomes):
The discussion has turned out pretty long, but I think we're mostly in
alignment now. Let me try to summarise our options here, so that we can
pick one and move forward:
1. The current patch.
- Small set of changes.
- Allows developers to specify sequences using either higher code points
or surrogate pairs.
- [https://caniuse.com/mdn-javascript_builtins_string_fromcodepoint Somewhat reduced browser support]
vs [https://caniuse.com/mdn-javascript_builtins_string_fromcharcode the status quo].
2. Change all of the existing sequences to surrogate pairs, and enforce it
going forward. E.g. `[ 0xD83C, 0xDFF3, 0xFE0F, 0x200D, 0x26A7, 0xFE0F ]`.
- Small set of changes.
- Forces developers to understand what surrogate pairs are, and to
actually use surrogate pairs instead of higher code points.
- Maintains current level of browser support.
- This approach can be complemented with comments or errors to attempt
to ensure developers don't make the mistake of using higher code points.
3. Work around the absence of `fromCodePoint` by effectively
reimplementing its functionality on top of `fromCharCode`, translating
from higher code points to surrogate pairs.
- Larger set of changes.
- More code, which means a larger script and potentially more
maintenance.
- Allows developers to specify sequences using either higher code points
or surrogate pairs.
- Maintains current level of browser support.
4. Use Unicode escape sequences in string literals instead of arrays of
code points. E.g. `'\uD83C\uDFF3\uFE0F\u200D\u26A7\uFE0F'`.
- Larger set of changes.
- Less code than the status quo, which means a smaller script and less
maintenance / chance of errors.
- Still easy to see what sequences are being tested; the numbers are all
there, just in a different format.
- Forces developers to understand what surrogate pairs are, and to
actually use surrogate pairs instead of higher code points.
- Maintains current level of browser support (it's the same as for
`fromCharCode`).
- Strings are perfectly clear and consistent, with everything being the
same type of escape sequence.
- This approach can be complemented with comments or errors to attempt
to ensure developers don't make the mistake of using higher code points
via Unicode code point escape sequences.
5. Use Unicode code point escape sequences in string literals instead of
arrays of code points. E.g. `'\u{1F3F3}\uFE0F\u200D\u26A7\uFE0F'` or
`'\u{1F3F3}\u{FE0F}\u{200D}\u{26A7}\u{FE0F}'`.
- Larger set of changes.
- Less code than the status quo, which means a smaller script and less
maintenance / chance of errors.
- Still easy to see what sequences are being tested; the numbers are all
there, just in a different format.
- Allows developers to specify escape sequences using either higher code
points or surrogate pairs.
- Slightly [https://caniuse.com/mdn-javascript_builtins_string_unicode_code_point_escapes reduced browser support]
vs [https://caniuse.com/mdn-javascript_builtins_string_fromcharcode the status quo].
- Strings are mostly clear and consistent, with everything being an
escape sequence of some sort.
6. Directly use unescaped characters in string literals. E.g. `'🏳️‍⚧️'`.
- Larger set of changes.
- Less code than the status quo, which means a smaller script and less
maintenance / chance of errors.
- Easiest workflow for future changes: just copy and paste the emoji
into a string literal.
- Hardest option to reason about; it's difficult to analyse exactly
which sequences are being tested.
- Potential for a contributor's editor to corrupt the sequence if it
doesn't handle Unicode string literals correctly.
- Unclear what the browser support and failure mode are for this option;
would need testing.
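To illustrate how options 2 to 5 relate, here is a minimal sketch of the translation that option 3 describes: turning higher code points into surrogate pairs using only `fromCharCode`. The helper name `codePointsToString` is hypothetical and not part of the current patch; it only shows that the array forms and the escaped string literals all describe the same UTF-16 string.

```javascript
// Hypothetical helper (not from the patch): builds a string from an array of
// code points using only String.fromCharCode.
function codePointsToString( codePoints ) {
	var result = '';
	for ( var i = 0; i < codePoints.length; i++ ) {
		var cp = codePoints[ i ];
		if ( cp > 0xFFFF ) {
			// Split a higher code point into a UTF-16 surrogate pair.
			cp -= 0x10000;
			result += String.fromCharCode(
				0xD800 + ( cp >> 10 ),  // High surrogate.
				0xDC00 + ( cp & 0x3FF ) // Low surrogate.
			);
		} else {
			// BMP code points and existing surrogates pass through unchanged.
			result += String.fromCharCode( cp );
		}
	}
	return result;
}

var fromHigherCodePoints = codePointsToString( [ 0x1F3F3, 0xFE0F, 0x200D, 0x26A7, 0xFE0F ] );
var fromSurrogatePairs = codePointsToString( [ 0xD83C, 0xDFF3, 0xFE0F, 0x200D, 0x26A7, 0xFE0F ] );
var escapedLiteral = '\uD83C\uDFF3\uFE0F\u200D\u26A7\uFE0F';
// The next literal needs code point escape support, so it is for comparison only.
var codePointLiteral = '\u{1F3F3}\uFE0F\u200D\u26A7\uFE0F';

console.log( fromHigherCodePoints === fromSurrogatePairs ); // true
console.log( fromHigherCodePoints === escapedLiteral );     // true
console.log( fromHigherCodePoints === codePointLiteral );   // true
```

Because surrogates pass through unchanged, a helper like this would accept either notation, which is the flexibility that option 3 trades for the extra code.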
Note that for all of the above options, reduced browser support should
mean (pending implementation and testing) that we simply default to
including the polyfill in the affected browsers, rather than anything
actually breaking. So the reduced support isn't much of a concern.
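As a rough sketch of that fallback for option 1, assuming the check is structured along these lines: if `String.fromCodePoint` is missing, the test string can't be built, and the result is treated the same as a failed rendering test. The function names and the `rendersCorrectly` callback are hypothetical stand-ins, not the actual loader code.

```javascript
// Hypothetical sketch; names are illustrative and not taken from the loader.
function emojiSequenceFromCodePoints( codePoints ) {
	if ( typeof String.fromCodePoint !== 'function' ) {
		// The test string cannot be built in this browser.
		return null;
	}
	return String.fromCodePoint.apply( null, codePoints );
}

// `rendersCorrectly` stands in for whatever rendering test is performed.
function shouldLoadPolyfill( codePoints, rendersCorrectly ) {
	var sequence = emojiSequenceFromCodePoints( codePoints );
	// A missing API and a failed rendering test both lead to the polyfill.
	return sequence === null || ! rendersCorrectly( sequence );
}
```

The syntax-based options (such as 5) would need their own check, since an unsupported escape is rejected when the script is parsed rather than when it runs.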
----
So, all things considered, I'm leaning towards option 5. It's easy to
understand the sequences, it allows contributors to use higher code
points or surrogate pairs without worrying about which one to pick, it
maintains nearly the same level of browser support, and it actually makes
the code smaller and simpler.
Which option do you think is best, @dmsnell? Did I miss anything?
--
Ticket URL: <https://core.trac.wordpress.org/ticket/57301#comment:11>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform