[wp-trac] [WordPress Trac] #21212: MySQL tables should use utf8mb4 character set
WordPress Trac
noreply at wordpress.org
Wed Feb 11 12:35:12 UTC 2015
#21212: MySQL tables should use utf8mb4 character set
----------------------------+---------------------
 Reporter:  pento           |       Owner:
     Type:  task (blessed)  |      Status:  closed
 Priority:  normal          |   Milestone:  4.2
Component:  Database        |     Version:  3.4.1
 Severity:  normal          |  Resolution:  fixed
 Keywords:                  |     Focuses:
----------------------------+---------------------
Comment (by pento):
Replying to [comment:88 masakielastic]:
> The wp_encode_emoji function is a
> [http://www.wikiwand.com/en/Leaky_abstraction leaky abstraction], since new
> emoji characters will be added to the Unicode Standard every year. Users of
> the function are therefore forced to track changes to
> [http://www.unicode.org/Public/emoji/1.0/emoji-data.txt emoji-data.txt] and
> the emoji skin tone modifiers (U+1F3FB..U+1F3FF, see
> [http://www.unicode.org/reports/tr51/tr51-1.html Unicode Technical Report 51]),
> and to use their own function.
Not supporting Unicode 8.0 changes is intentional. The Twemoji library
also doesn't support Unicode 8.0 emoji or skin tone modifiers; we'll
update both when Unicode 8.0 is finalised and Twemoji adds support.
> [http://en.wikipedia.org/wiki/Regional_Indicator_Symbol Regional indicator symbols]
> (U+1F1E6..U+1F1FF) are not emoji themselves, though; they are used for
> national flags. See
> [http://www.unicode.org/Public/7.0.0/ucd/auxiliary/GraphemeBreakProperty.txt GraphemeBreakProperty.txt]
> or [http://unicode.org/reports/tr29/ Unicode Standard Annex 29] for the
> details.
For our purposes, I'm okay with treating the national flags as individual
characters. For example, `🇬🇧` (U+1F1EC followed by U+1F1E7) will still
show as the GB flag.
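As a rough illustration of why encoding the two regional indicators separately is harmless, here is a minimal Python sketch. It is not WordPress code: `encode_regional_indicators` is a hypothetical helper, and writing each character as an HTML hex entity is an assumption made for the example. Decoding the entities yields the same two adjacent code points, so the pair can still be rendered as one flag.

```python
import html

# Regional indicator symbols, as cited in the quoted comment.
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF


def encode_regional_indicators(text: str) -> str:
    """Encode each regional indicator on its own as an HTML hex entity."""
    return "".join(
        f"&#x{ord(ch):x};" if RI_FIRST <= ord(ch) <= RI_LAST else ch
        for ch in text
    )


gb_flag = "\U0001F1EC\U0001F1E7"  # regional indicators G and B
encoded = encode_regional_indicators(gb_flag)
print(encoded)  # &#x1f1ec;&#x1f1e7;

# Decoding the entities restores the same two adjacent code points,
# which is why the pair can still be displayed as a single flag.
assert html.unescape(encoded) == gb_flag
```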
There needs to be a little bit of extra logic to allow for the static
image replacement, but that won't be too tricky. Thank you for bringing it
to my attention - I've created a [https://github.com/pento/x1f4a9/issues/5
GitHub issue] to track it.
> Another reason why I do not vote for the wp_encode_emoji function is that
> not all 4-byte characters are emoji.
>
> Many 4-byte Chinese characters are used in place names and family names.
> U+20BB7 is used for [http://en.wikipedia.org/wiki/Yoshinoya Yoshinoya], a
> Japanese fast food chain with more than 1800 stores.
>
> Some Variation Selectors Supplement characters (U+E0100..U+E01EF) are used
> for variant forms of Chinese characters. U+E0101 is used for Katsushika-ku
> (U+845B U+E0101 U+98FE U+533A). TAKARA TOMY, which is famous for Pokemon,
> has its head office in Katsushika-ku.
I'm not following what the problem is here - none of these characters are
encoded by `wp_encode_emoji()`. Could you please expand on the problem?
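To make the distinction concrete, here is a small Python sketch of range-based encoding. The ranges in `ASSUMED_EMOJI_RANGES` are illustrative assumptions, not the list `wp_encode_emoji()` actually uses, and both helper names are hypothetical. Under those assumptions, U+20BB7 and U+E0101 take four bytes in UTF-8 but fall outside every range, so they pass through unchanged.

```python
# Illustrative ranges only; the real list used by wp_encode_emoji() may differ.
ASSUMED_EMOJI_RANGES = [
    (0x1F1E6, 0x1F1FF),  # regional indicator symbols
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
]


def is_assumed_emoji(ch: str) -> bool:
    """Return True if the code point falls in one of the assumed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ASSUMED_EMOJI_RANGES)


def encode_assumed_emoji(text: str) -> str:
    """Encode only assumed-emoji code points; leave everything else as-is."""
    return "".join(
        f"&#x{ord(ch):x};" if is_assumed_emoji(ch) else ch for ch in text
    )


yoshi = "\U00020BB7"                          # ideograph cited for Yoshinoya
katsushika = "\u845B\U000E0101\u98FE\u533A"   # Katsushika-ku with variation selector
poo = "\U0001F4A9"                            # an emoji inside the assumed ranges

for ch in (yoshi, "\U000E0101", poo):
    size = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {size} bytes, assumed emoji: {is_assumed_emoji(ch)}")
```

All three code points need four bytes in UTF-8 (hence the utf8mb4 requirement), but only the last one is rewritten by `encode_assumed_emoji`.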
--
Ticket URL: <https://core.trac.wordpress.org/ticket/21212#comment:89>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform