[wp-trac] [WordPress Trac] #21212: MySQL tables should use utf8mb4 character set

WordPress Trac noreply at wordpress.org
Wed Feb 11 12:35:12 UTC 2015


#21212: MySQL tables should use utf8mb4 character set
----------------------------+---------------------
 Reporter:  pento           |       Owner:
     Type:  task (blessed)  |      Status:  closed
 Priority:  normal          |   Milestone:  4.2
Component:  Database        |     Version:  3.4.1
 Severity:  normal          |  Resolution:  fixed
 Keywords:                  |     Focuses:
----------------------------+---------------------

Comment (by pento):

 Replying to [comment:88 masakielastic]:
 > wp_encode_emoji function is
 [http://www.wikiwand.com/en/Leaky_abstraction leaky abstraction] since new
 emoji characters will be added in Unicode Standard every year. Thus the
 users of function are forced to check the change of
 [http://www.unicode.org/Public/emoji/1.0/emoji-data.txt emoji-data.txt],
 emoji skin tone modifiers (U+1F3FB..U+1F3FF, see
 [http://www.unicode.org/reports/tr51/tr51-1.html Unicode Technical Report
 51] ) and use their own function.

 Not supporting the Unicode 8.0 changes is intentional. The Twemoji library
 also doesn't support Unicode 8.0 emoji or skin tone modifiers; we'll
 update both once Unicode 8.0 is finalised and Twemoji adds support.

 > [http://en.wikipedia.org/wiki/Regional_Indicator_Symbol Regional
 indicator symbols] (U+1F1E6..U+1F1FF) are not emoji themself though, they
 are used for national flags. see
 [http://www.unicode.org/Public/7.0.0/ucd/auxiliary/GraphemeBreakProperty.txt
 GraphemeBreakProperty.txt] or [http://unicode.org/reports/tr29/ Unicode
 Standard Annex 29] for the details.

 For our purposes, I'm okay with treating the national flags as individual
 characters. For example, `🇬🇧` will still show as the GB
 flag.
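
 To illustrate what "individual characters" means here, the following is a
 minimal Python sketch, not the actual `wp_encode_emoji()` code. The helper
 name `encode_regional_indicators` is hypothetical, and it assumes each
 regional indicator symbol is written out as its own hex HTML entity. Decoding
 the two entities yields the same pair of code points, which a flag-aware
 renderer can still draw as the GB flag.

```python
import html

# Hypothetical helper for illustration only; not WordPress's implementation.
def encode_regional_indicators(text: str) -> str:
    """Replace each regional indicator symbol with its own hex HTML entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x1F1E6 <= cp <= 0x1F1FF:  # REGIONAL INDICATOR SYMBOL LETTER A..Z
            out.append(f"&#x{cp:x};")
        else:
            out.append(ch)
    return "".join(out)

# The GB flag is the pair U+1F1EC (G) followed by U+1F1E7 (B).
gb_flag = "\U0001F1EC\U0001F1E7"
encoded = encode_regional_indicators(gb_flag)

print(encoded)                           # &#x1f1ec;&#x1f1e7;
print(html.unescape(encoded) == gb_flag) # True
```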

 The static image replacement will need a little extra logic, but that
 won't be too tricky. Thank you for bringing it to my attention - I've
 created a [https://github.com/pento/x1f4a9/issues/5 GitHub issue] to track
 it.

 > Another reason why I do not vote for wp_encode_emoji function is that
 all of 4-byte characters is not emoji.
 >
 > A lot of 4-byte chinese characters are used for the names of places and
 the family names. U+20BB7 is used  for
 [http://en.wikipedia.org/wiki/Yoshinoya Yoshinoya], which is Japanese fast
 food chain and have more than 1800 stores.
 >
 > A part of Variation selectors supplements are used for variant form of
 chinese characters (U+E0100..U+E01EF). U+E0101 is used for Katsushika-ku
 (U+845B U+E0101 U+98FE U+533A).  TAKARA TOMY which is famous for Pokemon
 has the head office in Katsushika-ku.

 I'm not following what the problem is here - none of these characters are
 encoded by `wp_encode_emoji()`. Could you please expand on the problem?
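
 As a rough illustration of the point, here is a Python sketch of an encoder
 that only touches code points inside an explicit list of emoji ranges. The
 function name and the ranges are assumptions made up for this example and
 are not the list `wp_encode_emoji()` uses. Under that assumption, 4-byte
 characters outside the list, such as U+20BB7 or the variation selector
 U+E0101, pass through unchanged.

```python
# Illustrative subset of emoji blocks; an assumption for this sketch only.
ASSUMED_EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
]

# Hypothetical helper; not WordPress's implementation.
def encode_listed_emoji(text: str) -> str:
    """Encode only code points that fall inside ASSUMED_EMOJI_RANGES."""
    out = []
    for ch in text:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in ASSUMED_EMOJI_RANGES):
            out.append(f"&#x{cp:x};")
        else:
            out.append(ch)
    return "".join(out)

yoshinoya_char = "\U00020BB7"                  # 4 bytes in UTF-8, not an emoji
katsushika_ku = "\u845B\U000E0101\u98FE\u533A" # includes variation selector U+E0101
pile_of_poo = "\U0001F4A9"                     # inside the first assumed range

print(len(yoshinoya_char.encode("utf-8")))                  # 4
print(encode_listed_emoji(yoshinoya_char) == yoshinoya_char) # True
print(encode_listed_emoji(katsushika_ku) == katsushika_ku)   # True
print(encode_listed_emoji(pile_of_poo))                      # &#x1f4a9;
```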

--
Ticket URL: <https://core.trac.wordpress.org/ticket/21212#comment:89>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

