[wp-trac] [WordPress Trac] #32105: Database collate should be utf8mb4_unicode_520_ci

WordPress Trac noreply at wordpress.org
Fri Apr 24 12:15:28 UTC 2015


#32105: Database collate should be utf8mb4_unicode_520_ci
--------------------------+------------------------------
 Reporter:  miyauchi      |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Database      |     Version:  4.2
 Severity:  normal        |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------
Changes (by netweb):

 * component:  General => Database


Comment:

 Miyauchi, thanks for the unit test. As you point out, you expect `null`
 but get post ID `3` back. That will be the result of the test on any
 MySQL version before 5.6, and on any install that is not using
 `utf8mb4_unicode_520_ci`.

 My take on this is that `utf8mb4_unicode_520_ci` is only available in
 MySQL 5.6 and above. As such, unless you are running at least MySQL 5.6
 with the `utf8mb4_unicode_520_ci` collation, there is no way to compare
 one emoji against another emoji (or any other character in that Unicode
 plane) in older MySQL versions.
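 To make the comparison problem concrete, here is a deliberately simplified
 model, not MySQL's real weighting table, of the rule quoted below from
 #31328: older `utf8mb4_*` collations give every supplementary character
 (any code point above U+FFFF, which covers emoji) the same weight. The
 helper names `legacy_collation_key` and `legacy_equal` are hypothetical,
 and the model ignores everything else those collations do (accent
 folding, contractions, and so on) apart from case-insensitivity.

```python
# Simplified model of pre-5.6 utf8mb4 collation behaviour: all supplementary
# characters (code points above U+FFFF) collapse to one shared weight.
# Assumption for illustration only; this is not MySQL's actual weight table.

SUPPLEMENTARY_STANDIN = "\uFFFD"  # hypothetical single weight for all of them

GRINNING = "\U0001F600"  # grinning face emoji
SUSHI = "\U0001F363"     # sushi emoji


def is_supplementary(ch: str) -> bool:
    """True for code points outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF


def legacy_collation_key(text: str) -> str:
    """Hypothetical sort key: case-folds BMP characters, collapses the rest."""
    return "".join(
        SUPPLEMENTARY_STANDIN if is_supplementary(ch) else ch.casefold()
        for ch in text
    )


def legacy_equal(a: str, b: str) -> bool:
    """Would the simplified legacy collation treat a and b as equal?"""
    return legacy_collation_key(a) == legacy_collation_key(b)


if __name__ == "__main__":
    print(legacy_equal(GRINNING, SUSHI))                    # True: both collapse
    print(legacy_equal("Sushi " + SUSHI, "sushi " + GRINNING))  # True
    print(legacy_equal(GRINNING, "a"))                      # False
    print(legacy_equal(GRINNING, GRINNING * 2))             # False: counts differ
```

 Under this model a lookup by name for one emoji matches whichever row
 with any single emoji in that position is found first, which is the
 behaviour the unit test trips over.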

 For reference, this has been discussed previously in #31328.

 Via https://core.trac.wordpress.org/ticket/31328#comment:6
 > The problem stems from MySQL's collation behaviour - it treats all
 Unicode Supplementary Characters (which emoji fall under) as being
 equivalent. It's not until MySQL 5.6, with the addition of the
 `utf8mb4_unicode_520_ci` collation that this changes.

 Via https://core.trac.wordpress.org/ticket/31328#comment:20
 > This is reproducible for any two terms containing the same number of
 emoji characters (as opposed to glyphs).
 >
 > This is because we were searching for duplicates by term name. All
 `utf8mb4_*` collations (prior to `utf8mb4_unicode_520_ci`) treat emoji as
 being equivalent characters, so would just match the first one found. Now
 that we're searching for duplicates by term slug, this is no longer a
 problem. Term slugs store the URL-encoded version of the emoji character,
 which the `utf8mb4_` collations correctly interpret as a string of ASCII
 text.
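 The slug workaround can be sketched as follows. `hypothetical_slug` is an
 illustrative stand-in, not WordPress's own slug code: it uses Python's
 `urllib.parse.quote` to percent-encode the UTF-8 bytes and lowercases the
 result, so exact output may differ from real WordPress slugs. What it
 shows is the property the comment relies on: each emoji becomes a
 distinct run of plain ASCII, so even a collation that lumps all
 supplementary characters together can tell two slugs apart. It also
 shows "characters as opposed to glyphs": a skin-toned thumbs-up is one
 glyph but two code points.

```python
from urllib.parse import quote

GRINNING = "\U0001F600"                    # grinning face emoji
SUSHI = "\U0001F363"                       # sushi emoji
THUMBS_UP_MEDIUM = "\U0001F44D\U0001F3FD"  # one glyph, two code points


def hypothetical_slug(name: str) -> str:
    """Illustrative slug: spaces to dashes, then percent-encode UTF-8 bytes.

    Assumption: approximates only the "slug is URL-encoded ASCII" property,
    not WordPress's actual sanitizing rules.
    """
    dashed = name.strip().lower().replace(" ", "-")
    return quote(dashed, safe="-").lower()


if __name__ == "__main__":
    print(hypothetical_slug(GRINNING))          # %f0%9f%98%80
    print(hypothetical_slug("Sushi " + SUSHI))  # sushi-%f0%9f%8d%a3
    # Two emoji *characters* in both names, but different slugs:
    print(len(THUMBS_UP_MEDIUM), len(GRINNING * 2))  # 2 2
    print(hypothetical_slug(THUMBS_UP_MEDIUM) != hypothetical_slug(GRINNING * 2))
```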

--
Ticket URL: <https://core.trac.wordpress.org/ticket/32105#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

