[wp-trac] [WordPress Trac] #32105: Database collate should be utf8mb4_unicode_520_ci
WordPress Trac
noreply at wordpress.org
Fri Apr 24 12:15:28 UTC 2015
#32105: Database collate should be utf8mb4_unicode_520_ci
--------------------------+------------------------------
Reporter: miyauchi | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Database | Version: 4.2
Severity: normal | Resolution:
Keywords: | Focuses:
--------------------------+------------------------------
Changes (by netweb):
* component: General => Database
Comment:
Miyauchi, thanks for the unit test. As you point out, you expect `null` yet
get back post ID `3`. That will be the result of the test on any version of
MySQL before 5.6, or whenever `utf8mb4_unicode_520_ci` is not in use.
My take on this is that `utf8mb4_unicode_520_ci` is only available in
MySQL 5.6 and above. Unless you are running at least MySQL 5.6 with
`utf8mb4_unicode_520_ci`, there is no way to compare one emoji against
another (or against any other character in that Unicode plane).
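To illustrate why the lookup matches the wrong post, here is a rough Python
model of the comparison. It is only a sketch, not MySQL's implementation: it
assumes the older collations give every supplementary character (above
U+FFFF) the single weight 0xFFFD, and it stands in code points for real
collation weights. The function names are made up for the example.

```python
# Simplified model of collation comparison. NOT MySQL's real algorithm:
# code points stand in for collation weights (assumption for illustration).
REPLACEMENT_WEIGHT = 0xFFFD


def legacy_weights(text):
    """Model older utf8mb4_* collations: every character above U+FFFF
    collapses to one shared weight."""
    return [
        REPLACEMENT_WEIGHT if ord(ch) > 0xFFFF else ord(ch)
        for ch in text.casefold()
    ]


def uca520_like_weights(text):
    """Model utf8mb4_unicode_520_ci: supplementary characters stay distinct."""
    return [ord(ch) for ch in text.casefold()]


def equal_under(weights, a, b):
    """Two strings compare equal when their weight sequences are identical."""
    return weights(a) == weights(b)


pizza, burger = "\U0001F355", "\U0001F354"

print(equal_under(legacy_weights, pizza, burger))       # True
print(equal_under(uca520_like_weights, pizza, burger))  # False
```

Under the first model any emoji equals any other emoji, so a query for one
emoji returns whichever row is found first.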
For reference this has been previously discussed in #31328
Via https://core.trac.wordpress.org/ticket/31328#comment:6
> The problem stems from MySQL's collation behaviour - it treats all
Unicode Supplementary Characters (which emoji fall under) as being
equivalent. It's not until MySQL 5.6, with the addition of the
`utf8mb4_unicode_520_ci` collation that this changes.
Via https://core.trac.wordpress.org/ticket/31328#comment:20
> This is reproducible for any two terms containing the same number of
emoji characters (as opposed to glyphs).
>
> This is because we were searching for duplicates by term name. All
`utf8mb4_*` collations (prior to `utf8mb4_unicode_520_ci`) treat emoji as
being equivalent characters, so would just match the first one found. Now
that we're searching for duplicates by term slug, this is no longer a
problem. Term slugs store the URL-encoded version of the emoji character,
which the `utf8mb4_` collations correctly interpret as a string of ASCII
text.
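As a quick illustration of that last point, the Python sketch below
percent-encodes two emoji the way a URL-encoded slug would hold them. The
`slug_like` helper is hypothetical and is not WordPress code; the
lowercasing is an assumption made to resemble how slugs are usually stored.

```python
from urllib.parse import quote


def slug_like(term_name):
    """Percent-encode the UTF-8 bytes of a term name.

    Hypothetical helper for illustration only; lowercasing the result
    is an assumption, not taken from the ticket.
    """
    return quote(term_name, safe="").lower()


pizza_slug = slug_like("\U0001F355")
burger_slug = slug_like("\U0001F354")

print(pizza_slug)                 # %f0%9f%8d%95
print(burger_slug)                # %f0%9f%8d%94
print(pizza_slug == burger_slug)  # False
```

Both results are plain ASCII and differ in their last byte, so even a
collation that cannot tell the two emoji apart can tell the slugs apart.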
--
Ticket URL: <https://core.trac.wordpress.org/ticket/32105#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform