[wp-trac] [WordPress Trac] #32405: Database collation upgrade routine to support UTF8MB4 collations

WordPress Trac noreply at wordpress.org
Fri May 15 03:57:28 UTC 2015


#32405: Database collation upgrade routine to support UTF8MB4 collations
--------------------------+--------------------------------------
 Reporter:  netweb        |      Owner:
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Database      |    Version:  4.2
 Severity:  normal        |   Keywords:  dev-feedback needs-patch
  Focuses:                |
--------------------------+--------------------------------------
 At the time of writing, the Finnish team is using the following for the
 Finnish localised package:
 * `define('DB_CHARSET', 'utf8');` and `define('DB_COLLATE',
 'utf8_swedish_ci');`
 * http://i18n.trac.wordpress.org/browser/fi/branches/4.2/dist/wp-config-sample.php

 What they wanted to use was:
 * `define('DB_CHARSET', 'utf8mb4');` and `define('DB_COLLATE',
 'utf8mb4_swedish_ci');`
 * http://i18n.trac.wordpress.org/changeset/26724

 WordPress currently needs to start with `utf8` as the character set because
 not all sites can support `utf8mb4`, so `utf8` in the config file is
 automatically upgraded at runtime to `utf8mb4` if all the requirements for
 its use are met.

 This upgrade support does not currently extend the same logic to
 collations, i.e. if the collation is set to `utf8_swedish_ci` in
 `wp-config.php`, then after `utf8` is successfully upgraded to `utf8mb4`
 the collation `utf8_swedish_ci` is '''NOT''' upgraded to
 `utf8mb4_swedish_ci`.
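
 As a rough illustration of the missing step, the sketch below maps a
 `utf8_*` collation to its `utf8mb4_*` counterpart once the charset has been
 upgraded. The function name `sketch_upgrade_collation()` and the
 `$server_collations` parameter are hypothetical and not part of WordPress
 core; falling back to `utf8mb4_unicode_ci` is an assumption based on the
 behaviour described in this ticket.

```php
<?php
/**
 * Hypothetical helper (not WordPress core): pick a collation that matches
 * the charset after a utf8 -> utf8mb4 upgrade.
 *
 * @param string   $charset           Charset in effect after the upgrade check.
 * @param string   $collate           Collation configured in DB_COLLATE.
 * @param string[] $server_collations Collation names the server reports (assumed input).
 * @return string Collation to use.
 */
function sketch_upgrade_collation( $charset, $collate, array $server_collations ) {
	// Only act when the charset is utf8mb4 and the configured collation
	// still belongs to the utf8 family.
	if ( 'utf8mb4' !== $charset || 0 !== strpos( $collate, 'utf8_' ) ) {
		return $collate;
	}

	// Swap the prefix: utf8_swedish_ci becomes utf8mb4_swedish_ci.
	$candidate = 'utf8mb4_' . substr( $collate, strlen( 'utf8_' ) );

	// Use the counterpart only if the server actually offers it.
	if ( in_array( $candidate, $server_collations, true ) ) {
		return $candidate;
	}

	// Assumed fallback, mirroring the override described in this ticket.
	return 'utf8mb4_unicode_ci';
}
```

 The list of available collations could, for example, be read from the
 MySQL statement `SHOW COLLATION LIKE 'utf8mb4%'`; how that lookup would be
 wired into the upgrade routine is left open here.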


 ----

 The following are extracts from a discussion on Slack in #core-i18n; the
 full discussion is
 [https://wordpress.slack.com/archives/core-i18n/p1431559485000017 here].

 > Netweb: “So after the various chats last night it looks like we have the
 Finnish locale leaving the charset as UTF8 but defining the collation as
 utf8_swedish_ci for the Finnish locale, will that explode?”
 >
 > dd32: “I don’t know the answer here. If anything we probably need some
 logic to upgrade a `utf8_swedish` to a `utf8mb4_swedish` if supported by
 the server.. I think we also need to look into using
 `utf8mb4_unicode_520_ci` when supported too.”
 >
 > dd32: "For Finnish, no, it won't explode, but they may have alphabeticalism
 issues if it doesn't play nice with `utf8mb4_unicode_ci`."
 >
 >
 > If the site uses a `utf8mb4` charset, and they have a `utf8_*` collation
 set, it’ll be overridden to `utf8mb4_unicode_ci`.
 >
 > If the site uses `utf8` and they set `utf8mb4_swedish_ci` things will
 break
 >
 > If the site uses `utf8mb4` and they set `utf8mb4_swedish_ci`, then..
 it’ll use `utf8mb4_swedish_ci`.
 >
 > “In other words, customising those values in the default file is really
 a bad idea. Site admins can do that sure, but it should default to our
 defaults.”
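
 The three cases quoted above can be restated as a small lookup. This is
 only a model of the behaviour as described in the Slack discussion, not the
 actual `wpdb` code; the function name `sketch_effective_collation()` is
 hypothetical, and `null` is used here simply to mark the combination the
 quote says "will break".

```php
<?php
/**
 * Hypothetical model of the quoted behaviour (not WordPress core).
 *
 * @param string $charset Charset the site ends up using: 'utf8' or 'utf8mb4'.
 * @param string $collate Collation configured in DB_COLLATE.
 * @return string|null Collation in effect, or null for a mismatched pair.
 */
function sketch_effective_collation( $charset, $collate ) {
	// A collation name starts with the charset it belongs to,
	// e.g. 'utf8mb4_swedish_ci' belongs to 'utf8mb4'.
	$parts           = explode( '_', $collate, 2 );
	$collate_charset = $parts[0];

	// Charset and collation agree: the configured collation is used.
	if ( $collate_charset === $charset ) {
		return $collate;
	}

	// utf8mb4 charset with a utf8_* collation: overridden to the default.
	if ( 'utf8mb4' === $charset && 'utf8' === $collate_charset ) {
		return 'utf8mb4_unicode_ci';
	}

	// utf8 charset with a utf8mb4_* collation: the pair is invalid.
	return null;
}
```

 For example, `sketch_effective_collation( 'utf8mb4', 'utf8_swedish_ci' )`
 returns `utf8mb4_unicode_ci`, which is the loss of the Swedish collation
 that this ticket is about. Only the three quoted cases are modelled.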

--
Ticket URL: <https://core.trac.wordpress.org/ticket/32405>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

