[wp-trac] [WordPress Trac] #32405: Database collation upgrade routine to support UTF8MB4 collations
WordPress Trac
noreply at wordpress.org
Fri May 15 03:57:28 UTC 2015
#32405: Database collation upgrade routine to support UTF8MB4 collations
--------------------------+--------------------------------------
Reporter: netweb | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Database | Version: 4.2
Severity: normal | Keywords: dev-feedback needs-patch
Focuses: |
--------------------------+--------------------------------------
At the time of writing, the Finnish team is using the following for the
Finnish localised package:
* `define('DB_CHARSET', 'utf8');` and `define('DB_COLLATE', 'utf8_swedish_ci');`
* http://i18n.trac.wordpress.org/browser/fi/branches/4.2/dist/wp-config-sample.php
What they wanted to use was:
* `define('DB_CHARSET', 'utf8mb4');` and `define('DB_COLLATE', 'utf8mb4_swedish_ci');`
* http://i18n.trac.wordpress.org/changeset/26724
WordPress currently needs to start with `utf8` as the character set because
not all sites can support `utf8mb4`, so `utf8` in the config file is
automatically upgraded at runtime to `utf8mb4` if all the requirements for
its use are met.
This upgrade support does not currently extend the same logic to
collations, i.e. if the collation is set to `utf8_swedish_ci` in
`wp-config.php`, then after `utf8` is successfully upgraded to `utf8mb4`
the collation `utf8_swedish_ci` is '''NOT''' upgraded to
`utf8mb4_swedish_ci`.
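
A minimal sketch of what such a collation upgrade could look like, for illustration only. The function name `sketch_upgrade_collation()` and its parameters are hypothetical and not part of WordPress; the list of collations the server supports is assumed to be supplied by the caller (for example, gathered from MySQL's `SHOW COLLATION`), and falling back to `utf8mb4_unicode_ci` is an assumption based on the behaviour described in this ticket.

```php
<?php
/**
 * Hypothetical helper (not WordPress core): map a utf8_* collation to its
 * utf8mb4_* counterpart once the charset has been upgraded to utf8mb4.
 *
 * @param string   $charset           Effective charset after any runtime upgrade.
 * @param string   $collate           Collation configured in wp-config.php.
 * @param string[] $server_collations Collations the server reports as available.
 * @return string Collation to use.
 */
function sketch_upgrade_collation( $charset, $collate, array $server_collations ) {
	// Only act when the charset is utf8mb4 and the collation is still utf8_*.
	if ( 'utf8mb4' !== $charset || 0 !== strpos( $collate, 'utf8_' ) ) {
		return $collate;
	}

	// utf8_swedish_ci => utf8mb4_swedish_ci
	$candidate = 'utf8mb4_' . substr( $collate, strlen( 'utf8_' ) );

	if ( in_array( $candidate, $server_collations, true ) ) {
		return $candidate;
	}

	// Assumed fallback when the server lacks the matching collation.
	return 'utf8mb4_unicode_ci';
}
```

Under these assumptions, `sketch_upgrade_collation( 'utf8mb4', 'utf8_swedish_ci', $server_collations )` would yield `utf8mb4_swedish_ci` whenever the server lists that collation, and would leave the collation untouched on sites that stay on `utf8`.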
----
The following are extracts from a discussion on Slack in #core-i18n; the
full discussion is
[https://wordpress.slack.com/archives/core-i18n/p1431559485000017 here].
> Netweb: “So after the various chats last night it looks like we have the
Finnish locale leaving the charset as UTF8 but defining the collation as
`utf8_swedish_ci` for the Finnish locale, will that explode?”
>
> dd32: “I don’t know the answer here. If anything we probably need some
logic to upgrade a `utf8_swedish` to a `utf8mb4_swedish` if supported by
the server. I think we also need to look into using
`utf8mb4_unicode_520_ci` when supported too.”
>
> dd32: "For Finnish, no, it won't explode, but they may have alphabeticalism
issues if it doesn't play nice with `utf8mb4_unicode_ci`."
>
>
> If the site uses a `utf8mb4` charset, and they have a `utf8_*` collation
set, it’ll be overridden to `utf8mb4_unicode_ci`.
>
> If the site uses `utf8` and they set `utf8mb4_swedish_ci` things will
break
>
> If the site uses `utf8mb4` and they set `utf8mb4_swedish_ci`, then..
it’ll use `utf8mb4_swedish_ci`.
>
> “In other words, customising those values in the default file is really
a bad idea. Site admins can do that sure, but it should default to our
defaults.”
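
The three cases quoted above can be summarised in a short sketch. This only restates the behaviour as described in the discussion; `sketch_effective_collation()` is a hypothetical name, not a WordPress function, and raising an exception is merely one assumed way to represent "things will break".

```php
<?php
/**
 * Hypothetical illustration of the quoted cases; not WordPress code.
 *
 * @param string $charset Effective charset in use by the site.
 * @param string $collate Collation configured in wp-config.php.
 * @return string Collation that ends up being used.
 * @throws InvalidArgumentException For the mismatched utf8 / utf8mb4_* case.
 */
function sketch_effective_collation( $charset, $collate ) {
	// Case 1: utf8mb4 charset with a utf8_* collation is overridden.
	if ( 'utf8mb4' === $charset && 0 === strpos( $collate, 'utf8_' ) ) {
		return 'utf8mb4_unicode_ci';
	}

	// Case 2: utf8 charset with a utf8mb4_* collation does not match.
	if ( 'utf8' === $charset && 0 === strpos( $collate, 'utf8mb4_' ) ) {
		throw new InvalidArgumentException(
			"Collation '$collate' is not valid for charset '$charset'."
		);
	}

	// Case 3 (and anything else): the configured collation is used as-is.
	return $collate;
}
```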
--
Ticket URL: <https://core.trac.wordpress.org/ticket/32405>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform