[wp-trac] [WordPress Trac] #38186: Database Collations Bypassed by determine_charset() in wp-db.php
WordPress Trac
noreply at wordpress.org
Fri Feb 10 14:32:57 UTC 2017
#38186: Database Collations Bypassed by determine_charset() in wp-db.php
--------------------------+------------------------------
 Reporter:  natecf        |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Charset       |     Version:  4.6.1
 Severity:  major         |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------
Changes (by cimatti):
* severity: normal => major
Comment:
I think this is a deep issue with many potential consequences, because even
if WordPress changes the default result of $wpdb->charset and
$wpdb->collate, the charsets and collations in the databases of existing
installations are not updated.
Plugins are affected too, because they are supposed to use $wpdb->charset
and $wpdb->collate when creating tables. A plugin that created tables with
collation utf8mb4_unicode_ci under an older WordPress version may now
create new tables and columns with collation utf8mb4_unicode_520_ci.
I have already seen this in an old WordPress installation: the WordPress
columns remained on collation utf8mb4_unicode_ci, but a plugin created a
table with utf8mb4_unicode_520_ci. I have a plugin that has to create a
temporary table and join it to existing tables to do a task. This stopped
working because the old tables use utf8mb4_unicode_ci and the new
temporary table uses utf8mb4_unicode_520_ci.
So the big problem is that if you make a join or any other operation
between two columns with collations utf8mb4_unicode_ci and
utf8mb4_unicode_520_ci, the query fails (MySQL reports an "illegal mix of
collations" error).
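A minimal sketch of how such a mismatch could be spotted before running
the join. It assumes the rows have already been fetched as (table, column,
collation) tuples, for example from information_schema.COLUMNS; the table
names and helper functions below are hypothetical and only illustrate the
check.

```python
from collections import defaultdict


def group_by_collation(columns):
    """Group (table, column) pairs by their collation name."""
    groups = defaultdict(list)
    for table, column, collation in columns:
        groups[collation].append((table, column))
    return dict(groups)


def has_collation_mismatch(columns):
    """True when the columns do not all share one collation."""
    return len(group_by_collation(columns)) > 1


# Hypothetical rows, shaped like TABLE_NAME, COLUMN_NAME, COLLATION_NAME.
columns = [
    ("wp_posts", "post_name", "utf8mb4_unicode_ci"),
    ("tmp_plugin_work", "post_name", "utf8mb4_unicode_520_ci"),
]

if has_collation_mismatch(columns):
    for collation, cols in group_by_collation(columns).items():
        print(collation, cols)
```

Columns that land in different groups are the ones that cannot be compared
directly in a join.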
The move from utf8 to utf8mb4 could be problematic too, because MySQL
normally has a limit of 1000 bytes for keys. With utf8 (up to 3 bytes per
character) a key can't hold more than 333 characters, and with utf8mb4 (up
to 4 bytes per character) the limit is 250, so a key that is valid with
utf8 may be too long with utf8mb4.
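The arithmetic behind those numbers can be sketched as follows. The 1000
byte limit is the figure quoted in this comment; the actual limit depends
on the storage engine and its configuration, so treat the constant as an
assumption. The function names are made up for illustration.

```python
KEY_LIMIT_BYTES = 1000  # figure quoted in the comment; engine dependent
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset, key_limit_bytes=KEY_LIMIT_BYTES):
    """Largest number of characters that fits in a key for this charset."""
    return key_limit_bytes // MAX_BYTES_PER_CHAR[charset]


def key_fits(column_chars, charset, key_limit_bytes=KEY_LIMIT_BYTES):
    """True when a key on a column of column_chars characters fits."""
    return column_chars * MAX_BYTES_PER_CHAR[charset] <= key_limit_bytes


print(max_indexable_chars("utf8"))     # 333
print(max_indexable_chars("utf8mb4"))  # 250
print(key_fits(300, "utf8"))           # True: 900 bytes
print(key_fits(300, "utf8mb4"))        # False: 1200 bytes
```

A key on a 300 character column is therefore fine under utf8 but too long
once the column is converted to utf8mb4.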
Changing a collation may also be problematic if you change it on a column
with a unique key, because values that were considered different before
may be considered equal under the new collation.
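A rough sketch of how such collisions could be detected before a
migration. The two comparison keys used here (exact match and
str.casefold) are only stand-ins chosen to make the effect visible; they
are not implementations of utf8mb4_unicode_ci or utf8mb4_unicode_520_ci,
and the helper name is hypothetical.

```python
from collections import defaultdict


def find_collisions(values, sort_key):
    """Return groups of values that compare equal under sort_key."""
    buckets = defaultdict(list)
    for value in values:
        buckets[sort_key(value)].append(value)
    return {key: group for key, group in buckets.items() if len(group) > 1}


values = ["Apple", "apple", "pear"]

# Stand-in for the old collation: every value is distinct.
print(find_collisions(values, lambda s: s))  # {}

# Stand-in for the new collation: two values now compare equal.
print(find_collisions(values, str.casefold))  # {'apple': ['Apple', 'apple']}
```

Any non-empty result means the unique key would be violated after the
change, so those rows need to be resolved first.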
I propose the following path:
- the default charset and collation should be chosen during installation,
and WordPress should stick with them
- there should be a standard procedure to change collation, and plugins
should implement a callback that changes it in their own tables when
called
- in any case, migration to another collation should be discouraged; if it
is necessary, it should be tested first on a copy of the site, and a
backup is strongly suggested
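The callback idea in the second point could look roughly like this. It is
only a language-neutral sketch of the proposed design, not an existing
WordPress API: the registry, the function names and the table name are all
hypothetical.

```python
_callbacks = []


def register_collation_callback(callback):
    """Let a plugin register a function that converts its own tables."""
    _callbacks.append(callback)
    return callback


def change_collation(old, new):
    """Call every registered callback and collect the tables it handled."""
    converted = []
    for callback in _callbacks:
        converted.extend(callback(old, new))
    return converted


@register_collation_callback
def my_plugin_tables(old, new):
    # A real plugin would alter its tables here; this only reports them.
    return ["wp_myplugin_items"]


print(change_collation("utf8mb4_unicode_ci", "utf8mb4_unicode_520_ci"))
```

The point of the design is that core drives the migration once, and each
plugin stays responsible for the tables it created.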
--
Ticket URL: <https://core.trac.wordpress.org/ticket/38186#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform