[wp-trac] [WordPress Trac] #58871: support uca14.0.0 collation in database where available
WordPress Trac
noreply at wordpress.org
Mon Sep 18 00:55:43 UTC 2023
#58871: support uca14.0.0 collation in database where available
-------------------------------------------------+-------------------------
Reporter: danielblack | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Database | Version: 6.3
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests needs-testing | Focuses:
-------------------------------------------------+-------------------------
Comment (by danielblack):
{{{
SHOW COLLATION where Collation IN
('uca1400_ai_ci','utf8mb4_0900_ai_ci','utf8mb4_unicode_520_ci');
}}}
Sounds useful, and this could simply be implemented in `determine_charset`,
with a cache, so there's only one query. If we do that, maybe
`has_cap( 'uca1400' )` need not be implemented. We'll see how easy the test
cases are to write.
I'll prepare another draft that implements this in `determine_charset` and
has `maybe_convert_table_to_utf8mb4` do a collation conversion too.
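As a rough illustration of the selection idea (a language-neutral sketch in Python, not WordPress's PHP code): given the collation names returned by the single `SHOW COLLATION` query above, pick the first one available from a preference list. The function name `pick_collation` and the preference order are assumptions made for this example; the caller would cache the returned value so the query runs once.

```python
# Assumed preference order: newest Unicode collation first.
PREFERRED = ("uca1400_ai_ci", "utf8mb4_0900_ai_ci", "utf8mb4_unicode_520_ci")


def pick_collation(available, preferred=PREFERRED):
    """Return the first preferred collation the server reported, or None."""
    names = set(available)
    for name in preferred:
        if name in names:
            return name
    return None


# Example: a server that reports both the 14.0.0 and the 5.2.0 collations.
print(pick_collation(["utf8mb4_unicode_520_ci", "uca1400_ai_ci"]))
```

Returning `None` when nothing matches leaves the fallback decision to the caller.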
From Misc:
> `@@character_set_collations` is useful here, I just thought I would
mention it in case it gave any inspiration for alternative solutions.
It has some possibly useful implications, as a connection default for
coercing the collation, that are probably worthwhile.
https://mariadb.com/kb/en/setting-character-sets-and-collations/#changing-default-collation
> The MariaDB documentation says "the character set name is always part of
the collation name...
Yep needs an update. I'll see what can be written.
> Running SHOW COLLATION WHERE Collation LIKE "%uca1400%" provides NULL
for the Charset
Finally got to the bottom of this via the original commit:
https://github.com/MariaDB/server/commit/133446828c9dcb484476e4b3598af0d63d056a6e
(also a documentation task to pick up).
A NULL Charset implies the collation can apply to multiple character sets.
{{{
MariaDB [test]> select * from INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY where COLLATION_NAME='uca1400_ai_ci';
+----------------+--------------------+-----------------------+------+------------+
| COLLATION_NAME | CHARACTER_SET_NAME | FULL_COLLATION_NAME   | ID   | IS_DEFAULT |
+----------------+--------------------+-----------------------+------+------------+
| uca1400_ai_ci  | utf8mb3            | utf8mb3_uca1400_ai_ci | 2048 |            |
| uca1400_ai_ci  | ucs2               | ucs2_uca1400_ai_ci    | 2560 |            |
| uca1400_ai_ci  | utf8mb4            | utf8mb4_uca1400_ai_ci | 2304 |            |
| uca1400_ai_ci  | utf16              | utf16_uca1400_ai_ci   | 2816 |            |
| uca1400_ai_ci  | utf32              | utf32_uca1400_ai_ci   | 3072 |            |
+----------------+--------------------+-----------------------+------+------------+
}}}
MySQL-5.5 still has the first two columns.
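Going only by the rows shown above, the full collation name looks like the character set name, an underscore, and the charset-agnostic collation name. A tiny Python sketch of that pattern follows; the helper name `full_collation_name` is hypothetical, and the pattern is inferred from this output rather than from a documented rule.

```python
def full_collation_name(charset, collation):
    """Join a character set and a charset-agnostic collation name.

    Mirrors the FULL_COLLATION_NAME column in the query output above.
    """
    return f"{charset}_{collation}"


for charset in ("utf8mb3", "ucs2", "utf8mb4", "utf16", "utf32"):
    print(full_collation_name(charset, "uca1400_ai_ci"))
```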
> I assume it's still correct to use the utf8mb4 character set, along with
mysqli_set_charset('utf8mb4') for the connection?
Yes. Or any charset from above it seems.
> Also, tables that exist today will use utf8mb4_unicode_520_ci, I don't
think these will be changed during an update, see
`maybe_convert_table_to_utf8mb4()`;
But should they? I suspect doing so would be prudent.
> would that cause any problems (e.g. adding new tables/columns that
would then use a different collation)?
Only when a query uses the new tables together with existing ones, for
example in a join or comparison across both.
{{{
MariaDB [test]> create table t520 (t varchar(30) character set utf8mb4 collate utf8mb4_unicode_520_ci);
MariaDB [test]> create table t1400 (t varchar(30) character set utf8mb4 collate utf8mb4_uca1400_ai_ci);
MariaDB [test]> insert into t520 values ('bob'),('jack'), ('jane');
MariaDB [test]> insert into t1400 values ('bob'),('jack'), ('jane');
MariaDB [test]> select * from t1400 join t520 on t1400.t = t520.t;
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_uca1400_ai_ci,IMPLICIT) and (utf8mb4_unicode_520_ci,IMPLICIT) for operation '='
}}}
(See also bug https://jira.mariadb.org/browse/MDEV-32192 about using
`@@character_set_collations` to resolve this, for 11.2+.)
Given how implicit this is, and for compatibility with existing tables,
converting them during the update seems a way to avoid some problems.
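To make the conversion idea concrete, here is a hedged Python sketch that builds `ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ...` statements for tables not yet on the target collation. The function name `conversion_statements`, the input mapping, and the choice of target are assumptions for illustration; this is not what `maybe_convert_table_to_utf8mb4()` currently does.

```python
def conversion_statements(table_collations, target="utf8mb4_uca1400_ai_ci"):
    """Build ALTER TABLE statements for tables not yet on the target collation.

    table_collations maps table name -> current full collation name.
    """
    # The character set is the part of the full name before the first "_".
    charset = target.split("_", 1)[0]
    return [
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {target}"
        for table, collation in sorted(table_collations.items())
        if collation != target
    ]


# The two tables from the example above: only t520 needs converting.
tables = {"t520": "utf8mb4_unicode_520_ci", "t1400": "utf8mb4_uca1400_ai_ci"}
for statement in conversion_statements(tables):
    print(statement)
```

Once both tables share one collation, the join from the example would no longer mix collations.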
> Oddly, if I manually run ... which does kinda work with
`maybe_convert_table_to_utf8mb4()` with its use of explode('_').
I assume that was intentional.
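For reference, a small Python sketch of what splitting a collation name on `_` and taking the first piece yields, which is the effect of the `explode('_')` call mentioned in the quote. The helper name `collation_prefix` is hypothetical, and how WordPress uses the result is not shown in this thread.

```python
def collation_prefix(collation):
    """Return the part of a collation name before the first underscore."""
    return collation.split("_")[0].lower()


# A charset-prefixed name yields the character set ...
print(collation_prefix("utf8mb4_uca1400_ai_ci"))
# ... while the charset-agnostic name does not.
print(collation_prefix("uca1400_ai_ci"))
```

So a prefix check only recognises the character set when the full, charset-prefixed collation name is stored.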
--
Ticket URL: <https://core.trac.wordpress.org/ticket/58871#comment:7>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform