[wp-trac] [WordPress Trac] #58871: support uca14.0.0 collation in database where available
WordPress Trac
noreply at wordpress.org
Sun Sep 17 12:40:29 UTC 2023
#58871: support uca14.0.0 collation in database where available
-------------------------------------------------+-------------------------
Reporter: danielblack | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Database | Version: 6.3
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests needs-testing | Focuses:
-------------------------------------------------+-------------------------
Comment (by craigfrancis):
Thanks @danielblack.
Just a thought (as I'm not sure what the repercussions are), but if we
added support for MySQL's `utf8mb4_0900_ai_ci` as well, to avoid multiple
`SHOW COLLATION` queries, we could use:
SHOW COLLATION WHERE Collation IN
('uca1400_ai_ci','utf8mb4_0900_ai_ci','utf8mb4_unicode_520_ci');
Then store the results on a private wpdb property, so it's cached, and can
be used by `has_cap()`?
Note that `determine_charset()` is called by `db_connect()`, via
`init_charset()`; and while it's fairly fast (on my localhost ~0.0006s,
which does not use a network connection), it won't be as fast as
`mysqli_get_server_info()` to **guess** the supported character sets based
on version number (~0.0000001s).
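A minimal sketch of the caching idea above, assuming a hypothetical `Collation_Cache` class and an injected query callable (neither exists in `wpdb`; a real patch would keep the result on a private `wpdb` property and consult it from `has_cap()`). The fake server below only stands in for a database so the example runs on its own:

```php
<?php
// Hypothetical helper, not part of wpdb: remembers which candidate collations
// the server reports, so SHOW COLLATION runs at most once per instance.
class Collation_Cache {
    /** @var string[]|null Collation names found on the server; null until first queried. */
    private $available = null;

    /** @var callable Takes a SQL string and returns a list of rows (assumed interface). */
    private $run_query;

    /** @var int Number of queries issued; only here to make the caching visible. */
    public $query_count = 0;

    public function __construct( callable $run_query ) {
        $this->run_query = $run_query;
    }

    public function has_collation( $name ) {
        if ( null === $this->available ) {
            $sql = "SHOW COLLATION WHERE Collation IN "
                . "('uca1400_ai_ci','utf8mb4_0900_ai_ci','utf8mb4_unicode_520_ci')";

            $rows = call_user_func( $this->run_query, $sql );
            $this->query_count++;

            // Keep only the collation names; the other columns are not needed here.
            $this->available = array_column( $rows, 'Collation' );
        }

        return in_array( $name, $this->available, true );
    }
}

// Stand-in for a real database call: returns made-up rows for illustration only.
$fake_server = function ( $sql ) {
    return array(
        array( 'Collation' => 'uca1400_ai_ci', 'Charset' => null ),
        array( 'Collation' => 'utf8mb4_unicode_520_ci', 'Charset' => 'utf8mb4' ),
    );
};

$cache = new Collation_Cache( $fake_server );
var_dump( $cache->has_collation( 'uca1400_ai_ci' ) );
var_dump( $cache->has_collation( 'utf8mb4_0900_ai_ci' ) );
```

The second lookup is answered from the cached list, so the cost of the query is paid once per connection rather than once per capability check.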
---
And misc points...
- I'm fine with accent-insensitive (like case-insensitive), I just don't
know if it would cause any problems for anyone else (only reason I'm
noting it).
- Agreed, I don't think `@@character_set_collations` is useful here; I
just thought I would mention it in case it gave any inspiration for
alternative solutions.
- The MariaDB documentation says "the character set name is always part of
the collation name"
([https://mariadb.com/kb/en/character-set-and-collation-overview/ source]).
I assume that's incorrect, as the collation name `uca1400_ai_ci` would
then imply a different character set.
- Running `SHOW COLLATION WHERE Collation LIKE "%uca1400%"` provides NULL
for the `Charset`?
- I assume it's still correct to use the `utf8mb4` character set, along
with `mysqli_set_charset('utf8mb4')` for the connection?
- Also, tables that exist today will use `utf8mb4_unicode_520_ci`, I don't
think these will be changed during an update, see
`maybe_convert_table_to_utf8mb4()`; would that cause any problems (e.g.
adding new tables/columns that would then use a different collation)?
- Oddly, if I manually run `ALTER TABLE wp_commentmeta CHANGE meta_key
meta_key VARCHAR(255) CHARACTER SET utf8mb4 COLLATE uca1400_ai_ci NULL
DEFAULT NULL`, then the `meta_key` field collation is set to
`utf8mb4_uca1400_ai_ci`, which does kinda work with
`maybe_convert_table_to_utf8mb4()` and its use of `explode('_')`.
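
To illustrate the `explode('_')` point in the last bullet, here is a small sketch of deriving a character set from the prefix of a collation name. The helper `charset_from_collation()` is hypothetical and written only for this example; it is not the code in `maybe_convert_table_to_utf8mb4()`:

```php
<?php
// Hypothetical helper: treats everything before the first underscore
// as the character set name.
function charset_from_collation( $collation ) {
    list( $charset ) = explode( '_', $collation );
    return strtolower( $charset );
}

// The name as stored on the column after the ALTER TABLE above.
$stored = charset_from_collation( 'utf8mb4_uca1400_ai_ci' ); // 'utf8mb4'

// The short name as written in the ALTER TABLE statement.
$short = charset_from_collation( 'uca1400_ai_ci' ); // 'uca1400'
```

So a prefix split gives the expected `utf8mb4` for the stored column collation, but would give `uca1400` if it were ever handed the short name.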
--
Ticket URL: <https://core.trac.wordpress.org/ticket/58871#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform