[wp-trac] [WordPress Trac] #59868: Database insert with emoji fails when table has columns with both utf8mb3 (utf8) and utf8mb4 charsets
WordPress Trac
noreply at wordpress.org
Thu Nov 9 14:41:41 UTC 2023
#59868: Database insert with emoji fails when table has columns with both utf8mb3
(utf8) and utf8mb4 charsets
--------------------------+------------------------------------------
 Reporter:  ianmjones     |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Charset       |     Version:  trunk
 Severity:  normal        |    Keywords:  needs-patch needs-unit-tests
  Focuses:                |
--------------------------+------------------------------------------
The `wpdb::get_table_charset()` function currently sets the charset to
`utf8` when it detects that both `utf8` and `utf8mb4` charsets are present
in the table's column definitions.
The same function also treats `utf8mb3` as `utf8`, since MySQL uses
`utf8` as an alias for `utf8mb3`.
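
The charset resolution described above can be modelled roughly as follows.
This is a Python sketch of only the rules mentioned in this ticket, not the
PHP in `wpdb`. The helper name `resolve_table_charset` is hypothetical, and
returning `None` for combinations the ticket does not discuss is an
assumption made to keep the sketch short.

```python
def resolve_table_charset(column_charsets):
    """Model of the current behaviour described in this ticket.

    column_charsets: iterable of charset names, one per text column.
    Hypothetical helper, not part of wpdb.
    """
    charsets = {name.lower() for name in column_charsets}

    # utf8mb3 is folded into utf8 before any comparison is made.
    if "utf8mb3" in charsets:
        charsets.discard("utf8mb3")
        charsets.add("utf8")

    # A table with a single charset simply uses that charset.
    if len(charsets) == 1:
        return next(iter(charsets))

    # Current behaviour: a utf8 + utf8mb4 mix is downgraded to utf8.
    if charsets == {"utf8", "utf8mb4"}:
        return "utf8"

    # Other combinations are outside the scope of this sketch.
    return None


print(resolve_table_charset(["utf8mb3", "utf8mb4"]))  # utf8
```
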
As a result, `wpdb::strip_invalid_text_from_query()`, which
`wpdb::query()` calls early on to check whether the text is safe to insert,
strips characters that are valid in `utf8mb4`, because it passes `utf8` as
the charset to `wpdb::strip_invalid_text()`.
Insert queries therefore fail when a table has columns with both
`utf8mb3`/`utf8` and `utf8mb4` charsets, and emoji or other 4-byte
characters are written to a column whose charset and collation are
`utf8mb4`.
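
To illustrate why such a query is rejected, here is a rough Python model.
It assumes that stripping for `utf8` amounts to dropping every character
that needs 4 bytes in UTF-8 (code points above U+FFFF), and that a query is
refused when the stripped text differs from the original. The names
`strip_for_utf8mb3` and `query_is_accepted` are hypothetical and do not
exist in `wpdb`.

```python
def strip_for_utf8mb3(text):
    """Drop characters that need 4 bytes in UTF-8 (code points above U+FFFF)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


def query_is_accepted(text, charset):
    """Assumed check: accept the text only if stripping leaves it unchanged."""
    if charset == "utf8":
        return strip_for_utf8mb3(text) == text
    if charset == "utf8mb4":
        # 4-byte characters are valid in utf8mb4, so they are not stripped.
        return True
    raise ValueError("charset not covered by this sketch: " + charset)


value = "Nice work \U0001F600"  # ends with an emoji

print(len("\U0001F600".encode("utf-8")))    # 4
print(query_is_accepted(value, "utf8"))     # False
print(query_is_accepted(value, "utf8mb4"))  # True
```

With the table charset resolved to `utf8`, the emoji value takes the first
branch and is refused, even though its target column could store it.
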
I propose that `wpdb::get_table_charset()` should return `utf8mb4` when it
detects exactly two charsets on the table and they are `utf8` and
`utf8mb4`, instead of the current behaviour of returning `utf8`.
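
A minimal sketch of the proposed rule, written in Python for illustration
rather than as a patch against `wpdb`. The function name
`resolve_table_charset_proposed` is hypothetical, and combinations other
than the ones named in this ticket are left unhandled on purpose.

```python
def resolve_table_charset_proposed(column_charsets):
    """Model of the proposed behaviour (hypothetical helper, not wpdb code)."""
    charsets = {name.lower() for name in column_charsets}

    # utf8mb3 is still folded into utf8 first.
    if "utf8mb3" in charsets:
        charsets.discard("utf8mb3")
        charsets.add("utf8")

    # A table with a single charset simply uses that charset.
    if len(charsets) == 1:
        return next(iter(charsets))

    # Proposed change: a utf8 + utf8mb4 mix resolves to utf8mb4, not utf8.
    if charsets == {"utf8", "utf8mb4"}:
        return "utf8mb4"

    # Other combinations are outside the scope of this sketch.
    return None


print(resolve_table_charset_proposed(["utf8mb3", "utf8mb4"]))  # utf8mb4
```

One consequence a patch would probably need to consider: with `utf8mb4`
returned for the whole table, this table-level check would no longer catch
a 4-byte character aimed at one of the `utf8` columns.
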
--
Ticket URL: <https://core.trac.wordpress.org/ticket/59868>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform