[wp-trac] [WordPress Trac] #59868: Database insert with emoji fails when table has columns with both utf8mb3 (utf8) and utf8mb4 charsets

WordPress Trac noreply at wordpress.org
Thu Nov 9 14:41:41 UTC 2023


#59868: Database insert with emoji fails when table has columns with both utf8mb3
(utf8) and utf8mb4 charsets
--------------------------+------------------------------------------
 Reporter:  ianmjones     |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Charset       |    Version:  trunk
 Severity:  normal        |   Keywords:  needs-patch needs-unit-tests
  Focuses:                |
--------------------------+------------------------------------------
 The `wpdb::get_table_charset()` function currently sets the charset to
 `utf8` when it detects that both `utf8` and `utf8mb4` charsets are present
 in the table's column definitions.

 That same function also swaps in `utf8` for `utf8mb3` as they are
 effectively the same thing.

 This means that `wpdb::strip_invalid_text_from_query()`, which
 `wpdb::query()` calls early on to determine whether text is safe to
 insert, ends up stripping characters that are valid in `utf8mb4`, because
 it forces the use of `utf8` in its call to `wpdb::strip_invalid_text()`.
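
 The effect can be modelled outside WordPress. The following is a minimal
 Python sketch, not the actual PHP code in `wpdb`: the helper names
 `strip_for_charset` and `query_is_accepted` are hypothetical, and the rule
 that a query is rejected whenever stripping changes it is an assumption
 about how `wpdb::query()` uses the stripped result.

```python
def strip_for_charset(text: str, charset: str) -> str:
    """Drop characters the charset cannot store (simplified model).

    utf8 (utf8mb3) holds at most 3 bytes per character, utf8mb4 up to 4.
    """
    max_bytes = {"utf8": 3, "utf8mb4": 4}[charset]
    return "".join(ch for ch in text if len(ch.encode("utf-8")) <= max_bytes)


def query_is_accepted(query: str, charset: str) -> bool:
    """Assumed check: the query only runs if stripping leaves it unchanged."""
    return strip_for_charset(query, charset) == query


# U+1F600 (grinning face) needs 4 bytes in UTF-8.
query = "INSERT INTO t (note) VALUES ('hi \U0001F600')"

print(query_is_accepted(query, "utf8"))     # False: the emoji is stripped
print(query_is_accepted(query, "utf8mb4"))  # True: nothing is stripped
```

 Under these assumptions, forcing `utf8` for the whole table makes the
 query fail even though the target column itself could store the emoji.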

 As a result, insert queries fail when a table has columns with both
 `utf8mb3`/`utf8` and `utf8mb4` charsets, and emojis or other 4-byte
 characters are written to a column whose charset and collation are
 `utf8mb4`.

 I propose that the `wpdb::get_table_charset()` function should return
 `utf8mb4` when it detects that exactly two charsets are defined on the
 table and they are `utf8` and `utf8mb4`, instead of the current behaviour
 of returning `utf8`.
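
 The difference between the current and proposed behaviour can be sketched
 as follows. This is an illustrative Python model of only the cases
 described in this ticket, not the PHP implementation: the function name
 `pick_table_charset` and the `prefer_mb4` switch are hypothetical, and any
 other mix of charsets is deliberately left unmodelled.

```python
from typing import Iterable, Optional


def pick_table_charset(column_charsets: Iterable[str],
                       prefer_mb4: bool = False) -> Optional[str]:
    """Choose one charset for a table from its columns' charsets.

    prefer_mb4=False models the behaviour described in the ticket;
    prefer_mb4=True models the proposed change.
    """
    # utf8mb3 is treated as an alias of utf8.
    charsets = {"utf8" if c == "utf8mb3" else c for c in column_charsets}

    if len(charsets) == 1:
        return next(iter(charsets))
    if charsets == {"utf8", "utf8mb4"}:
        return "utf8mb4" if prefer_mb4 else "utf8"
    # Other combinations are outside what this ticket describes.
    return None


columns = ["utf8mb3", "utf8mb4"]
print(pick_table_charset(columns))                   # utf8
print(pick_table_charset(columns, prefer_mb4=True))  # utf8mb4
```

 With the proposed result, text destined for the `utf8mb4` column would no
 longer be checked against the narrower `utf8` range.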

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/59868>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform