[wp-trac] [WordPress Trac] #54836: Huge error logs filled with "WordPress database error Illegal mix of collations" errors caused by spammers

WordPress Trac noreply at wordpress.org
Sun May 21 11:02:05 UTC 2023


#54836: Huge error logs filled with "WordPress database error Illegal mix of
collations" errors caused by spammers
--------------------------+------------------------------
 Reporter:  jh20001       |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Database      |     Version:
 Severity:  critical      |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------

Comment (by craigfrancis):

 The error_log from @jh20001 shows the database tables/fields using
 `latin1_general_ci`, and trying to compare against a value it believes is
 `utf8mb4_unicode_520_ci` - this might be from another table, or from the
 user value (e.g. it guesses this type due to characters such as "πŸ“ž").

 Likewise, the error_log from @varinupadhyay (ref #58133) uses
 `utf8_general_ci` (aka `utf8mb3_general_ci`), which does not support all
 unicode characters (maximum of three bytes per code point), so it has the
 same problem (in this case, characters like "πŸ₯").

 ----

 You can test this with:

 {{{#!sql
 CREATE TABLE `test` (
   `value` TINYTEXT NOT NULL
 ) CHARACTER SET utf8 COLLATE utf8_general_ci;

 INSERT INTO test VALUES ("a"), ("ΓΌ"), ("ζ₯Ό");
 INSERT INTO test VALUES ("πŸ₯");

 SELECT * FROM test WHERE value LIKE "%πŸ₯%";
 }}}

 ----

 WordPress tries to setup new tables/fields with `utf8mb4_unicode_520_ci`
 or `utf8mb4_unicode_ci`, which supports characters like πŸ“ž and πŸ₯, but
 older databases would not have used them.

 You can test your servers support by going to "WP-Admin > Tools > Site
 Health", and checking the result for "UTF8MB4 is supported" (which might
 be under "Passed Tests").

 I'd recommend changing the character-set in both of these databases (with
 the usual notes of making a backup, checking for encoding issues, and
 being aware that it can take time to alter big tables).

 ----

 As to solutions...

 It would be too difficult/dangerous/slow to check and make changes to
 every single SELECT/INSERT/UPDATE query - i.e. parsing the SQL,
 identifying every field and value, checking their character-set, and doing
 something (e.g. using CONVERT) to make them compatible.

 Personally I agree with @pbiron, I don't think core can/should do anything
 about this... unless someone can come up with a way to get the upgrade
 process to check/alter every single WP database/table/field to use utf8mb4
 without any issues.

 ----

 As an aside, `update.php` does have a `maybe_convert_table_to_utf8mb4()`
 function, but it only works when all fields are `utf8` or `utf8mb4`, it's
 only used when upgrading between certain versions of the database, and it
 can still be risky:

 {{{#!sql
 CREATE TABLE `test` (
   `a` TINYTEXT NOT NULL,
   `b` TINYTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
 ) CHARACTER SET utf8 COLLATE utf8_general_ci;

 ALTER TABLE `test` CONVERT TO CHARACTER SET utf8mb4 COLLATE
 utf8mb4_unicode_ci;

 SHOW CREATE TABLE test;
 }}}

 Note how the Bin field becomes Case-Insensitive, and the fields change
 from TINYTEXT to TEXT.

 https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-
 character-set
 https://codex.wordpress.org/Converting_Database_Character_Sets
 https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/54836#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list