[wp-trac] [WordPress Trac] #59868: Database insert with emoji fails when table has columns with both utf8mb3 (utf8) and utf8mb4 charsets

WordPress Trac noreply at wordpress.org
Sat Sep 28 10:32:19 UTC 2024


#59868: Database insert with emoji fails when table has columns with both utf8mb3
(utf8) and utf8mb4 charsets
----------------------------------------+-----------------------------
 Reporter:  ianmjones                   |       Owner:  (none)
     Type:  defect (bug)                |      Status:  new
 Priority:  normal                      |   Milestone:  Future Release
Component:  Charset                     |     Version:  4.2
 Severity:  normal                      |  Resolution:
 Keywords:  needs-unit-tests has-patch  |     Focuses:
----------------------------------------+-----------------------------
Changes (by jondaley):

 * keywords:  needs-patch needs-unit-tests => needs-unit-tests has-patch


Comment:

 I'm not sure how I only ran into this now, but my users are now
 complaining that they can't submit emojis.

 My setup:
 WordPress 6.6.2
 MariaDB: 10.11.6-MariaDB-0+deb12u1-log
 PHP 8.2.20

 wp_posts.post_content collation: utf8mb3_unicode_ci
 I just upgraded it to utf8mb4 to see if that would fix the problem. I saw
 some forum posts saying that WordPress uses utf8mb4, so I thought maybe
 an upgrade had failed somewhere along the way and I hadn't noticed. (The
 ALTER TABLE command took over 20 minutes; my database is fairly large:
 3.2GB.)

 So now the wp_posts.post_content collation is: utf8mb4_unicode_520_ci

 And I still have the same problem: inserting emojis (using the BuddyBoss
 plugin, which tries to save 😃) gets tripped up in wp_insert_post()
 because wp_encode_emoji() is NOT called, because the code is checking for
 exactly "utf8".

 I was able to fix it:

 {{{
 4557c4557
 <                       if ( 'utf8' === $charset ) {
 ---
 >                       if ( strpos($charset, 'utf8') !== FALSE ) {
 }}}

 I don't know if that would be an acceptable fix for everyone.  Since this
 code is emoji specific and not trying to cover all UTF8 characters, maybe
 it would be ok?

 I saw some references to Japanese emojis having problems, so maybe they
 have 4 byte emojis?
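
 For reference, here is a quick standalone sketch (not core code) for
 checking how many bytes a character takes in UTF-8. The sample characters
 are my own picks. A utf8mb3 column can only store characters of up to 3
 bytes, so anything that reports 4 here needs utf8mb4 or entity encoding:

```php
<?php
// strlen() counts bytes, not characters, so on a UTF-8 string it reports
// the encoded length of each sample character.
$samples = array(
	'A'  => 'A',          // U+0041, ASCII
	'é'  => "\u{00E9}",   // U+00E9, Latin-1 Supplement
	'あ' => "\u{3042}",   // U+3042, Hiragana
	'😃' => "\u{1F603}",  // U+1F603, the emoji from this report
);

foreach ( $samples as $label => $char ) {
	printf( "%s: %d byte(s)\n", $label, strlen( $char ) );
}
```

 The emoji is the only one of these that needs 4 bytes; the Japanese
 character fits in 3.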

 Perhaps a safer change would be to check for utf8mb4 || ... (I'm not sure
 what goes in the ...). If I understand the character sets correctly, utf8
 is up to 3 bytes per character by default, so anyone using utf8mb3 or
 utf8mb4 would be at least as good as someone using utf8. I'm thinking
 that all three of those choices would work fine, but I don't know if the
 strpos() solution is easy/safe enough to implement?
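
 To make the "check all three" idea concrete, here is a minimal sketch of
 an explicit-list check as an alternative to strpos(). The helper name
 example_should_encode_emoji() is hypothetical and not part of core, and
 the list of charset names is my assumption about what should match:

```php
<?php
/**
 * Hypothetical helper, not a WordPress core function: decides whether the
 * emoji-encoding branch should run for a given column charset name.
 */
function example_should_encode_emoji( $charset ) {
	// Assumption: these three names are the only ones we want to match.
	$utf8_charsets = array( 'utf8', 'utf8mb3', 'utf8mb4' );

	// Normalize case and type so 'UTF8MB4' or a non-string value is handled.
	return in_array( strtolower( (string) $charset ), $utf8_charsets, true );
}

var_dump( example_should_encode_emoji( 'utf8' ) );    // bool(true)
var_dump( example_should_encode_emoji( 'utf8mb3' ) ); // bool(true)
var_dump( example_should_encode_emoji( 'utf8mb4' ) ); // bool(true)
var_dump( example_should_encode_emoji( 'latin1' ) );  // bool(false)
```

 Compared with strpos(), this only matches the names that are listed, at
 the cost of needing an update if another alias ever shows up.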

 Let me know if I can help debug this any further.  I have a development
 site that I can play with and not affect live users.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/59868#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

