[wp-trac] [WordPress Trac] #59868: Database insert with emoji fails when table has columns with both utf8mb3 (utf8) and utf8mb4 charsets
WordPress Trac
noreply at wordpress.org
Sat Sep 28 10:32:19 UTC 2024
#59868: Database insert with emoji fails when table has columns with both utf8mb3
(utf8) and utf8mb4 charsets
----------------------------------------+-----------------------------
Reporter: ianmjones | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: Charset | Version: 4.2
Severity: normal | Resolution:
Keywords: needs-unit-tests has-patch | Focuses:
----------------------------------------+-----------------------------
Changes (by jondaley):
* keywords: needs-patch needs-unit-tests => needs-unit-tests has-patch
Comment:
I'm not sure how I only ran into this now, but my users are now
complaining that they can't submit emojis.
My setup:
WordPress 6.6.2
MariaDB: 10.11.6-MariaDB-0+deb12u1-log
PHP 8.2.20
wp_posts.post_content collation: utf8mb3_unicode_ci
I just upgraded it to utf8mb4 to see if that would fix the problem, as I
saw some forum posts saying that WordPress uses utf8mb4, so maybe an
upgrade failed somewhere along the way and I didn't notice. (The
ALTER TABLE command took over 20 minutes; my database is fairly
large: 3.2GB.)
So now the wp_posts.post_content collation is utf8mb4_unicode_520_ci.
And I have the same problem: inserting emojis (using the BuddyBoss
plugin, which tries to save 😃) gets tripped up in wp_insert_post() because
wp_encode_emoji() is NOT called, since the code is checking for exactly
"utf8".
I was able to fix it:
{{{
4557c4557
< if ( 'utf8' === $charset ) {
---
> if ( strpos($charset, 'utf8') !== FALSE ) {
}}}
I don't know if that would be an acceptable fix for everyone. Since this
code is emoji-specific and not trying to cover all UTF-8 characters, maybe
it would be OK?
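To make the difference between the two checks concrete, here is a minimal sketch. The helper names matches_exact() and matches_substring() are hypothetical, made up for this comparison, and are not part of WordPress core; the charset strings are just the names discussed in this comment.

```php
<?php
// Hypothetical helpers for illustration only; not WordPress core functions.

// Mirrors the exact comparison: only the literal name 'utf8' matches.
function matches_exact( string $charset ): bool {
    return 'utf8' === $charset;
}

// Mirrors the substring check: any name containing 'utf8' matches.
function matches_substring( string $charset ): bool {
    return strpos( $charset, 'utf8' ) !== false;
}

foreach ( array( 'utf8', 'utf8mb3', 'utf8mb4', 'latin1' ) as $charset ) {
    printf(
        "%-8s exact=%s substring=%s\n",
        $charset,
        var_export( matches_exact( $charset ), true ),
        var_export( matches_substring( $charset ), true )
    );
}
```

The exact comparison skips utf8mb3, while the substring check matches utf8mb3 and also utf8mb4, so with it a utf8mb4 column would have its emoji encoded as well even though it can store them directly.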
I saw some references to Japanese emojis having problems, so maybe they
have 4 byte emojis?
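As a quick way to check byte widths, the sketch below uses PHP's strlen(), which counts bytes rather than characters. The three sample code points are my own picks for illustration, not taken from the ticket, apart from U+1F603, which is the 😃 mentioned above.

```php
<?php
// strlen() returns the number of bytes, so for a single character it
// shows how many bytes the UTF-8 encoding needs.
$samples = array(
    'U+00E9'  => "\u{00E9}",  // e with acute accent
    'U+2764'  => "\u{2764}",  // heavy black heart
    'U+1F603' => "\u{1F603}", // the smiley from this report
);

foreach ( $samples as $label => $char ) {
    printf( "%s: %d bytes\n", $label, strlen( $char ) );
}
```

U+1F603 needs 4 bytes, so it does not fit in a 3-byte utf8/utf8mb3 column, whereas a symbol such as U+2764 needs only 3.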
Perhaps a safer change would be to check for utf8mb4 || ... I'm not sure
what goes in the "...". If I understand the character sets correctly, utf8
is 3 bytes by default, so anyone using utf8mb3 or utf8mb4 would be at
least as well off as someone using utf8. I'm therefore thinking that all
three of those choices would work fine, but I don't know whether the
strpos() solution is the easier one, or safe enough to implement.
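One way an explicit check could look is sketched below. The function name charset_needs_emoji_encoding() is hypothetical and not part of WordPress core. Which names belong in the list is exactly the open question here; this sketch assumes only the 3-byte names 'utf8' and 'utf8mb3', on the reasoning that utf8mb4 columns can already hold 4-byte characters.

```php
<?php
// Hypothetical helper for illustration only; not a WordPress core function.
// Assumption: only the 3-byte charset names need emoji encoded before insert.
function charset_needs_emoji_encoding( string $charset ): bool {
    // Strict in_array() avoids loose type juggling on the comparison.
    return in_array( $charset, array( 'utf8', 'utf8mb3' ), true );
}

var_dump( charset_needs_emoji_encoding( 'utf8' ) );    // bool(true)
var_dump( charset_needs_emoji_encoding( 'utf8mb3' ) ); // bool(true)
var_dump( charset_needs_emoji_encoding( 'utf8mb4' ) ); // bool(false)
```

Compared with strpos(), an explicit list will not accidentally match some other charset name that happens to contain "utf8", at the cost of needing an update if another alias ever has to be covered.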
Let me know if I can help debug this any further. I have a development
site that I can play with and not affect live users.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/59868#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform