[wp-trac] [WordPress Trac] #36456: oEmbeds containing Emojis aren't cached if the meta_value column's charset is utf8 and not utf8mb4
WordPress Trac
noreply at wordpress.org
Sat Apr 9 10:56:03 UTC 2016
#36456: oEmbeds containing Emojis aren't cached if the meta_value column's charset
is utf8 and not utf8mb4
--------------------------+-----------------------------
Reporter: birgire | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Embeds | Version: 4.4.2
Severity: normal | Keywords:
Focuses: |
--------------------------+-----------------------------
oEmbeds (e.g. Instagram, Twitter, ... ) that contains Emojis aren't cached
in the {{{wp_postmeta}}} table, if the {{{meta_value}}} column's charset
is reported as {{{utf8}}} and not {{{utf8mb4}}}.
This means that {{{wp_oembed_get()}}} is called for every such embed on
each page load.
Here's an example of an Instagram containing an Emoji:
https://www.instagram.com/p/_cj9NXvhrh/
where only the:
{{{
_oembed_time_02ba39bcc45811ad672a288b772ca9a2
}}}
is written to the {{{wp_postmeta}}} table but not
{{{
_oembed_02ba39bcc45811ad672a288b772ca9a2
}}}
containing the cached oEmbed content.
I just noticed this for a site with a shared database server with MySQL
version < 5.5.3.
In the {{{WP_Embed::shortcode()}}} method we have following:
{{{
// Maybe cache the result
if ( $html ) {
update_post_meta( $post_ID, $cachekey, $html );
update_post_meta( $post_ID, $cachekey_time, time() );
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
}}}
This doesn't take care of the situation when {{{update_postmeta()}}} fails
to write to the database.
For new meta inserts it uses:
{{{
$result = $wpdb->insert( $table, array(
$column => $object_id,
'meta_key' => $meta_key,
'meta_value' => $meta_value
) );
}}}
This is a wrapper for:
{{{
wpdb::_insert_replace_helper()
}}}
that contains this part:
{{{
$data = $this->process_fields( $table, $data, $format );
if ( false === $data ) {
return false;
}
}}}
And similarly within {{{wpdb::process_fields()}}} we have:
{{{
$converted_data = $this->strip_invalid_text( $data );
if ( $data !== $converted_data ) {
return false;
}
}}}
and this can return {{{false}}} for some data conversions.
== Example ==
We can test this with a simple example:
{{{
update_post_meta( 123, 'test', 'Smile 😍' );
}}}
that returns {{{false}}} if:
{{{
$wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' )
}}}
returns {{{'utf8'}}} and not {{{'utf8mb4'}}}.
In this case nothing is written to the {{{wp_postmeta}}} table.
== Workaround ==
Here's what I constructed as a temporary workaround:
{{{
/**
* Encode Emojis in all oembed results, if the charset for the meta_value
column is utf8.
*/
add_filter( 'oembed_result', function( $html, $url, $args ) use ( $wpdb )
{
/**
* For WordPress 4.2+ where the get_col_charset method and
wp_encode_emoji function where introduced.
* The charset check is based on the
https://codex.wordpress.org/Function_Reference/wp_encode_emoji#Example
*/
if ( method_exists( $wpdb, 'get_col_charset' )
&& 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta,
'meta_value' )
)
$html = wp_encode_emoji( $html );
return $html;
}, 10, 3 );
}}}
== Suggestions ==
I see two possibilities to rewrite this part:
{{{
// Maybe cache the result
if ( $html ) {
update_post_meta( $post_ID, $cachekey, $html );
update_post_meta( $post_ID, $cachekey_time, time() );
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
}}}
to fix the caching.
'''Suggestion 1)'''
{{{
// Maybe cache the result
if ( $html ) {
$updated_cachekey = update_post_meta( $post_ID, $cachekey, $html );
if( $updated_cachekey ) {
update_post_meta( $post_ID, $cachekey_time, time() );
} else {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
}}}
This means that only the link will show up if {{{$updated_cachekey}}} is
{{{false}}} and the cache key for the time isn't written to the db as
well.
We could also rewrite this to only contain a single
{{{update_post_meta()}}} call.
'''Suggestion 2)'''
We could also extend 1) to encode the Emojis if needed:
{{{
// Maybe cache the result
if ( $html ) {
// Encode Emojis if the charset of the meta_value column is 'utf8'
if ( method_exists( $wpdb, 'get_col_charset' )
&& 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta,
'meta_value' )
&& function_exists( 'wp_encode_emoji' )
) {
$html = wp_encode_emoji( $html );
}
$updated_cachekey = update_post_meta( $post_ID, $cachekey, $html );
if( $updated_cachekey ) {
update_post_meta( $post_ID, $cachekey_time, time() );
} else {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
}}}
where we encode the Emojis when the meta_value column's charset is utf8.
The method/function checks would be needed since they are only available
in WP 4.2+.
This should give us the cached content instead of only the link.
PS: For the above case the Query Monitor plugin reports 4 HTTP GET
requests for a single instagram oEmbed url, that contains an Emoji.
Example:
For https://www.instagram.com/p/_cj9NXvhrh/ they are:
{{{
https://api.instagram.com/oembed
?maxwidth=640
&maxheight=960
&url=https%3A%2F%2Fwww.instagram.com%2Fp%2F_cj9NXvhrh%2F
&format=json
https://api.instagram.com/publicapi/oembed/
?maxwidth=640
&maxheight=960
&url=https%3A%2F%2Fwww.instagram.com%2Fp%2F_cj9NXvhrh%2F
&format=json
https://www.instagram.com/publicapi/oembed/
?maxwidth=640
&maxheight=960
&url=https://www.instagram.com/p/_cj9NXvhrh/
&format=json
https://api.instagram.com/oembed/
?maxwidth=640
&maxheight=960
&url=https://www.instagram.com/p/_cj9NXvhrh/
&format=json
}}}
I'm not sure why but I only get a single request for a Twitter oEmbed
containing Emojis.
PPS: Sorry for the long post.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/36456>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list