[wp-hackers] Escaping post meta values
dan at phiffer.org
Wed May 22 21:43:16 UTC 2013
On May 22, 2013, at 4:52 PM, Otto <otto at ottodestruct.com> wrote:
> On Wed, May 22, 2013 at 3:19 PM, Dan Phiffer <dan at phiffer.org> wrote:
>> But I'm running into a new problem: when I pass objects straight into update_post_meta(), any data structure that includes Emoji characters results in a postmeta string that comes out empty from get_post_meta(). Is there a known workaround for this?
> The json_encode function does have the advantage that it encodes UTF-8
> characters into escape sequences and back, while serialize does not.
> The serialize/unserialize functions also are non-forgiving of
> malformed data, while json is quite forgiving. And depending on the
> underlying MySQL version, character set, etc, it's possible that the
> data being stored gets munged up by MySQL and thus doesn't get stored
> properly (or undergoes a character set conversion), and so when you
> get it back, the unserialize fails, and you get nothing.
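To illustrate the failure mode described above, here is a minimal sketch in plain PHP (no WordPress needed). The sample array and the str_replace() call that stands in for the database mangling the emoji are assumptions for illustration only; what actually happens to the bytes depends on the MySQL version and character set.

```php
<?php
// Hypothetical sample data; "\xF0\x9F\x98\x8E" is the 4-byte UTF-8 encoding of an emoji.
$emoji = "\xF0\x9F\x98\x8E";
$value = array( 'mood' => 'cool ' . $emoji );

// serialize() keeps the raw bytes and records the byte length of each string.
$serialized = serialize( $value );

// json_encode() escapes non-ASCII characters as \uXXXX by default,
// so its output is plain ASCII.
$json = json_encode( $value );

// Assumed stand-in for the database mangling the emoji on the way in or out.
$munged_serialized = str_replace( $emoji, '?', $serialized );
$munged_json       = str_replace( $emoji, '?', $json );

// The recorded length no longer matches the data, so unserialize() returns false.
$from_serialized = @unserialize( $munged_serialized );

// The JSON string never contained the raw bytes, so it decodes to the original array.
$from_json = json_decode( $munged_json, true );

var_dump( $from_serialized === false ); // bool(true)
var_dump( $from_json === $value );      // bool(true)
```

The point is only that a length-prefixed format breaks as soon as a single byte changes, while an ASCII-only encoding has nothing for a character set conversion to damage.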
From my Sequel Pro connection, the wp_postmeta table's encoding shows up as 'utf8'. I've heard of this problem, and I recall there are several points in the MySQL pipeline where the encoding can get corrupted. The fact that json_encode handles UTF-8 properly is a pretty strong advantage for me, so I think double-escaping the JSON encoding wins the day, despite its hackiness.
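A minimal sketch of what double-escaping the JSON could look like, in plain PHP. It assumes the meta functions strip one level of slashes from the value before storing it; stripslashes() stands in for that step here, and addslashes() is the extra layer of escaping. The sample data is hypothetical, and in real code the save and load steps would be update_post_meta() and get_post_meta().

```php
<?php
// Hypothetical sample data; "\xF0\x9F\x98\x8E" is the 4-byte UTF-8 encoding of an emoji.
$data = array( 'mood' => "cool \xF0\x9F\x98\x8E" );

// json_encode() turns the emoji into the ASCII escape sequence \ud83d\ude0e.
$json = json_encode( $data );

// Without extra escaping, unslashing eats the backslashes and the
// escape sequences degrade into literal text.
$unslashed = stripslashes( $json );

// With one extra layer of slashes, unslashing restores the original JSON.
$protected = addslashes( $json );
$stored    = stripslashes( $protected ); // assumed stand-in for the save step
$roundtrip = json_decode( $stored, true );

var_dump( $unslashed === '{"mood":"cool ud83dude0e"}' ); // bool(true)
var_dump( $stored === $json );                            // bool(true)
var_dump( $roundtrip === $data );                         // bool(true)
```

If the slash stripping ever goes away or changes, the extra addslashes() layer would leave stray backslashes in the stored value, which is part of why this approach feels hacky.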
> This is indeed a problem. There is not a "good" workaround that I know
> of. You could try using iconv() to convert the problem data to a
> different character set before passing it to the meta functions.
> Converting problem strings from UTF-8 to ISO-8859-1 has worked for me
> in the past.
Ah, but the conversion process would likely strip out the good stuff. The Emoji are to be preserved! 😎
For what it's worth, here are the various incoming data I'm working with: