[wp-trac] [WordPress Trac] #17210: Massive duplication of oEmbed postmeta

WordPress Trac wp-trac at lists.automattic.com
Thu Apr 21 23:38:36 UTC 2011


#17210: Massive duplication of oEmbed postmeta
--------------------------+-----------------------------
 Reporter:  archon810     |      Owner:
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Embeds        |    Version:  3.1
 Severity:  normal        |   Keywords:
--------------------------+-----------------------------
 Hey guys,

 Ever since my blog grew to a considerable size (a few million PVs a month)
 and started slowing down and exploding my server, I've been looking and
 implementing various optimizations. During one such passes through the
 data, I noticed this really weird oEmbed related behavior, which I've been
 observing for a number of WP version upgrades.

 I use [embed] shortcodes a lot, and every new post after a few minutes
 ends up with a ton of oembed caches that don't belong to it at all -
 they're all from other posts. Posts that don't even have [embed]s at all
 still have over 100 oembed entries in wp_postmeta.

 Here's an example of just a small subset of data residing in the table:

 [[Image(http://farm6.static.flickr.com/5230/5641419581_0610c9e267_b.jpg)]]

 There are now about 150,000 entries in the wp_postmeta table due to this,
 half of which are duplicated _oembed entries, which I think has heavy
 impact on server load. Not only that but I'm sure WP is filling the table
 up with values by redoing oEmbed queries, which may explain that load
 shoots up very high at times when publishing.
 {{{
 mysql> select count(*) from wp_postmeta where meta_key like '_oembed%';
 +----------+
 | count(*) |
 +----------+
 |    81499 |
 +----------+
 1 row in set

 mysql> select count(*) from wp_postmeta;
 +----------+
 | count(*) |
 +----------+
 |   148451 |
 +----------+
 1 row in set
 }}}

 Just look at how many times this random video embed value shows up in the
 table. I'm sure it was used in only one actual post:
 {{{
 mysql> select count(*) from wp_postmeta where meta_value like '%p2oWELcd-
 lI%';
 +----------+
 | count(*) |
 +----------+
 |      815 |
 +----------+
 1 row in set
 }}}

 Just to clarify - I don't have 815 updates to a single post that may have
 explained this - these are completely unrelated, separate, published
 posts.

 To put things in perspective, here are the top 20 offenders:
 {{{
 mysql> select distinct meta_key, count(*) as cnt from wp_postmeta where
 meta_key like '_oembed%' group by meta_key order by cnt desc limit 20;
 +------------------------------------------+-----+
 | meta_key                                 | cnt |
 +------------------------------------------+-----+
 | _oembed_5607e41abb700707540a854ae76182cf | 864 |
 | _oembed_984bc07d3bc0f61b6b35230cd2fa7ced | 859 |
 | _oembed_da8ae36275b4576cfcd92c0ed455be96 | 859 |
 | _oembed_71dd4068a9a6911f50dbe57b3ff477c5 | 858 |
 | _oembed_9f817e820c23ccbfac9b22b3474e5dd3 | 858 |
 | _oembed_f3c1c03a81bc301b5f1a063f65119328 | 857 |
 | _oembed_31bf10d95cb7c8e9f646d9d6e5728da0 | 857 |
 | _oembed_25d0ebf59c994050cb604900cf04f53f | 856 |
 | _oembed_6265dae657e38579c0a8ddb66132d526 | 852 |
 | _oembed_562dd8c13888905cbd15dbd74e8699cc | 849 |
 | _oembed_30ea17d1cc73acd925a74373d2be32ec | 848 |
 | _oembed_87f16916b4da6571f454266bfbfaebe0 | 847 |
 | _oembed_9f1f038d43e973bd60929201eee24f57 | 843 |
 | _oembed_d46317d44fe11c0d90ef2cc3b45bce57 | 843 |
 | _oembed_b1f8685ba405feee46baf9408eb632f7 | 841 |
 | _oembed_1b56f492eba4c4ea698d816d0ecf2d51 | 840 |
 | _oembed_fe597714de4081e6e7e78a88256c7db4 | 840 |
 | _oembed_fb843e7b604cbc4e1ffa144d4eb300c8 | 839 |
 | _oembed_97b12f2f1e59ee6eff95c61095aa5bef | 838 |
 | _oembed_2b94d9f7c28ee37bfbead0a622c8be85 | 838 |
 +------------------------------------------+-----+
 20 rows in set
 }}}

 I'm quite at a loss here and would appreciate the next debugging steps. I
 haven't been able to determine where things are going wrong on my own.

 Thank you.

 P.S. The site in question is AndroidPolice.com

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/17210>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software


More information about the wp-trac mailing list