[wp-trac] [WordPress Trac] #22742: Fix feed inefficiencies - Possibly Extend Etag/Last-Modified/Conditional Get functionality

WordPress Trac noreply at wordpress.org
Tue Dec 4 21:36:32 UTC 2012


#22742: Fix feed inefficiencies - Possibly Extend Etag/Last-Modified/Conditional
Get functionality
-------------------------+------------------------------
 Reporter:  brokentone   |       Owner:
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  Awaiting Review
Component:  Performance  |     Version:  3.4.2
 Severity:  normal       |  Resolution:
 Keywords:               |
-------------------------+------------------------------
Description changed by SergeyBiryukov:

Old description:

> '''First the story.'''
>
> Attempting to generate what are effectively feeds on our WP site, we've
> been using the WP rewrite functions in a page/template context for some
> time, but recently it's made more sense to use the feed context for this.
> It seems as if there is some liability to using pages--requires a page to
> be created and not  changed by non-dev users and requires a coordinated
> database and code solution to create. And it introduces some
> complexity/mess with creating template files for feed logic/display, and
> again, these dummy pages have to exist. Finally--assigning this feed to a
> page required the WP URL canonical functionality to occur, so I couldn't
> use "feed.json", I had to use "feed.json/" which is just not cool. An
> entirely code-based solution that effectively rewrites a URL pattern to a
> function is perfect--which is effectively what the feed context is.
>
> As an aside--for my purposes could I use actual Apache rewrites? Well
> sure, but I am using WP functions to retrieve the data for my feed--in
> which case I would have to separately bootstrap my function, and I would
> assume incur fairly similar performance implications. Also, adding a feed
> still feels like something that should be code-addressable to me and
> rolling rewrites to all of our production web servers isn't necessarily
> fun. I'll be benchmarking this and get back to you though.
>
> In my page->feed conversion, I was curious what the performance
> implications were. I figured with less files and functions (related to
> the theme layer) involved, it could only be better using a feed. Not so,
> it was worse at about 5x to run the same feed creating logic. I sat down
> with xdebug and query logging to figure out what was going on and found
> that there were a few extra queries being run on the feed side:
>
> SELECT post_modified_gmt FROM wp_posts WHERE post_status = 'publish' AND
> post_type IN ('post', 'page', 'attachment', 'story') ORDER BY
> post_modified_gmt DESC LIMIT 1;
> SELECT post_date_gmt FROM wp_posts WHERE post_status = 'publish' AND
> post_type IN ('post', 'page', 'attachment', 'story') ORDER BY
> post_date_gmt DESC LIMIT 1;
> SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  WHERE 1=1  AND
> wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish' OR
> wp_posts.post_status = 'private')  ORDER BY wp_posts.post_date DESC LIMIT
> 0, 50;
> SELECT FOUND_ROWS();
>
> These are coming from a branch in /wp-includes/class-wp.php
> WP::send_headers(). It's doing very basic logic to determine via the
> query string if we expect this request to be a feed, then it uses either
> ''get_lastcommentmodified()'' or ''get_lastpostmodified()'' and uses the
> result of that to set and send some headers for last modified and etag,
> and evaluate request values for conditional get (304) responses.
>

> '''The problem:'''
>
> For my application--creating arbitrary feeds actually based on WP options
> --the last time any post or comment was updated/added has no correlation
> to my feed. It may not be updated at the point a post is updated, but it
> would show as being updated--more dangerously, it would show as not being
> updated and return a 304 if a post hasn't been updated. Also, it's
> incurring overhead doing this lookup that has absolutely no value.
>
> How about the most common WP core uses? I'm thinking it's still going to
> result in a lot of false positives (saying it's updated more recently
> than it really is) that certainly is going to depend on number of post
> types, but even with say a default install with pages and posts, any time
> a page is updated (which is not in any default feed), it will be
> affecting the default posts feed.
>
> There is no hook to allow for the disabling or changing of this logic.
> And, there is no way to apply some form of this logic to heavier page
> generations that might be kind of nice.
>
> '''The options:'''
>
> 1. Add a hook to this send headers function that would allow it to be
> fully disabled for certain feeds.
> 2. Move the functionality to a discrete function that can be passed the
> actual page/feed updated time and it can then send the proper headers.
> Then the default feed generation functions can call back to this
> functionality when they know the actual updated dates.
>
> I actually favor the latter--this is kinda nice functionality that is
> trapped inside a less useful branch and is "stuck on" even when it's not
> helpful. I would be happy to help write a patch for some of this if some
> core people can jump on here and provide some direction and blessing.
>
> '''I've noticed a little discussion on this issue elsewhere:'''
>
> http://core.trac.wordpress.org/ticket/19466 - This person is attempting
> to apply the existing logic--page/comment last modified across every
> request which would be a massive mistake. Assuming that's why there has
> been no response.
>
> http://core.trac.wordpress.org/ticket/15499 - This person is suggesting
> an index to get the results quicker--that could be beneficial
> additionally, but perhaps my modifications could reduce the need for such
> an index.

New description:

 '''First the story.'''

 Attempting to generate what are effectively feeds on our WP site, we've
 been using the WP rewrite functions in a page/template context for some
 time, but recently it's made more sense to use the feed context for this.
 It seems as if there is some liability to using pages--requires a page to
 be created and not  changed by non-dev users and requires a coordinated
 database and code solution to create. And it introduces some
 complexity/mess with creating template files for feed logic/display, and
 again, these dummy pages have to exist. Finally--assigning this feed to a
 page required the WP URL canonical functionality to occur, so I couldn't
 use "feed.json", I had to use "feed.json/" which is just not cool. An
 entirely code-based solution that effectively rewrites a URL pattern to a
 function is perfect--which is effectively what the feed context is.

 As an aside--for my purposes could I use actual Apache rewrites? Well
 sure, but I am using WP functions to retrieve the data for my feed--in
 which case I would have to separately bootstrap my function, and I would
 assume incur fairly similar performance implications. Also, adding a feed
 still feels like something that should be code-addressable to me and
 rolling rewrites to all of our production web servers isn't necessarily
 fun. I'll be benchmarking this and get back to you though.

 In my page->feed conversion, I was curious what the performance
 implications were. I figured with less files and functions (related to the
 theme layer) involved, it could only be better using a feed. Not so, it
 was worse at about 5x to run the same feed creating logic. I sat down with
 xdebug and query logging to figure out what was going on and found that
 there were a few extra queries being run on the feed side:
 {{{
 SELECT post_modified_gmt FROM wp_posts WHERE post_status = 'publish' AND
 post_type IN ('post', 'page', 'attachment', 'story') ORDER BY
 post_modified_gmt DESC LIMIT 1;
 SELECT post_date_gmt FROM wp_posts WHERE post_status = 'publish' AND
 post_type IN ('post', 'page', 'attachment', 'story') ORDER BY
 post_date_gmt DESC LIMIT 1;
 SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  WHERE 1=1  AND
 wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish' OR
 wp_posts.post_status = 'private')  ORDER BY wp_posts.post_date DESC LIMIT
 0, 50;
 SELECT FOUND_ROWS();
 }}}
 These are coming from a branch in /wp-includes/class-wp.php
 WP::send_headers(). It's doing very basic logic to determine via the query
 string if we expect this request to be a feed, then it uses either
 ''get_lastcommentmodified()'' or ''get_lastpostmodified()'' and uses the
 result of that to set and send some headers for last modified and etag,
 and evaluate request values for conditional get (304) responses.


 '''The problem:'''

 For my application--creating arbitrary feeds actually based on WP options
 --the last time any post or comment was updated/added has no correlation
 to my feed. It may not be updated at the point a post is updated, but it
 would show as being updated--more dangerously, it would show as not being
 updated and return a 304 if a post hasn't been updated. Also, it's
 incurring overhead doing this lookup that has absolutely no value.

 How about the most common WP core uses? I'm thinking it's still going to
 result in a lot of false positives (saying it's updated more recently than
 it really is) that certainly is going to depend on number of post types,
 but even with say a default install with pages and posts, any time a page
 is updated (which is not in any default feed), it will be affecting the
 default posts feed.

 There is no hook to allow for the disabling or changing of this logic.
 And, there is no way to apply some form of this logic to heavier page
 generations that might be kind of nice.

 '''The options:'''

 1. Add a hook to this send headers function that would allow it to be
 fully disabled for certain feeds.
 2. Move the functionality to a discrete function that can be passed the
 actual page/feed updated time and it can then send the proper headers.
 Then the default feed generation functions can call back to this
 functionality when they know the actual updated dates.

 I actually favor the latter--this is kinda nice functionality that is
 trapped inside a less useful branch and is "stuck on" even when it's not
 helpful. I would be happy to help write a patch for some of this if some
 core people can jump on here and provide some direction and blessing.

 '''I've noticed a little discussion on this issue elsewhere:'''

 #19466 - This person is attempting to apply the existing logic--
 page/comment last modified across every request which would be a massive
 mistake. Assuming that's why there has been no response.

 #15499 - This person is suggesting an index to get the results quicker--
 that could be beneficial additionally, but perhaps my modifications could
 reduce the need for such an index.

--

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/22742#comment:1>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software


More information about the wp-trac mailing list