[wp-trac] [WordPress Trac] #22742: Fix feed inefficiencies - Possibly Extend Etag/Last-Modified/Conditional Get functionality
WordPress Trac
noreply at wordpress.org
Tue Dec 4 21:25:21 UTC 2012
#22742: Fix feed inefficiencies - Possibly Extend Etag/Last-Modified/Conditional
Get functionality
-----------------------------+-------------------------
Reporter: brokentone | Type: enhancement
Status: new | Priority: normal
Milestone: Awaiting Review | Component: Performance
Version: 3.4.2 | Severity: normal
Keywords: |
-----------------------------+-------------------------
'''First the story.'''
Attempting to generate what are effectively feeds on our WP site, we've
been using the WP rewrite functions in a page/template context for some
time, but recently it's made more sense to use the feed context for this.
It seems as if there is some liability to using pages--requires a page to
be created and not changed by non-dev users and requires a coordinated
database and code solution to create. And it introduces some
complexity/mess with creating template files for feed logic/display, and
again, these dummy pages have to exist. Finally--assigning this feed to a
page required the WP URL canonical functionality to occur, so I couldn't
use "feed.json", I had to use "feed.json/" which is just not cool. An
entirely code-based solution that effectively rewrites a URL pattern to a
function is perfect--which is effectively what the feed context is.
As an aside--for my purposes could I use actual Apache rewrites? Well
sure, but I am using WP functions to retrieve the data for my feed--in
which case I would have to separately bootstrap my function, and I would
assume incur fairly similar performance implications. Also, adding a feed
still feels like something that should be code-addressable to me and
rolling rewrites to all of our production web servers isn't necessarily
fun. I'll be benchmarking this and get back to you though.
In my page->feed conversion, I was curious what the performance
implications were. I figured with less files and functions (related to the
theme layer) involved, it could only be better using a feed. Not so, it
was worse at about 5x to run the same feed creating logic. I sat down with
xdebug and query logging to figure out what was going on and found that
there were a few extra queries being run on the feed side:
SELECT post_modified_gmt FROM wp_posts WHERE post_status = 'publish' AND
post_type IN ('post', 'page', 'attachment', 'story') ORDER BY
post_modified_gmt DESC LIMIT 1;
SELECT post_date_gmt FROM wp_posts WHERE post_status = 'publish' AND
post_type IN ('post', 'page', 'attachment', 'story') ORDER BY
post_date_gmt DESC LIMIT 1;
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts WHERE 1=1 AND
wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish' OR
wp_posts.post_status = 'private') ORDER BY wp_posts.post_date DESC LIMIT
0, 50;
SELECT FOUND_ROWS();
These are coming from a branch in /wp-includes/class-wp.php
WP::send_headers(). It's doing very basic logic to determine via the query
string if we expect this request to be a feed, then it uses either
''get_lastcommentmodified()'' or ''get_lastpostmodified()'' and uses the
result of that to set and send some headers for last modified and etag,
and evaluate request values for conditional get (304) responses.
'''The problem:'''
For my application--creating arbitrary feeds actually based on WP options
--the last time any post or comment was updated/added has no correlation
to my feed. It may not be updated at the point a post is updated, but it
would show as being updated--more dangerously, it would show as not being
updated and return a 304 if a post hasn't been updated. Also, it's
incurring overhead doing this lookup that has absolutely no value.
How about the most common WP core uses? I'm thinking it's still going to
result in a lot of false positives (saying it's updated more recently than
it really is) that certainly is going to depend on number of post types,
but even with say a default install with pages and posts, any time a page
is updated (which is not in any default feed), it will be affecting the
default posts feed.
There is no hook to allow for the disabling or changing of this logic.
And, there is no way to apply some form of this logic to heavier page
generations that might be kind of nice.
'''The options:'''
1. Add a hook to this send headers function that would allow it to be
fully disabled for certain feeds.
2. Move the functionality to a discrete function that can be passed the
actual page/feed updated time and it can then send the proper headers.
Then the default feed generation functions can call back to this
functionality when they know the actual updated dates.
I actually favor the latter--this is kinda nice functionality that is
trapped inside a less useful branch and is "stuck on" even when it's not
helpful. I would be happy to help write a patch for some of this if some
core people can jump on here and provide some direction and blessing.
'''I've noticed a little discussion on this issue elsewhere:'''
http://core.trac.wordpress.org/ticket/19466 - This person is attempting to
apply the existing logic--page/comment last modified across every request
which would be a massive mistake. Assuming that's why there has been no
response.
http://core.trac.wordpress.org/ticket/15499 - This person is suggesting an
index to get the results quicker--that could be beneficial additionally,
but perhaps my modifications could reduce the need for such an index.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/22742>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software
More information about the wp-trac
mailing list