[wp-trac] [WordPress Trac] #29201: File versioning should not use query strings, but rename the filename to allow caching

WordPress Trac noreply at wordpress.org
Fri Jun 24 18:15:00 UTC 2016


#29201: File versioning should not use query strings, but rename the filename to
allow caching
---------------------------+--------------------------
 Reporter:  benoitchantre  |       Owner:
     Type:  enhancement    |      Status:  closed
 Priority:  normal         |   Milestone:
Component:  Script Loader  |     Version:  3.9.1
 Severity:  normal         |  Resolution:  wontfix
 Keywords:                 |     Focuses:  performance
---------------------------+--------------------------

Comment (by drzraf):

 > I'm not sure what problem you're trying to solve.

 - remove a workaround
 - simplify my software stack
 - remove corner cases/bugs

 A query string appended to static files may cause (non-exhaustive list):
 - the MIME type being misdetected
 - the resource not being cacheable at all
 - multiple (too many) cached copies of the same resource
 - inconsistencies in log and stats files (same file, multiple URLs)
 - ...

 > > Refreshing downstream proxies is as easy as issuing a "Cache-Control"
 or "Pragma", and the webserver as well as the client:
 > > If a resource changes, just trigger
 > > {{{
 > > wp_remote_get($post_url, [ 'blocking' => false, 'headers' => array(
 > >   'Cache-Control' => 'no-cache') ]);
 > > }}}
 > > * fire a GET with `Cache-Control: no-cache`, or `Pragma: no-cache` (or
 both if you want) to said resource and forget (you can be sure the proxy
 will kindly refresh its cache)
 >
 > How does this help? There's no way to know what proxies there are
 between some arbitrary visitor and your site.  In any case, browsers will
 still have the old version cached.


 First, please note that changing a resource's location is '''not''' cache
 invalidation.
 But you're right, and I was wrong in the above post: this will '''not'''
 refresh '''downstream''' proxies.
 It will refresh '''server-side''' proxies, i.e. any proxy between the
 public address and the internal webserver IP (where most reverse proxies
 sit).
 For clarity, though, the header could have been written as:
 `Cache-Control: max-age=0, must-revalidate`

 Indeed you're mostly right here: it does not force a refresh of downstream
 proxies or user-agent caches.

 That is something only the user side (= the HTML web page) can do. We may
 want to split the two discussions (downstream proxies / reverse proxies)
 to avoid confusion.


 About user-agent cache refresh, there are alternatives (it is worth noting
 that we are working around a buggy `Expires` header imposed server-side).
 For example [https://developer.mozilla.org/fr/docs/Web/API/Location/reload
 window.location.reload(true)],
 which is likely equivalent to sending an XHR with `Cache-Control:
 max-age=0` for all enqueued assets.
 As JavaScript inlined in the HTML, it will refresh the UA cache and
 intermediary proxies.

 This is better than a query string, '''but''' it raises a very
 interesting question:
 ''When is it right to trigger the cache-refresh routine (= how does the
 WordPress application, when asked to generate HTML, know whether the
 user agent is using an old version of a static file or not)?''


 > To restate what's been mentioned above, a pretty common setup is for the
 webserver to issue long expiry times (let's say 1 year).

 That's the moot point, and it needs a better definition before we can
 effectively move forward:
 - how common is it?
 - does it represent the majority of cache-enabled webservers?
 - which OSes, distributions, and hosting services are known to ship such a
 setup?

 It must be added that we '''do have''' control over assets (including
 caching options) if we want to; it's just a matter of RewriteRules and
 [https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
 Cache-Control headers]:
 > If a response includes both an Expires header and a max-age directive,
 the max-age directive overrides the Expires header.

 If buggy webservers are not common enough, or if they do not represent the
 majority of cache-enabled webservers, it would be fair to assume that the
 workaround (and the performance cost it implies) should burden them rather
 than well-behaved proxies. (As a bonus, it gives an incentive to implement
 caching correctly.)


 > There's no way for the server to clear unknown proxy caches.

 That's right, *but* they are cached because WordPress asked for it in the
 first place, and WordPress decides this using
 [https://tools.ietf.org/html/rfc2616#section-14.9 headers].
 Of course, in some setups the webserver may try to bypass the application,
 but with RewriteRules it's easy to regain full control.

 > There's no way for the server to clear the browser cache. Instead, the
 server tells the browser to request a new resource (by altering this query
 string parameter).

 That's somewhat the point of caching, and an argument for not playing with
 `Cache-*` or `Expires` HTTP headers.


 > Plugins can, of course, change the core behavior.  For example, I know
 some sites remove the `?ver=` parameter and replace it with a query string
 parameter that corresponds to the `mtime` of the file. This mtime method
 is more robust but is hard to implement correctly on sites served by many
 webservers. It also may be non-performant on many hosts.  Core's `?ver=`
 method is good compromise that works pretty well most places.


 Are `Expires +1 year` setups so frequent that they need this compromise in
 WP core?
 Wouldn't it be easy for a plugin to introduce one of the various
 workarounds for that specific problem?
 I bet these `?ver=` are not needed for 100% of non-cached WP instances,
 and not needed for 80% of cached WP instances because their (Apache?)
 webserver is configured correctly.
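
 To illustrate how small such a plugin-side workaround could be, here is a
 minimal sketch of the `mtime` variant quoted above, in plain PHP. The
 function name `mtime_versioned_url()` and its two parameters are
 hypothetical, it assumes the URL carries no fragment, and it is not taken
 from any existing plugin:

```php
<?php
/**
 * Hypothetical helper: append the file's modification time as "ver".
 *
 * $url  is the public URL of the asset (assumed to have no #fragment).
 * $file is the path of the same asset on the local filesystem.
 * If the file cannot be read, the URL is returned unchanged.
 */
function mtime_versioned_url(string $url, string $file): string
{
    // Drop PHP's cached stat data so a freshly changed file is noticed.
    clearstatcache(true, $file);

    $mtime = @filemtime($file);
    if ($mtime === false) {
        return $url;
    }

    // Use "&" when the URL already carries a query string.
    $separator = (strpos($url, '?') === false) ? '?' : '&';

    return $url . $separator . 'ver=' . $mtime;
}
```

 In a real plugin this would presumably be wired to the `script_loader_src`
 and `style_loader_src` filters, after mapping the URL back to a path on
 disk. The stat call on every request is the performance cost mentioned in
 the quote.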


 > > What/Who may configure *by mistake* WP to set a too-large expire time?
 >
 > I don't think this long cache expiry choice is a mistake.

 See [https://tools.ietf.org/html/rfc2616#section-14.21 this]:
 > To mark a response as "never expires," an origin server sends an Expires
 date approximately one year from the time the response is sent. HTTP/1.1
 servers SHOULD NOT send Expires dates more than one year in the future.

 So I'm pretty sure +1y asset caching is mostly a mistake, but it is bound
 to the "Uniform Resource Locator" definition/interpretation and related
 RFCs.
 RFC 2616's wording does not imply widespread use of such a caching policy.

 Paraphrasing, this tells the UA: ''Assume that the webpage will point you
 to the newer resource.''
 (People who cache the WP front page for some minutes or hours break the
 assumption that the HTML page is the (only) way to refresh the assets.)


 It's all about website visitor patterns and asset upgrade transitions (and
 also about whether the HTML output itself is cached or not).

 The query-string method is a way to keep the webpage and its assets in
 sync in a cache-enabled context, and thus to avoid this kind of question:
 - do we accept old CSS for a NEW page?
 - do we accept new CSS for an OLD page?

 On the one hand, it implies that `jquery.js?ver=1.2.3` will be universally
 OK; on the other hand, the non-suffixed `jquery.js` will be inconsistent
 (depending on my place in the network, I would be given a different
 resource).

 The logical implication of a +1y `Expires` for WP would be to put the
 version explicitly inside the filenames, e.g. `jquery-1.2.3.js`, rather
 than using a query string.
 But please leave that to "long-expires" webservers (or to those, like
 [https://developer.yahoo.com/performance/rules.html#expires Yahoo!]'s,
 that are ready to deal with the side effects it induces).
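
 As a rough sketch of what "versioning inside the filename" could look
 like on the URL-generating side, the plain-PHP function below moves a
 `ver` query argument into the file name. The name
 `ver_query_to_filename()` is hypothetical and this is not how core
 behaves; it only illustrates the URL transformation:

```php
<?php
/**
 * Hypothetical helper: turn "/js/jquery.js?ver=1.2.3" into
 * "/js/jquery-1.2.3.js". URLs without a usable "ver" argument, or whose
 * path has no file extension, are returned unchanged.
 */
function ver_query_to_filename(string $url): string
{
    // Split off the fragment, then the query string.
    [$rest, $fragment] = array_pad(explode('#', $url, 2), 2, null);
    [$base, $query]    = array_pad(explode('?', $rest, 2), 2, null);
    if ($query === null) {
        return $url;
    }

    parse_str($query, $args);
    if (!isset($args['ver']) || !is_string($args['ver'])) {
        return $url;
    }

    // Keep only characters that are safe inside a file name.
    $ver = preg_replace('/[^A-Za-z0-9._-]/', '', $args['ver']);
    if ($ver === '') {
        return $url;
    }

    // Only the path part may be renamed, never the host name.
    $path = parse_url($base, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return $url;
    }
    $prefix = substr($base, 0, strlen($base) - strlen($path));

    // Insert "-<ver>" in front of the final extension.
    $renamed = preg_replace('/\.([A-Za-z0-9]+)$/', '-' . $ver . '.$1', $path, 1, $count);
    if ($count !== 1) {
        return $url;
    }

    // Rebuild the URL, keeping any other query arguments and the fragment.
    unset($args['ver']);
    $out = $prefix . $renamed;
    if ($args) {
        $out .= '?' . http_build_query($args);
    }
    if ($fragment !== null) {
        $out .= '#' . $fragment;
    }
    return $out;
}
```

 For example, `https://example.com/wp-includes/js/jquery.js?ver=1.2.3`
 becomes `https://example.com/wp-includes/js/jquery-1.2.3.js`. The
 webserver would still need a rewrite rule (or a build step that actually
 renames the files) so that the versioned name resolves to the file on
 disk; that part is not shown here.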

--
Ticket URL: <https://core.trac.wordpress.org/ticket/29201#comment:13>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform