[wp-trac] [WordPress Trac] #57627: The Cache-Control header for logged-in pages should include `private`

WordPress Trac noreply at wordpress.org
Fri Feb 3 16:49:29 UTC 2023


#57627: The Cache-Control header for logged-in pages should include `private`
--------------------------+-----------------------------
 Reporter:  markdoliner   |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  General       |    Version:
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 I believe WordPress returns the following Cache-Control header for pages
 that are rendered for logged-in users:

 {{{
 Cache-Control: no-cache, must-revalidate, max-age=0
 }}}

 I think the relevant code is
 [https://build.trac.wordpress.org/browser/tags/6.1.1/wp-includes/class-wp.php#L424 here]
 and [https://build.trac.wordpress.org/browser/tags/6.1.1/wp-includes/functions.php#L1485 here].

 For pages for logged-in users I believe this header should be modified to
 include the `private` directive to indicate that the response should not
 be cached by intermediary shared cache servers.
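 As a rough illustration of the proposal, here is a hypothetical Python
 sketch. WordPress itself is PHP, and `cache_control_for()` and its
 `personalized` flag are made-up names, not WordPress APIs. It only shows
 the header value I'd expect for personalized versus non-personalized
 responses, with `private` prepended to the existing directives.

```python
# Hypothetical sketch, not WordPress code: the header quoted at the top of
# this ticket, with `private` prepended only for personalized responses.
NOCACHE_VALUE = "no-cache, must-revalidate, max-age=0"


def cache_control_for(personalized: bool) -> str:
    """Return the Cache-Control value for a response.

    `personalized` is an assumed flag meaning "this response varies by user"
    (e.g. it was rendered for a logged-in user).
    """
    if personalized:
        return "private, " + NOCACHE_VALUE
    return NOCACHE_VALUE


print(cache_control_for(True))   # private, no-cache, must-revalidate, max-age=0
print(cache_control_for(False))  # no-cache, must-revalidate, max-age=0
```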

 The change should not be made everywhere `nocache_headers()` is used, only
 for responses that vary based on the logged-in user. It might also make
 sense for users who have recently left a comment (#16612 is related),
 though it seems hard for the server to detect this reliably. You could key
 off the presence of one of the `comment_author_*` cookies, but those
 aren't always set.
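 A minimal sketch of how "varies based on the user" might be decided,
 assuming the caller already knows whether the user is logged in. The
 function name is hypothetical; the cookie prefix comes from the
 `comment_author_*` cookies mentioned above, and, as noted, relying on them
 will miss commenters whose cookies aren't set.

```python
from typing import Mapping

# Hypothetical helper. WordPress sets cookies such as comment_author_<hash>
# and comment_author_email_<hash> for commenters; they are not always present.
COMMENT_COOKIE_PREFIX = "comment_author_"


def response_is_personalized(logged_in: bool, cookies: Mapping[str, str]) -> bool:
    """Return True if the response should be marked `private`."""
    if logged_in:
        return True
    # Any comment_author_* cookie suggests the page may contain
    # commenter-specific content (e.g. prefilled comment form fields).
    return any(name.startswith(COMMENT_COOKIE_PREFIX) for name in cookies)


print(response_is_personalized(False, {"comment_author_abc123": "Jane"}))  # True
print(response_is_personalized(False, {}))  # False
```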

 ==== The Meanings of `no-cache` and `private`

 You might think that `no-cache` would be sufficient to accomplish this,
 but it's not. It's a bit confusing, but `no-cache` means "this response may
 be stored in a cache but it must be revalidated before it is used." So I
 believe shared cache servers are allowed to store pages rendered for
 logged-in users.
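 To make the distinction concrete, here is a simplified Python sketch of the
 storage rules in RFC 9111 (HTTP Caching), section 3, limited to the
 directives discussed here. The function names are my own, and it
 deliberately ignores other conditions (status codes, `Authorization`,
 qualified `private="..."` field lists, and so on), treating any `private`
 as unqualified.

```python
# Simplified from RFC 9111 section 3 ("Storing Responses in Caches").
# Ignores status codes, Authorization, qualified private="..." field lists,
# and other conditions; quoted values containing commas are not handled.


def directive_names(cache_control: str) -> set[str]:
    """Return the lower-cased directive names from a Cache-Control value."""
    names = set()
    for part in cache_control.split(","):
        name = part.split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def may_store(cache_control: str, shared: bool) -> bool:
    directives = directive_names(cache_control)
    if "no-store" in directives:
        return False  # no cache, shared or private, may store it
    if shared and "private" in directives:
        return False  # shared caches must not store it
    # `no-cache` does not forbid storing; it only requires a successful
    # revalidation before a stored copy is reused.
    return True


current = "no-cache, must-revalidate, max-age=0"
print(may_store(current, shared=True))                # True
print(may_store("private, " + current, shared=True))  # False
print(may_store("private, " + current, shared=False)) # True
```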

 I've found [https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching
 MDN's caching guide] to be helpful while trying to understand the meaning
 of the various directives. The Private Caches section says, "If a response
 contains personalized content and you want to store the response only in
 the private cache, you must specify a `private` directive." It's
 reiterated in the "Do Not Share With Others" section under "Don't Cache."
 And MDN's [https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
 Cache-Control header reference] contains a similar statement.

 ==== What's the Harm?

 And of course the risk isn't just that the page is ''cached'' on a shared
 server, but that it's served to a user other than the logged-in user.
 Thankfully I think the risk is minimal for a few reasons:

 1. `no-cache` means the cache will attempt to revalidate the page before
 using it. I believe revalidation is not possible by default because
 WordPress does not set the ETag or Last-Modified header for these
 responses. This isn't a guarantee, though: someone could configure their
 web server or a caching reverse proxy server to set these headers and
 return HTTP 304 when appropriate, or a plugin could do the same. The WP
 Super Cache plugin even has options for "304 Browser caching" and "Enable
 caching for all visitors" (even logged-in visitors). However, I couldn't
 get it to serve a logged-in page to a non-logged-in user, so it appears to
 be clever enough to keep separate cached data based on the user's cookie
 (I see that `Cookie` is added to the Vary header), which is good.

 2. When used as caching reverse proxies, Nginx and Varnish appear not to
 cache responses whose Cache-Control header includes `no-cache`, so they
 won't cache pages for logged-in users. For Nginx I think it's
 [https://github.com/nginx/nginx/blob/dad65f3e449f215469943628f2b1f12a118fcf7e/src/http/ngx_http_upstream.c#L4811
 this logic]. For Varnish I think it's
 [https://github.com/varnishcache/varnish-cache/blob/582ded6a2d6ae1a4467b1eb500f2725b42888016/bin/varnishd/builtin.vcl#L212-L240
 this logic]. I think they're ''allowed'' to cache these responses, and it
 seems possible that they will in the future, but they don't currently. As
 a counterexample, I believe Squid ''is'' willing to cache these responses
 ([https://wiki.squid-cache.org/SquidFaq/InnerWorkings#how-come-some-objects-do-not-get-cached
 this FAQ] is related but not super clear).

 3. I suspect shared cache servers are uncommon (though I've made no
 attempt to find data about it).

 4. The number of HTTPS sites has increased greatly over time, and shared
 cache servers can't cache objects served over HTTPS (unless they decrypt
 and re-encrypt the traffic, which is mostly only possible on
 company-managed computers where the company can add its own signing
 certificate to the browser's trust store).
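 Points 1 and 2 can be approximated in a few lines. This is a hypothetical
 Python sketch, not proxy code: `has_validator()` checks whether a response
 carries the ETag or Last-Modified validator a cache would need for a
 conditional revalidation request, and `varnish_like_skips_cache()`
 approximates, as I read the linked builtin.vcl, Varnish's test of
 Cache-Control against `(?i:no-cache|no-store|private)`. It ignores the
 other conditions in that file.

```python
import re
from typing import Mapping

# Illustrative approximations only. Header names are assumed lower-cased.


def has_validator(headers: Mapping[str, str]) -> bool:
    """Point 1: without ETag or Last-Modified, a cache has nothing to send in
    If-None-Match / If-Modified-Since, so it cannot revalidate conditionally."""
    return "etag" in headers or "last-modified" in headers


# Point 2: approximates the Cache-Control test in Varnish's builtin.vcl,
# which matches (?i:no-cache|no-store|private). Other conditions in that
# file (TTL, Set-Cookie, Surrogate-Control, Vary: *) are ignored here.
UNCACHEABLE = re.compile(r"no-cache|no-store|private", re.IGNORECASE)


def varnish_like_skips_cache(headers: Mapping[str, str]) -> bool:
    return UNCACHEABLE.search(headers.get("cache-control", "")) is not None


wp_headers = {"cache-control": "no-cache, must-revalidate, max-age=0"}
print(has_validator(wp_headers))             # False
print(varnish_like_skips_cache(wp_headers))  # True
```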

 ==== So Why Should We Change It?

 While I think it's rare that the lack of `private` will cause harm,
 WordPress is widely used and there are many ways to configure
 cache-related headers. I'd guess there is a non-zero chance that this
 problem has surfaced at some point, so I feel it's worth changing. The
 risk of adding the directive feels low to me.

 I'll caveat this ticket by saying that I'm not intimately familiar with
 caching behavior. I've just been looking at it a lot over the last few
 days. It's entirely possible that I'm wrong about all of this.

 ==== Related Tickets

 - #16612 proposes using `nocache_headers()` for requests with comment
 cookies. That seems appropriate to me, and those responses should also
 include `private`.
 - #21938 proposes adding `no-store` to the `nocache_headers()` list. This
 is a separate consideration from the issue I'm raising above. I don't know
 whether it's a good proposal. There's a lot to think about there.
 - #22258, #23021, and #40444 dealt with removing Last-Modified from
 `nocache_headers()`.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/57627>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

