[wp-trac] [WordPress Trac] #23070: Generated robots.txt file with privacy on should have Allow: /robots.txt
WordPress Trac
noreply at wordpress.org
Tue Jan 1 01:51:02 UTC 2013
#23070: Generated robots.txt file with privacy on should have Allow: /robots.txt
-------------------------+------------------------------
Reporter: iamfriendly | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version:
Severity: normal | Resolution:
Keywords: |
-------------------------+------------------------------
Comment (by dd32):
The addition of `Allow: /robots.txt` shouldn't significantly improve this
situation, but it does look like it'll work around a bug in the Google
Crawler.
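For illustration only (this isn't a proposed patch, and the function name
is hypothetical), the extra line can already be added via the existing
`robots_txt` filter that `do_robots()` applies to its output:
{{{
<?php
// Hypothetical plugin sketch: append "Allow: /robots.txt" to the generated
// file. $public is the 'blog_public' option; '0' means the privacy setting
// ("I would like to block search engines") is enabled.
function wp23070_allow_robots_txt( $output, $public ) {
	if ( '0' == $public ) {
		$output .= "Allow: /robots.txt\n";
	}
	return $output;
}
add_filter( 'robots_txt', 'wp23070_allow_robots_txt', 10, 2 );
}}}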
Crawlers are supposed to ignore robots.txt rules when fetching robots.txt
itself; however, they DO cache it for long periods of time (7 days is the
recommended default, from memory), and they ARE supposed to verify the
contents are still fresh before using the cached copy.
Since it doesn't look like we're sending any cache-control headers, it's
up to the client (Google in this case) to manage the caching of the file.
I'd seriously suggest that this is more of a Google bug that they should
fix, rather than something we should change (a sketch of what sending such
headers could look like follows the quote below).
Quoting the relevant section of a [http://www.robotstxt.org/norobots-
rfc.txt draft specification]:
{{{
3.4 Expiration
Robots should cache /robots.txt files, but if they do they must
periodically verify the cached copy is fresh before using its
contents.
Standard HTTP cache-control mechanisms can be used by both origin
server and robots to influence the caching of the /robots.txt file.
Specifically robots should take note of Expires header set by the
origin server.
If no cache-control directives are present robots should default to
an expiry of 7 days.
}}}
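If we did decide to take the caching out of Google's hands, here is a
minimal sketch (hypothetical function name; assumes the `do_robotstxt`
action, which fires in `do_robots()` before any output is echoed, so
headers can still be sent) of declaring the spec's 7-day default
explicitly:
{{{
<?php
// Hypothetical sketch: send explicit cache headers for the generated
// robots.txt, matching the draft spec's 7-day default expiry.
function wp23070_robots_txt_cache_headers() {
	$week = 7 * DAY_IN_SECONDS; // DAY_IN_SECONDS exists as of WP 3.5
	header( 'Cache-Control: max-age=' . $week );
	header( 'Expires: ' . gmdate( 'D, d M Y H:i:s', time() + $week ) . ' GMT' );
}
add_action( 'do_robotstxt', 'wp23070_robots_txt_cache_headers' );
}}}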
--
Ticket URL: <http://core.trac.wordpress.org/ticket/23070#comment:11>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software