[wp-trac] [WordPress Trac] #23070: Generated robots.txt file with privacy on should have Allow: /robots.txt
WordPress Trac
noreply at wordpress.org
Tue Jan 1 01:51:02 UTC 2013
#23070: Generated robots.txt file with privacy on should have Allow: /robots.txt
-------------------------+------------------------------
Reporter: iamfriendly | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version:
Severity: normal | Resolution:
Keywords: |
-------------------------+------------------------------
Comment (by dd32):
The addition of `Allow: /robots.txt` shouldn't significantly improve this
situation, but it does look like it'll work around a bug in the Google
Crawler.
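For illustration only (this isn't a proposed patch, and the function name
is hypothetical), the extra line can already be added via the existing
`robots_txt` filter that `do_robots()` applies to its output:
{{{
<?php
// Hypothetical plugin sketch: append "Allow: /robots.txt" to the generated
// file. $public is the 'blog_public' option; '0' means the privacy setting
// ("I would like to block search engines") is enabled.
function wp23070_allow_robots_txt( $output, $public ) {
	if ( '0' == $public ) {
		$output .= "Allow: /robots.txt\n";
	}
	return $output;
}
add_filter( 'robots_txt', 'wp23070_allow_robots_txt', 10, 2 );
}}}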
Crawlers are supposed to ignore robots.txt rules when fetching robots.txt
itself; however, they DO cache it for long periods of time (7 days is the
recommended default, from memory), and they ARE supposed to verify the
contents are still fresh before using the cached copy.
Since it doesn't look like we're sending any cache-control headers, it's
up to the client (Google in this case) to manage the caching of the file.
I'd seriously suggest that this is more of a Google bug that they should
fix, rather than something we should change (a sketch of what sending such
headers could look like follows the quote below).
Quoting the relevant section of a [http://www.robotstxt.org/norobots-
rfc.txt draft specification]:
{{{
3.4 Expiration
Robots should cache /robots.txt files, but if they do they must
periodically verify the cached copy is fresh before using its
contents.
Standard HTTP cache-control mechanisms can be used by both origin
server and robots to influence the caching of the /robots.txt file.
Specifically robots should take note of Expires header set by the
origin server.
If no cache-control directives are present robots should default to
an expiry of 7 days.
}}}
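If we did decide to take the caching out of Google's hands, here is a
minimal sketch (hypothetical function name; assumes the `do_robotstxt`
action, which fires in `do_robots()` before any output is echoed, so
headers can still be sent) of declaring the spec's 7-day default
explicitly:
{{{
<?php
// Hypothetical sketch: send explicit cache headers for the generated
// robots.txt, matching the draft spec's 7-day default expiry.
function wp23070_robots_txt_cache_headers() {
	$week = 7 * DAY_IN_SECONDS; // DAY_IN_SECONDS exists as of WP 3.5
	header( 'Cache-Control: max-age=' . $week );
	header( 'Expires: ' . gmdate( 'D, d M Y H:i:s', time() + $week ) . ' GMT' );
}
add_action( 'do_robotstxt', 'wp23070_robots_txt_cache_headers' );
}}}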
--
Ticket URL: <http://core.trac.wordpress.org/ticket/23070#comment:11>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software