[wp-trac] [WordPress Trac] #14069: do_robots() ignores charset setting
WordPress Trac
wp-trac at lists.automattic.com
Tue Jul 20 03:15:44 UTC 2010
#14069: do_robots() ignores charset setting
--------------------------+-------------------------------------------------
Reporter: hakre | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: Charset | Version:
Severity: normal | Keywords: has-patch
--------------------------+-------------------------------------------------
Comment(by hakre):
I'm not the character queen either. Historically, the safest route would
be US-ASCII (7-bit ASCII). If robots.txt supported percent-encoding the way
URLs do, the file itself could be 100% US-ASCII encoded, and its content
could carry a URL-encoded representation of characters from any other
character set (which would not make much sense, because how would a robot
determine that charset?).
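
To illustrate the ambiguity mentioned above, here is a minimal Python sketch
(not WordPress code; the helper name ascii_disallow_line is hypothetical). It
percent-encodes a path so the resulting line is pure US-ASCII, and shows that
the same path yields different escapes depending on the assumed charset.

```python
from urllib.parse import quote

def ascii_disallow_line(path: str, charset: str = "utf-8") -> str:
    """Build a Disallow line that is pure US-ASCII (hypothetical helper).

    Non-ASCII characters in `path` are encoded to bytes in `charset`
    and then percent-encoded, the same way URLs are escaped.
    """
    return "Disallow: " + quote(path, safe="/", encoding=charset)

utf8_line = ascii_disallow_line("/café/")               # bytes taken from UTF-8
latin1_line = ascii_disallow_line("/café/", "latin-1")  # bytes taken from Latin-1

# Both lines are valid US-ASCII, but they differ from each other.
print(utf8_line)
print(latin1_line)
```

Both lines survive a 7-bit transport, yet a robot reading only the escaped
form cannot tell which charset the escapes were derived from.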
To make a long story short, the charset meta-information provided by
the headers must match the body encoding of the robots.txt server
response. The suggestion from that Bing website can be useful but
should not matter here. In the end, a blog's admin decides which charset the
blog uses, and that is the charset robots.txt is encoded in as well. If it is
incompatible with robots, then that is the admin's choice.
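
The rule that header and body must agree can be sketched as follows. This is
an illustrative Python example under stated assumptions, not the do_robots()
implementation: the function name robots_response and its parameters are
hypothetical, and blog_charset stands in for whatever charset the admin has
configured.

```python
def robots_response(body_text: str, blog_charset: str = "UTF-8"):
    """Return (headers, body_bytes) for a robots.txt response.

    Hypothetical helper: the charset announced in the Content-Type
    header is the same one used to encode the body, so the two
    cannot drift apart.
    """
    body = body_text.encode(blog_charset)
    headers = {"Content-Type": f"text/plain; charset={blog_charset}"}
    return headers, body

text = "User-agent: *\nDisallow: /café/\n"
headers, body = robots_response(text, "ISO-8859-1")

# Decoding the body with the announced charset restores the original text.
announced = headers["Content-Type"].split("charset=")[1]
assert body.decode(announced) == text
```

Deriving both the header and the body bytes from the same setting is the
design point; a hard-coded charset in either place would reintroduce the
mismatch.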
Blogs should be either US-ASCII or UTF-8, by the way. You can (but need not) use
Latin-1 for historical or performance reasons. This is how I would
formulate a best-practice suggestion.
Related: #14201
--
Ticket URL: <http://core.trac.wordpress.org/ticket/14069#comment:4>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software