[wp-trac] [WordPress Trac] #14069: do_robots() ignores charset setting

WordPress Trac wp-trac at lists.automattic.com
Tue Jul 20 03:15:44 UTC 2010


#14069: do_robots() ignores charset setting
--------------------------+-------------------------------------------------
 Reporter:  hakre         |       Owner:                
     Type:  defect (bug)  |      Status:  new           
 Priority:  normal        |   Milestone:  Future Release
Component:  Charset       |     Version:                
 Severity:  normal        |    Keywords:  has-patch     
--------------------------+-------------------------------------------------

Comment(by hakre):

 I'm not the character queen either. Historically, the safest route would
 be US-ASCII (7-bit ASCII). If robots.txt supported an escaping scheme like
 the one we have for URLs, then the robots.txt file could be 100% US-ASCII
 encoded, and the content it transports could be a URL-encoded
 representation of any other character set (which would not make much
 sense, because how should a robot determine the charset then?).
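
 To illustrate the ambiguity, here is a minimal Python sketch (not WordPress
 code). The helper name escape_path and the sample path are made up for
 illustration. It shows that the same path percent-encodes to different
 pure-ASCII strings depending on the charset assumed before escaping, so
 the escaped form alone does not tell a robot which charset was meant.

```python
from urllib.parse import quote


def escape_path(path: str, charset: str) -> str:
    """Percent-encode a path after encoding it in the given charset.

    Hypothetical helper for illustration only.
    """
    return quote(path, encoding=charset)


path = "/café/"  # sample path containing one non-ASCII character

as_utf8 = escape_path(path, "utf-8")      # é becomes two escaped bytes
as_latin1 = escape_path(path, "latin-1")  # é becomes one escaped byte

# Both results are plain US-ASCII, so the robots.txt line itself is 7-bit clean.
line = f"Disallow: {as_utf8}".encode("ascii")

print(as_utf8)    # /caf%C3%A9/
print(as_latin1)  # /caf%E9/
```

 Both lines are valid US-ASCII, yet they name different byte sequences,
 which is the "how should a robot determine the charset" problem.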

 To make a long story short, the charset meta-information provided by the
 headers must match the body encoding of the robots.txt server response.
 The suggestion from that Bing website can be useful but should not matter
 here. In the end, a blog's admin decides which charset the blog uses.
 That is the charset robots.txt is encoded in as well. If it's
 incompatible with robots, then that's the admin's choice.
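
 The header/body rule can be sketched as follows. This is a Python
 illustration under stated assumptions, not the actual do_robots()
 implementation or the attached patch. The function name
 build_robots_response and its parameters are hypothetical; the charset
 argument stands in for whatever charset the blog admin has configured.

```python
def build_robots_response(rules, charset="utf-8"):
    """Return (headers, payload) for a robots.txt response.

    Hypothetical helper: the same charset value is used both to encode
    the body and to label it in the Content-Type header, so the two
    cannot drift apart.
    """
    body = "\n".join(rules) + "\n"
    # Raises UnicodeEncodeError if the rules cannot be represented
    # in the chosen charset.
    payload = body.encode(charset)
    headers = {
        "Content-Type": f"text/plain; charset={charset}",
        "Content-Length": str(len(payload)),
    }
    return headers, payload


rules = ["User-agent: *", "Disallow: /café/"]
utf8_headers, utf8_payload = build_robots_response(rules, "utf-8")
latin1_headers, latin1_payload = build_robots_response(rules, "iso-8859-1")
```

 The two payloads differ by one byte for the same rules, which is why a
 hard-coded charset in the header would mislabel the body whenever the
 blog uses a different one.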

 Blogs should be either US-ASCII or UTF-8, btw. You can (but need not) use
 Latin-1 for historical or performance reasons. This is how I would
 formulate a best-practice suggestion.

 Related: #14201

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/14069#comment:4>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

