[wp-trac] [WordPress Trac] #14069: do_robots() ignores charset setting

WordPress Trac wp-trac at lists.automattic.com
Tue Jul 20 15:49:26 UTC 2010


#14069: do_robots() ignores charset setting
--------------------------+-------------------------------------------------
 Reporter:  hakre         |       Owner:                
     Type:  defect (bug)  |      Status:  new           
 Priority:  normal        |   Milestone:  Future Release
Component:  Charset       |     Version:                
 Severity:  normal        |    Keywords:  has-patch     
--------------------------+-------------------------------------------------

Comment(by hakre):

 More important than the actual encoding of the file (okay, if you want
 any robot to read it, make it ASCII, period) is the encoding of the
 relative URLs used inside the file.

 Those should be properly urlencoded.
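
 A minimal sketch of what "properly urlencoded" could look like for a
 single rule. It is written in Python rather than WordPress's PHP and
 assumes the path is percent-encoded from its UTF-8 bytes. The helper
 name `disallow_rule` and the sample path are made up for illustration
 and are not part of any patch on this ticket:

```python
from urllib.parse import quote


def disallow_rule(path: str) -> str:
    """Return a Disallow line with a percent-encoded path (hypothetical helper)."""
    # quote() turns each non-ASCII character into its UTF-8 bytes and
    # escapes them as %XX; "/" is kept so the path structure survives.
    return "Disallow: " + quote(path, safe="/")


print(disallow_rule("/café/"))  # Disallow: /caf%C3%A9/
```

 The resulting line contains only ASCII characters, whatever the
 original path looked like.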

 I made a write-up here:
 [http://hakre.wordpress.com/2010/07/20/encoding-of-the-robots-txt-file/ Encoding of the robots.txt file]
 and the resource ocen90 linked has this useful information as well:

 > [M]ake sure the bots can properly read the file and directory path
 names, regardless of whether it adheres to ASCII standards. When writing
 directives that include characters unavailable in ASCII, you can "escape"
 (aka percent-encode) them, which enables the bot to read them.

 I think this is mostly important for webmasters who really want to care
 about these issues. My suggestion is to deliver the file in US-ASCII. It
 can then even be mislabeled as UTF-8 or Latin-1 without running into any
 problems, as long as the rules created by other parts of the web
 application are correctly urlencoded.
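
 To illustrate why the mislabeling is harmless, here is a small Python
 sketch. The file content is invented for the example. It relies only on
 the fact that UTF-8 and Latin-1 both agree with US-ASCII on the first
 128 code points:

```python
# Sample robots.txt content (made up) with an already percent-encoded path.
robots_txt = "User-agent: *\nDisallow: /caf%C3%A9/\n"

# Encoding as US-ASCII raises UnicodeEncodeError if a non-ASCII
# character slipped into one of the rules.
payload = robots_txt.encode("ascii")

# The same bytes decode to the same text under all three labels.
decoded = {label: payload.decode(label) for label in ("ascii", "utf-8", "latin-1")}
assert all(text == robots_txt for text in decoded.values())
```

 A rule containing a raw non-ASCII character would fail at the encode
 step, which is one way to catch rules that were not urlencoded.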

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/14069#comment:5>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

