[wp-trac] [WordPress Trac] #14069: do_robots() ignores charset setting
WordPress Trac
wp-trac at lists.automattic.com
Tue Jul 20 15:49:26 UTC 2010
#14069: do_robots() ignores charset setting
--------------------------+-------------------------------------------------
Reporter: hakre | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: Charset | Version:
Severity: normal | Keywords: has-patch
--------------------------+-------------------------------------------------
Comment(by hakre):
More important than the actual encoding of the file (if you want every
robot to be able to read it, make it ASCII, period) is the encoding of the
relative URLs used inside the file.
Those should be properly URL-encoded.
I made a write-up here: [http://hakre.wordpress.com/2010/07/20/encoding-of-the-robots-txt-file/ Encoding of the robots.txt file] and the resource
ocen90 linked has this useful information as well:
> [M]ake sure the bots can properly read the file and directory path
names, regardless of whether it adheres to ASCII standards. When writing
directives that include characters unavailable in ASCII, you can "escape"
(aka percent-encode) them, which enables the bot to read them.
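To make the percent-encoding concrete, here is a minimal sketch in Python (not WordPress's PHP; the helper name disallow_line is hypothetical). It assumes the input is a raw, not-yet-encoded path with no robots.txt wildcards, since quote() would escape an existing "%" or a "*" as well.

```python
from urllib.parse import quote


def disallow_line(path: str) -> str:
    """Return a robots.txt Disallow directive with a percent-encoded path.

    quote() encodes the path as UTF-8 and percent-escapes every byte that is
    not an unreserved character; safe="/" keeps the path separators intact.
    Hypothetical helper: assumes `path` is raw (not already percent-encoded).
    """
    return "Disallow: " + quote(path, safe="/")


print(disallow_line("/kategorie/größe/"))
```

The directive printed is "Disallow: /kategorie/gr%C3%B6%C3%9Fe/", which contains only ASCII characters, so a bot reads the same bytes no matter which charset it assumes.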
I think this mostly matters for webmasters who really care about these
issues. My suggestion is to deliver the file in US-ASCII. It can then even
be mislabeled as UTF-8 or Latin-1 without running into any problems, as
long as the rules created by other parts of the web application are
correctly URL-encoded.
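Why a pure US-ASCII file survives a wrong charset label can be checked directly. The following is a minimal Python sketch under stated assumptions (the helper name robots_body is hypothetical, and the rules are assumed to be percent-encoded already); it is not how do_robots() itself builds the file.

```python
def robots_body(lines: list[str]) -> bytes:
    """Join robots.txt lines and encode them strictly as US-ASCII.

    Hypothetical helper: raises UnicodeEncodeError if a line still contains
    a non-ASCII character, i.e. a path that was not percent-encoded first.
    """
    return ("\n".join(lines) + "\n").encode("ascii")


body = robots_body(["User-agent: *", "Disallow: /kategorie/gr%C3%B6%C3%9Fe/"])

# Every byte is below 0x80, so the body decodes to the same text whether a
# client assumes US-ASCII, UTF-8 or Latin-1 (ISO-8859-1).
assert body.decode("ascii") == body.decode("utf-8") == body.decode("latin-1")
```

Failing loudly on non-ASCII input is a deliberate choice here: it surfaces rules that were not URL-encoded instead of silently emitting bytes whose meaning depends on the charset label.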
--
Ticket URL: <http://core.trac.wordpress.org/ticket/14069#comment:5>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software