[wp-trac] [WordPress Trac] #16893: Stop or reduce crawling of comment reply ?replytocom URLs

WordPress Trac wp-trac at lists.automattic.com
Wed Apr 6 07:09:25 UTC 2011


#16893: Stop or reduce crawling of comment reply ?replytocom URLs
------------------------------------+------------------------------
 Reporter:  joelhardi               |       Owner:
     Type:  enhancement             |      Status:  new
 Priority:  normal                  |   Milestone:  Awaiting Review
Component:  Comments                |     Version:  3.1
 Severity:  normal                  |  Resolution:
 Keywords:  has-patch dev-feedback  |
------------------------------------+------------------------------

Comment (by joelhardi):

 Reporting back after running attachment:general-template.php.17522.diff
 (which adds the robots noindex,nofollow meta tag to ?replytocom URLs) on 2
 live sites for the couple of weeks since this ticket was opened.
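
 For anyone reading without the attachment, the idea can be sketched
 roughly as below. This is not the attached diff itself, only a minimal
 illustration under assumptions: the function names
 example_replytocom_robots_meta() and example_print_replytocom_robots_meta()
 are hypothetical, and hooking into wp_head is one plausible place to emit
 the tag.

 ```php
 <?php
 // Hypothetical helper (illustrative name, not taken from the patch).
 // Returns the robots meta tag for comment-reply URLs, else an empty string.
 function example_replytocom_robots_meta( array $query ) {
     // Comment-reply links carry a ?replytocom=<comment id> query argument.
     if ( isset( $query['replytocom'] ) ) {
         return "<meta name='robots' content='noindex,nofollow' />\n";
     }
     return '';
 }

 // Hypothetical callback that prints the tag for the current request.
 function example_print_replytocom_robots_meta() {
     echo example_replytocom_robots_meta( $_GET );
 }

 // Inside WordPress, emit the tag in <head>; outside WordPress this is skipped.
 if ( function_exists( 'add_action' ) ) {
     add_action( 'wp_head', 'example_print_replytocom_robots_meta' );
 }
 ```

 With something like this in place, a request for /some-post/?replytocom=5
 would carry the noindex,nofollow tag, while the plain /some-post/ URL
 would be unaffected.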

 It's worked as well as (or better than) I expected and I'd recommend
 adding this functionality to a future release.

 ?replytocom pages have not been indexed by Google, and there's been no
 increase in Googlebot crawling of these sites (previously I'd had
 robots.txt block access to these URLs). So even the hypothesis that
 Googlebot intelligently stops trying to recrawl these URLs once it
 encounters the meta tag has been borne out.

 Also, Google Webmaster Tools has a "crawl errors" section which normally
 lists URLs blocked by robots.txt. These URLs aren't included there (in
 fact they don't show up anywhere in Webmaster Tools) since they're now
 excluded by the meta tag rather than by robots.txt. So the end-user goal
 of not having these URLs litter the screen on logging into Webmaster
 Tools is also achieved. I think this is a good improvement that should
 quiet the complaints on the other thread about Google now crawling these
 pages since the rel="nofollow" attribute was dropped from <a> tags, and
 I don't see any potential downsides.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/16893#comment:3>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software