[wp-meta] [Making WordPress.org] #6763: Update robots.txt (and rosetta variations)

Making WordPress.org noreply at wordpress.org
Fri Feb 17 03:46:02 UTC 2023


#6763: Update robots.txt (and rosetta variations)
----------------------------+-----------------------------
 Reporter:  jonoaldersonwp  |      Owner:  (none)
     Type:  enhancement     |     Status:  new
 Priority:  low             |  Milestone:
Component:  General         |   Keywords:  seo performance
----------------------------+-----------------------------
 The [[https://wordpress.org/robots.txt|robots.txt]] file could be
 tightened up in order to prevent unnecessary crawling, which could have
 significant SEO and performance/efficiency benefits.

 Additionally, we apply rules inconsistently across different rosetta
 subdomains; we should probably standardize these!

 I'm looking at wordpress.org/robots.txt as a starting point for
 standardization; I'd suggest:
 - Remove the various wp-admin 'allow' rules (e.g.,
 wordpress.org/robots.txt)
 - Combine the remaining disallow rules
 - Tweak the 'search' disallow rule to add a trailing slash
 - Move sitemap references to the end
 - Disallow `/*/wp-json/` (which is crawled by Google upwards of 40k times
 per day!)
 - Disallow subfolder variations of some rules so that they catch subsites
 (e.g., `/*/wp-admin/`)
 - Disallow 'non-pretty' variations (e.g., `?rest_route=`)
 - Add some inline comments

 That gets us to the following:

 {{{

 # Prevent crawling of WP internals
 # --------------------------------
 User-agent: *
 Disallow: /wp-admin/
 Disallow: /*/wp-admin/
 Disallow: /wp-includes/
 Disallow: /*/wp-includes/
 Disallow: /wp-json/
 Disallow: /*/wp-json/
 Disallow: /?rest_route=
 Disallow: /xmlrpc.php

 # Prevent crawling of search URLs
 # --------------------------------
 User-agent: *
 Disallow: /search/
 Disallow: /*/search/
 Disallow: /?s=
 Disallow: /*/?s=

 # Sitemaps
 # --------------------------------
 Sitemap: https://wordpress.org/sitemap.xml
 Sitemap: https://wordpress.org/news-sitemap.xml
 Sitemap: https://wordpress.org/themes/sitemap.xml
 Sitemap: https://wordpress.org/plugins/sitemap.xml
 Sitemap: https://wordpress.org/news/sitemap.xml
 Sitemap: https://wordpress.org/showcase/sitemap.xml
 }}}
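
 As a rough way to sanity-check the rules above, here is a minimal Python
 sketch of a path matcher. It assumes the common crawler interpretation in
 which each rule is a prefix match against the path plus query string, `*`
 matches any run of characters, and a trailing `$` anchors the end. The names
 `DISALLOW`, `rule_to_regex` and `is_disallowed` are made up for this
 illustration; real crawlers may differ in details such as percent-encoding
 and Allow/Disallow precedence.

```python
import re

# Disallow rules copied from the proposed robots.txt above.
DISALLOW = [
    "/wp-admin/",
    "/*/wp-admin/",
    "/wp-includes/",
    "/*/wp-includes/",
    "/wp-json/",
    "/*/wp-json/",
    "/?rest_route=",
    "/xmlrpc.php",
    "/search/",
    "/*/search/",
    "/?s=",
    "/*/?s=",
]


def rule_to_regex(rule):
    """Translate one robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any sequence of characters and a trailing '$'
    anchors the end of the URL; everything else is literal.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, rules=DISALLOW):
    """Return True if any rule matches the start of path (path + query)."""
    # re.match anchors at the start of the string, giving prefix semantics.
    return any(rule_to_regex(rule).match(path) for rule in rules)
```

 Under these assumptions, `is_disallowed("/plugins/wp-json/wp/v2/plugin/foo")`
 and `is_disallowed("/plugins/?s=foo")` are both true, while `/plugins/`,
 `/search` (no trailing slash) and `/searching-tips/` stay crawlable. A path
 such as `/plugins/?rest_route=/` is not matched by the list as written,
 because only the root-level `/?rest_route=` variant is present.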

 Note that there are problems with some of these XML sitemaps and that the
 final list of URLs might change (pending a subsequent ticket).

-- 
Ticket URL: <https://meta.trac.wordpress.org/ticket/6763>
Making WordPress.org <https://meta.trac.wordpress.org/>