[wp-meta] [Making WordPress.org] #6763: Update robots.txt (and rosetta variations)

Making WordPress.org noreply at wordpress.org
Fri Feb 17 04:05:36 UTC 2023


#6763: Update robots.txt (and rosetta variations)
-----------------------------+---------------------
 Reporter:  jonoaldersonwp   |       Owner:  (none)
     Type:  enhancement      |      Status:  new
 Priority:  low              |   Milestone:
Component:  General          |  Resolution:
 Keywords:  seo performance  |
-----------------------------+---------------------
Description changed by jonoaldersonwp:

Old description:

> The [[https://wordpress.org/robots.txt|robots.txt]] file could be
> tightened up in order to prevent unnecessary crawling, which could have
> significant SEO and performance/efficiency benefits.
>
> Additionally, we apply rules inconsistently across different rosetta
> subdomains; we should probably standardize these!
>
> I'm looking at wordpress.org/robots.txt as a starting point for
> standardization; I'd suggest:
> - Remove the various wp-admin 'allow' rules (e.g.,
> wordpress.org/robots.txt)
> - Combine the remaining disallow rules
> - Tweak the 'search' disallow rule to add a trailing slash
> - Moving sitemap references to the end
> - Disallow `/*/wp-json/` (which is crawled by Google upwards of 40k times
> per day!)
> - Disallow subfolder variations of some rules so that they catch subsites
> (e.g., `/*/wp-admin/`)
> - Disallow 'non-pretty' variations (e.g., `?rest_route=`)
> - Add some inline comments
>
> That gets us to the following:
>
> {{{
>
> # Prevent crawling of WP internals
> # --------------------------------
> User-agent: *
> Disallow: /wp-admin/
> Disallow: /*/wp-admin/
> Disallow: /wp-includes/
> Disallow: /*/wp-includes/
> Disallow: /wp-json/
> Disallow: /*/wp-json/
> Disallow: /?rest_route=
> Disallow: /xmlrpc.php
>
> # Prevent crawling of search URLs
> # --------------------------------
> User-agent: *
> Disallow: /search/
> Disallow: /*/search/
> Disallow: /?s=
> Disallow: /*/?s=
>
> # Sitemaps
> # --------------------------------
> Sitemap: https://wordpress.org/sitemap.xml
> Sitemap: https://wordpress.org/news-sitemap.xml
> Sitemap: https://wordpress.org/themes/sitemap.xml
> Sitemap: https://wordpress.org/plugins/sitemap.xml
> Sitemap: https://wordpress.org/news/sitemap.xml
> Sitemap: https://wordpress.org/showcase/sitemap.xml
> }}}

New description:

 The [[https://wordpress.org/robots.txt|robots.txt]] file could be
 tightened up in order to prevent unnecessary crawling, which could have
 significant SEO and performance/efficiency benefits.

 Additionally, we apply rules inconsistently across different rosetta
 subdomains; we should probably standardize these!

 I'm looking at wordpress.org/robots.txt as a starting point for
 standardization; I'd suggest:
 - Remove the various wp-admin 'allow' rules (e.g.,
 wordpress.org/robots.txt)
 - Combine the remaining disallow rules
 - Tweak the 'search' disallow rule to add a trailing slash
 - Move sitemap references to the end
 - Disallow `/*/wp-json/` (which is crawled by Google upwards of 40k times
 per day!)
 - Disallow subfolder variations of some rules so that they catch subsites
 (e.g., `/*/wp-admin/`)
 - Disallow 'non-pretty' variations (e.g., `?rest_route=`)
 - Add some inline comments

 That gets us to the following:

 {{{

 # Prevent crawling of WP internals
 # --------------------------------
 User-agent: *
 Disallow: /wp-admin/
 Disallow: /*/wp-admin/
 Disallow: /wp-json/
 Disallow: /*/wp-json/
 Disallow: /?rest_route=
 Disallow: /xmlrpc.php

 # Prevent crawling of search URLs
 # --------------------------------
 User-agent: *
 Disallow: /search/
 Disallow: /*/search/
 Disallow: /?s=
 Disallow: /*/?s=

 # Sitemaps
 # --------------------------------
 Sitemap: https://wordpress.org/sitemap.xml
 Sitemap: https://wordpress.org/news-sitemap.xml
 Sitemap: https://wordpress.org/themes/sitemap.xml
 Sitemap: https://wordpress.org/plugins/sitemap.xml
 Sitemap: https://wordpress.org/news/sitemap.xml
 Sitemap: https://wordpress.org/showcase/sitemap.xml
 }}}
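
To sanity-check which URLs the proposed rules would catch, here is a minimal Python sketch of wildcard path matching. It assumes the widely used convention (followed by Google and described in RFC 9309) that `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a literal prefix match. The helper names `pattern_to_regex` and `is_disallowed` are hypothetical, and the sketch models only `Disallow` lines, not `Allow` precedence or per-crawler differences. Python's standard `urllib.robotparser` is not used here because it does not expand `*` wildcards.

```python
import re

# Disallow patterns copied from the proposed robots.txt above.
DISALLOW = [
    "/wp-admin/", "/*/wp-admin/",
    "/wp-json/", "/*/wp-json/",
    "/?rest_route=", "/xmlrpc.php",
    "/search/", "/*/search/",
    "/?s=", "/*/?s=",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (assumed semantics).

    '*' matches any run of characters; a trailing '$' anchors the end.
    All other characters are literal and the pattern matches as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns=DISALLOW) -> bool:
    """Return True if any Disallow pattern matches the path plus query string."""
    # re.match anchors at the start of the string, giving prefix semantics.
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    for url_path in [
        "/wp-json/wp/v2/posts",
        "/plugins/wp-json/wp/v2/plugin",
        "/support/?s=hello",
        "/plugins/",
        "/news/?rest_route=/wp/v2/posts",
    ]:
        print(url_path, is_disallowed(url_path))
```

Under these assumed semantics, the subfolder rules catch subsite paths such as `/plugins/wp-json/...`, while a subsite URL such as `/news/?rest_route=...` is not matched, because the list above contains only the root-level `/?rest_route=` variant.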

-- 
Ticket URL: <https://meta.trac.wordpress.org/ticket/6763#comment:2>
Making WordPress.org <https://meta.trac.wordpress.org/>