[wp-meta] [Making WordPress.org] #6763: Update robots.txt (and rosetta variations)
Making WordPress.org
noreply at wordpress.org
Fri Feb 17 04:02:28 UTC 2023
#6763: Update robots.txt (and rosetta variations)
-----------------------------+---------------------
Reporter: jonoaldersonwp | Owner: (none)
Type: enhancement | Status: new
Priority: low | Milestone:
Component: General | Resolution:
Keywords: seo performance |
-----------------------------+---------------------
Description changed by jonoaldersonwp:
Old description:
> The [[https://wordpress.org/robots.txt|robots.txt]] file could be
> tightened up in order to prevent unnecessary crawling, which could have
> significant SEO and performance/efficiency benefits.
>
> Additionally, we apply rules inconsistently across different rosetta
> subdomains; we should probably standardize these!
>
> I'm looking at wordpress.org/robots.txt as a starting point for
> standardization; I'd suggest:
> - Remove the various wp-admin 'allow' rules (e.g.,
> wordpress.org/robots.txt)
> - Combine the remaining disallow rules
> - Tweak the 'search' disallow rule to add a trailing slash
> - Move sitemap references to the end
> - Disallow `/*/wp-json/` (which is crawled by Google upwards of 40k times
> per day!)
> - Disallow subfolder variations of some rules so that they catch subsites
> (e.g., `/*/wp-admin/`)
> - Disallow 'non-pretty' variations (e.g., `?rest_route=`)
> - Add some inline comments
>
> That gets us to the following:
>
> {{{
>
> # Prevent crawling of WP internals
> # --------------------------------
> User-agent: *
> Disallow: /wp-admin/
> Disallow: /*/wp-admin/
> Disallow: /wp-includes/
> Disallow: /*/wp-includes/
> Disallow: /wp-json/
> Disallow: /*/wp-json/
> Disallow: /?rest_route=
> Disallow: /xmlrpc.php
>
> # Prevent crawling of search URLs
> # --------------------------------
> User-agent: *
> Disallow: /search/
> Disallow: /*/search/
> Disallow: /?s=
> Disallow: /*/?s=
>
> # Sitemaps
> # --------------------------------
> Sitemap: https://wordpress.org/sitemap.xml
> Sitemap: https://wordpress.org/news-sitemap.xml
> Sitemap: https://wordpress.org/themes/sitemap.xml
> Sitemap: https://wordpress.org/plugins/sitemap.xml
> Sitemap: https://wordpress.org/news/sitemap.xml
> Sitemap: https://wordpress.org/showcase/sitemap.xml
> }}}
>
> Note that there are problems with some of these XML sitemaps and that the
> final list of URLs might change (pending a subsequent ticket).
New description:
The [[https://wordpress.org/robots.txt|robots.txt]] file could be
tightened up in order to prevent unnecessary crawling, which could have
significant SEO and performance/efficiency benefits.

Additionally, we apply rules inconsistently across different rosetta
subdomains; we should probably standardize these!

I'm looking at wordpress.org/robots.txt as a starting point for
standardization; I'd suggest:
- Remove the various wp-admin 'allow' rules (e.g.,
wordpress.org/robots.txt)
- Combine the remaining disallow rules
- Tweak the 'search' disallow rule to add a trailing slash
- Move sitemap references to the end
- Disallow `/*/wp-json/` (which is crawled by Google upwards of 40k times
per day!)
- Disallow subfolder variations of some rules so that they catch subsites
(e.g., `/*/wp-admin/`)
- Disallow 'non-pretty' variations (e.g., `?rest_route=`)
- Add some inline comments

That gets us to the following:

{{{
# Prevent crawling of WP internals
# --------------------------------
User-agent: *
Disallow: /wp-admin/
Disallow: /*/wp-admin/
Disallow: /wp-includes/
Disallow: /*/wp-includes/
Disallow: /wp-json/
Disallow: /*/wp-json/
Disallow: /?rest_route=
Disallow: /xmlrpc.php

# Prevent crawling of search URLs
# --------------------------------
User-agent: *
Disallow: /search/
Disallow: /*/search/
Disallow: /?s=
Disallow: /*/?s=

# Sitemaps
# --------------------------------
Sitemap: https://wordpress.org/sitemap.xml
Sitemap: https://wordpress.org/news-sitemap.xml
Sitemap: https://wordpress.org/themes/sitemap.xml
Sitemap: https://wordpress.org/plugins/sitemap.xml
Sitemap: https://wordpress.org/news/sitemap.xml
Sitemap: https://wordpress.org/showcase/sitemap.xml
}}}
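To sanity-check which URLs the proposed rules would catch, here is a minimal Python sketch of wildcard matching. It assumes the common crawler interpretation of robots.txt patterns (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a prefix match) and that no `Allow` rules remain, so rule precedence is not modelled. The helper names `pattern_to_regex` and `is_disallowed` are hypothetical and not part of any WordPress.org tooling; Python's built-in `urllib.robotparser` is not used because it does not handle `*` wildcards.

```python
import re

# Disallow patterns copied from the proposed robots.txt above.
DISALLOW = [
    "/wp-admin/", "/*/wp-admin/",
    "/wp-includes/", "/*/wp-includes/",
    "/wp-json/", "/*/wp-json/",
    "/?rest_route=", "/xmlrpc.php",
    "/search/", "/*/search/",
    "/?s=", "/*/?s=",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (hypothetical helper).

    '*' becomes '.*', a trailing '$' anchors the end of the URL, and all
    other characters are matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


RULES = [pattern_to_regex(p) for p in DISALLOW]


def is_disallowed(path: str) -> bool:
    """Return True if any Disallow pattern matches from the start of `path`.

    `path` is the URL path plus query string, e.g. "/themes/?s=blog".
    """
    return any(rule.match(path) for rule in RULES)


if __name__ == "__main__":
    for url in [
        "/wp-json/wp/v2/posts",
        "/plugins/wp-json/wp/v2/posts",
        "/?rest_route=/wp/v2/posts",
        "/plugins/?rest_route=/wp/v2/posts",
        "/themes/?s=blog",
        "/plugins/akismet/",
    ]:
        print(url, "blocked" if is_disallowed(url) else "crawlable")
```

Under these assumptions, `/plugins/?rest_route=/wp/v2/posts` is not matched by any rule, because `/?rest_route=` only matches at the site root and has no `/*/` counterpart. Whether that gap matters depends on whether subsites actually expose non-pretty REST URLs to crawlers.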
--
Ticket URL: <https://meta.trac.wordpress.org/ticket/6763#comment:1>
Making WordPress.org <https://meta.trac.wordpress.org/>