[wp-meta] [Making WordPress.org] #6763: Update robots.txt (and rosetta variations)
Making WordPress.org
noreply at wordpress.org
Fri Feb 17 03:46:02 UTC 2023
----------------------------+-----------------------------
 Reporter:  jonoaldersonwp  |      Owner:  (none)
     Type:  enhancement     |     Status:  new
 Priority:  low             |  Milestone:
Component:  General         |   Keywords:  seo performance
----------------------------+-----------------------------
The [[https://wordpress.org/robots.txt|robots.txt]] file could be
tightened up to prevent unnecessary crawling, which could bring
significant SEO and performance/efficiency benefits.

Additionally, we apply rules inconsistently across the different rosetta
subdomains; we should probably standardize them!
I'm looking at wordpress.org/robots.txt as a starting point for
standardization; I'd suggest:
- Remove the various wp-admin 'allow' rules (e.g., those in
wordpress.org/robots.txt)
- Combine the remaining disallow rules
- Tweak the 'search' disallow rule to add a trailing slash
- Move sitemap references to the end
- Disallow `/*/wp-json/` (which is crawled by Google upwards of 40k times
per day!)
- Disallow subfolder variations of some rules so that they catch subsites
(e.g., `/*/wp-admin/`)
- Disallow 'non-pretty' variations (e.g., `?rest_route=`)
- Add some inline comments
That gets us to the following:
{{{
# Prevent crawling of WP internals
# --------------------------------
User-agent: *
Disallow: /wp-admin/
Disallow: /*/wp-admin/
Disallow: /wp-includes/
Disallow: /*/wp-includes/
Disallow: /wp-json/
Disallow: /*/wp-json/
Disallow: /?rest_route=
Disallow: /xmlrpc.php

# Prevent crawling of search URLs
# --------------------------------
User-agent: *
Disallow: /search/
Disallow: /*/search/
Disallow: /?s=
Disallow: /*/?s=

# Sitemaps
# --------------------------------
Sitemap: https://wordpress.org/sitemap.xml
Sitemap: https://wordpress.org/news-sitemap.xml
Sitemap: https://wordpress.org/themes/sitemap.xml
Sitemap: https://wordpress.org/plugins/sitemap.xml
Sitemap: https://wordpress.org/news/sitemap.xml
Sitemap: https://wordpress.org/showcase/sitemap.xml
}}}
Note that there are problems with some of these XML sitemaps and that the
final list of URLs might change (pending a subsequent ticket).
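Because the proposal relies on `*` wildcards to catch subsite paths, it may help to sanity-check which URLs the combined rules would actually block. The following is a minimal Python sketch, assuming simplified RFC 9309 matching (`*` matches any run of characters, a trailing `$` anchors the end, otherwise the pattern is a prefix match from the start of the path); it is not Google's actual parser. Python's standard `urllib.robotparser` does not expand `*` inside paths, hence the small hand-rolled matcher. The sample URLs are hypothetical, and since the proposed file has no `Allow` lines, longest-match precedence between Allow and Disallow is deliberately omitted.

```python
import re

# Disallow rules copied from the proposed robots.txt (both User-agent: * groups).
DISALLOW = [
    "/wp-admin/", "/*/wp-admin/",
    "/wp-includes/", "/*/wp-includes/",
    "/wp-json/", "/*/wp-json/",
    "/?rest_route=", "/xmlrpc.php",
    "/search/", "/*/search/",
    "/?s=", "/*/?s=",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex.

    Simplified RFC 9309 semantics (an assumption of this sketch): '*' matches
    any sequence of characters, a trailing '$' anchors the end of the URL,
    and everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


COMPILED = [pattern_to_regex(p) for p in DISALLOW]


def is_blocked(path_and_query: str) -> bool:
    """Return True if any Disallow rule matches (the proposal has no Allow rules)."""
    return any(rx.match(path_and_query) for rx in COMPILED)


# Hypothetical sample URLs (path plus query string only).
for url in [
    "/wp-json/wp/v2/posts",
    "/news/wp-json/wp/v2/posts",
    "/?rest_route=/wp/v2/posts",
    "/plugins/search/seo/",
    "/news/?s=robots",
    "/plugins/akismet/",
]:
    print(f"{url:32} {'blocked' if is_blocked(url) else 'crawlable'}")
```

Under these assumptions, the subfolder variants (e.g. `/*/wp-json/`) are what catch `/news/wp-json/...`, while ordinary directory pages such as `/plugins/akismet/` stay crawlable.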
--
Ticket URL: <https://meta.trac.wordpress.org/ticket/6763>
Making WordPress.org <https://meta.trac.wordpress.org/>