[wp-trac] [WordPress Trac] #50456: Multisite robots.txt files should reference all network XML sitemaps
WordPress Trac
noreply at wordpress.org
Tue Jun 23 20:14:58 UTC 2020
#50456: Multisite robots.txt files should reference all network XML sitemaps
----------------------------+------------------------------
Reporter: jonoaldersonwp | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Sitemaps | Version:
Severity: normal | Resolution:
Keywords: seo | Focuses: multisite
----------------------------+------------------------------
Changes (by SergeyBiryukov):
* focuses: => multisite
Description:
[48072] adds XML sitemaps to core, with the objective of making public
URLs more 'discoverable'.

Part of this discoverability relies on alterations to the site's
robots.txt file, to add a reference to the URL of the sitemap index.
On multisite setups where sites run in ''subfolders'', this mechanism
breaks; a domain can only have one robots.txt file at the domain root,
which means that sub-sites don't expose the location of their sitemap.

To address this, we should, in all viable cases, add the sitemap URL(s)
for ''every site in a network'' to the top-level robots.txt file.
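
As a rough illustration of the shape this could take (a plugin-style
sketch, not a proposal for the final core implementation), the existing
`robots_txt` filter plus the multisite helpers `get_sites()` /
`switch_to_blog()` are enough to append one `Sitemap:` line per public
site. It assumes pretty permalinks, so that each site's sitemap index
lives at `/wp-sitemap.xml`:

{{{
<?php
// Sketch: append the sitemap index of every public, non-archived,
// non-deleted site in the network to the robots.txt output.
add_filter( 'robots_txt', function ( $output, $public ) {
	if ( ! is_multisite() || ! $public ) {
		return $output;
	}

	// 'number' => 0 fetches all sites; very large networks would
	// need caching or paging here.
	$sites = get_sites( array(
		'public'   => 1,
		'archived' => 0,
		'deleted'  => 0,
		'number'   => 0,
	) );

	foreach ( $sites as $site ) {
		// Core already outputs the current site's own Sitemap line.
		if ( get_current_blog_id() === (int) $site->blog_id ) {
			continue;
		}

		switch_to_blog( $site->blog_id );
		$output .= 'Sitemap: ' . home_url( '/wp-sitemap.xml' ) . "\n";
		restore_current_blog();
	}

	return $output;
}, 10, 2 );
}}}

Switching to each site means `home_url()` honours per-site schemes and
mapped domains, which also covers the subdomain/multi-domain case below.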

For the sake of completeness, robustness, and utility, this should be
extended to also cover multisite setups spanning multiple
domains/subdomains (or, in fact, any setup).

NB: most consumers support cross-domain XML sitemap references in
robots.txt files, so this isn't a concern.

E.g., on a theoretical multisite setup running across multiple hostnames
''and'' folders, I'd expect https://www.example.com/robots.txt to contain
something like the following:
{{{
Sitemap: https://www.example.com/wp-sitemap.xml
Sitemap: https://www.example.com/sub-site/wp-sitemap.xml
Sitemap: https://other.example.com/wp-sitemap.xml
}}}
--
Ticket URL: <https://core.trac.wordpress.org/ticket/50456#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform