[wp-trac] [WordPress Trac] #50456: Multisite robots.txt files should reference all network XML sitemaps

WordPress Trac noreply at wordpress.org
Tue Jun 23 20:14:58 UTC 2020


#50456: Multisite robots.txt files should reference all network XML sitemaps
----------------------------+------------------------------
 Reporter:  jonoaldersonwp  |       Owner:  (none)
     Type:  defect (bug)    |      Status:  new
 Priority:  normal          |   Milestone:  Awaiting Review
Component:  Sitemaps        |     Version:
 Severity:  normal          |  Resolution:
 Keywords:  seo             |     Focuses:  multisite
----------------------------+------------------------------
Changes (by SergeyBiryukov):

 * focuses:   => multisite


Old description:

> https://core.trac.wordpress.org/changeset/48072 adds XML sitemaps to
> core, with the objective of making public URLs more 'discoverable'.
>
> Part of this discoverability relies on alterations to the site's
> robots.txt file, to add a reference to the URL of the sitemap index.
>
> On multisite setups where sites run in ''subfolders'', this mechanism
> breaks; a domain can only have one robots.txt file at the domain root,
> which means that sub-sites don't expose the location of their sitemap.
>
> To address this, we should, in all viable cases, add the sitemap URL(s)
> for ''every site in a network'' to the top-level robots.txt file.
>
> For the sake of completeness, robustness and utility, this should be
> extended to also include multi-site setups on multiple domains/subdomains
> (or in fact, on any setup).
>
> NB, most consumers support cross-domain XML sitemap references in
> robots.txt files, so this isn't a concern.
>
> E.g.,
>
> On a theoretical multi-site setup running across multiple hostnames
> ''and'' folders, I'd expect https://www.example.com/robots.txt to contain
> something like the following:
>
> {{{
> Sitemap: https://www.example.com/wp-sitemap.xml
> Sitemap: https://www.example.com/sub-site/wp-sitemap.xml
> Sitemap: https://other.example.com/wp-sitemap.xml
> }}}

New description:

 [48072] adds XML sitemaps to core, with the objective of making public
 URLs more 'discoverable'.

 Part of this discoverability relies on alterations to the site's
 robots.txt file, to add a reference to the URL of the sitemap index.

 On multisite setups where sites run in ''subfolders'', this mechanism
 breaks; a domain can only have one robots.txt file at the domain root,
 which means that sub-sites don't expose the location of their sitemap.

 To address this, we should, in all viable cases, add the sitemap URL(s)
 for ''every site in a network'' to the top-level robots.txt file.

 For the sake of completeness, robustness, and utility, this should be
 extended to also cover multisite setups spanning multiple
 domains/subdomains (or, in fact, any setup).

 NB, most consumers support cross-domain XML sitemap references in
 robots.txt files, so this isn't a concern.

 E.g., on a theoretical multisite setup running across multiple hostnames
 ''and'' folders, I'd expect https://www.example.com/robots.txt to contain
 something like the following:

 {{{
 Sitemap: https://www.example.com/wp-sitemap.xml
 Sitemap: https://www.example.com/sub-site/wp-sitemap.xml
 Sitemap: https://other.example.com/wp-sitemap.xml
 }}}
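
 A minimal sketch of how this could be wired up (e.g. in a network plugin)
 via the existing `robots_txt` filter is below. The hook and the helpers
 used (`get_sites()`, `switch_to_blog()`, `wp_sitemaps_get_server()`) are
 existing WordPress APIs, but the overall approach and its details are only
 an illustration of the idea, not a proposed core patch:

 {{{#!php
 <?php
 // Illustrative sketch only: append each public network site's sitemap
 // index URL to the current site's robots.txt output.
 add_filter( 'robots_txt', function ( $output, $public ) {
     if ( ! is_multisite() || ! $public || ! function_exists( 'wp_sitemaps_get_server' ) ) {
         return $output;
     }

     $current_blog_id = get_current_blog_id();

     // Note: get_sites() limits results by default (100 sites), so very
     // large networks would need batching.
     $sites = get_sites(
         array(
             'network_id' => get_current_network_id(),
             'public'     => 1,
         )
     );

     foreach ( $sites as $site ) {
         // Core already appends the current site's own sitemap reference.
         if ( (int) $site->blog_id === $current_blog_id ) {
             continue;
         }

         switch_to_blog( $site->blog_id );

         $sitemaps = wp_sitemaps_get_server();

         // Respect per-site settings (blog_public, wp_sitemaps_enabled filter).
         if ( $sitemaps->sitemaps_enabled() ) {
             $output .= 'Sitemap: ' . esc_url( $sitemaps->index->get_index_url() ) . "\n";
         }

         restore_current_blog();
     }

     return $output;
 }, 10, 2 );
 }}}

 Using `WP_Sitemaps_Index::get_index_url()` per switched site keeps the
 output correct whether or not a sub-site uses pretty permalinks, and the
 `sitemaps_enabled()` check skips sites that have opted out of sitemaps.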

--

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/50456#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

