[wp-trac] [WordPress Trac] #51211: Implement a consistent function to obtain the sitemap location
WordPress Trac
noreply at wordpress.org
Thu Dec 28 11:22:44 UTC 2023
#51211: Implement a consistent function to obtain the sitemap location
-------------------------+------------------------------
Reporter: GreatBlakes | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Sitemaps | Version: 5.5.1
Severity: trivial | Resolution:
Keywords: has-patch | Focuses:
-------------------------+------------------------------
Comment (by letraceursnork):
@swissspidy please excuse my English level in advance; the text below was
translated by ChatGPT (though the situation is real and I'm struggling
with it right now):
Okay, here's a real-life example:
I have WordPress 6.4.2, Dockerized, deployed to instances as a Docker
image and orchestrated in Kubernetes.
Our company has an SEO department that wants the robots.txt file to
differ from the default. They're fine with editing robots.txt directly
through any plugin that allows it (specifically, we use Yoast SEO).
However, since this file is actually created in the file system and is
absent from the repository, all edits are overwritten on redeployment,
in particular after a new release.
The solution we came up with is this: there's a RobotsTxtController that
'constructs' robots.txt from partials. It fetches the User-Agent, Allow,
and Disallow directives from an environment-specific file (local,
staging, or production) and then appends the Host (using the
get_site_url() function) and Sitemap (using the get_sitemap_url()
function) directives; a sketch of this assembly follows the list below.
The problem arises precisely because get_sitemap_url() is the native and
correct way to get the sitemap link. However, since it isn't filtered
and its output cannot be overridden, one of two problems occurs:
1. The plugin generates its own sitemap, which WordPress isn't aware of.
The plugin wants to add the sitemap's URL to robots.txt as a separate
directive, but it can't, because we want to control robots.txt
ourselves. At the point where we do control it, we can't determine
whether the sitemap has been overwritten/regenerated and, if so, what
the correct path is.
2. The plugin does the above but sets up a redirect from /wp-sitemap.xml
to its own URL. In this case, search engine bots might (theoretically)
say: "We don't want to follow your 301 redirects; the Sitemap directive
is incorrect. Bye!"
The solution to both of these problems is to add a filter hook to the
get_sitemap_url() function. Each plugin could then independently decide
whether it wants to use this native engine functionality or not (but
generally, I think they would want to).
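To illustrate, here's a rough sketch of how a plugin could use such a
filter if core applied one to the return value of get_sitemap_url(); the
filter name 'wp_sitemaps_url' is hypothetical and only used for this
example:

{{{#!php
<?php
// Hypothetical: core does not expose this filter today. If
// get_sitemap_url() filtered its return value, an SEO plugin that
// generates its own sitemap could advertise the replacement URL here.
add_filter( 'wp_sitemaps_url', function ( $url, $name, $subtype_name, $page ) {
	if ( 'index' === $name ) {
		return home_url( '/sitemap_index.xml' ); // e.g. Yoast SEO's index.
	}

	return $url;
}, 10, 4 );
}}}

With that in place, the robots.txt code above (and any other caller)
would pick up the plugin's sitemap location automatically, and no
redirect from /wp-sitemap.xml would be needed.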
P.S. For now I'm using a makeshift solution: individually checking
whether a plugin with a specific name is active and, if so, returning
certain hard-coded URLs from the controller method, which is
fundamentally not correct.
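Roughly, that workaround looks like the following (the wrapper function
name is just for illustration):

{{{#!php
<?php
// Current workaround: hard-code the sitemap URL when a known SEO plugin
// is loaded, otherwise fall back to core. Fragile by design.
function myproject_sitemap_url() {
	if ( defined( 'WPSEO_VERSION' ) ) {
		// Yoast SEO is active; it serves its own sitemap index.
		return home_url( '/sitemap_index.xml' );
	}

	return get_sitemap_url( 'index' );
}
}}}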
--
Ticket URL: <https://core.trac.wordpress.org/ticket/51211#comment:12>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform