[wp-trac] [WordPress Trac] #51211: Implement a consistent function to obtain the sitemap location
WordPress Trac
noreply at wordpress.org
Thu Dec 28 11:22:44 UTC 2023
#51211: Implement a consistent function to obtain the sitemap location
-------------------------+------------------------------
Reporter: GreatBlakes | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Sitemaps | Version: 5.5.1
Severity: trivial | Resolution:
Keywords: has-patch | Focuses:
-------------------------+------------------------------
Comment (by letraceursnork):
@swissspidy please excuse my English level in advance; the text below was
translated by ChatGPT (though the situation is real and I'm struggling
with it right now):
Okay, here's a real-life example:
I have WordPress 6.4.2, Dockerized, deployed to instances as a Docker
image and orchestrated in Kubernetes.
Our company has an SEO department that wants the robots.txt file to
differ from the default. They're fine with editing robots.txt directly
through any plugin that allows it (specifically, we use Yoast SEO).
However, since this file is actually created in the file system and is
absent from the repository, all edits are overwritten on redeployment,
in particular after a new release.
The solution we came up with is this: there's a RobotsTxtController that
'constructs' robots.txt from partials. It fetches the User-Agent, Allow,
and Disallow directives from an environment-specific file (local,
staging, or production) and then appends the Host (using the
get_site_url() function) and Sitemap (using the get_sitemap_url()
function) directives; a sketch of this assembly follows the list below.
The problem arises precisely because get_sitemap_url() is the native and
correct way to get the sitemap link. However, since it isn't filtered
and its output cannot be overridden, one of two problems occurs:
1. The plugin generates its own sitemap, which WordPress isn't aware of.
The plugin wants to add the sitemap's URL to robots.txt as a separate
directive, but it can't, because we want to control robots.txt
ourselves. At the point where we do control it, we can't determine
whether the sitemap has been overwritten/regenerated and, if so, what
the correct path is.
2. The plugin does the above but sets up a redirect from /wp-sitemap.xml
to its own URL. In this case, search engine bots might (theoretically)
say: "We don't want to follow your 301 redirects; the Sitemap directive
is incorrect. Bye!"
The solution to both of these problems is to add a filter hook to the
get_sitemap_url() function. Each plugin could then independently decide
whether it wants to use this native engine functionality or not (but
generally, I think they would want to).
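To illustrate, here's a rough sketch of how a plugin could use such a
filter if core applied one to the return value of get_sitemap_url(); the
filter name 'wp_sitemaps_url' is hypothetical and only used for this
example:

{{{#!php
<?php
// Hypothetical: core does not expose this filter today. If
// get_sitemap_url() filtered its return value, an SEO plugin that
// generates its own sitemap could advertise the replacement URL here.
add_filter( 'wp_sitemaps_url', function ( $url, $name, $subtype_name, $page ) {
	if ( 'index' === $name ) {
		return home_url( '/sitemap_index.xml' ); // e.g. Yoast SEO's index.
	}

	return $url;
}, 10, 4 );
}}}

With that in place, the robots.txt code above (and any other caller)
would pick up the plugin's sitemap location automatically, and no
redirect from /wp-sitemap.xml would be needed.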
P.S. For now I'm using a makeshift solution: individually checking
whether a plugin with a specific name is active and, if so, returning
certain hard-coded URLs from the controller method, which is
fundamentally not correct.
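Roughly, that workaround looks like the following (the wrapper function
name is just for illustration):

{{{#!php
<?php
// Current workaround: hard-code the sitemap URL when a known SEO plugin
// is loaded, otherwise fall back to core. Fragile by design.
function myproject_sitemap_url() {
	if ( defined( 'WPSEO_VERSION' ) ) {
		// Yoast SEO is active; it serves its own sitemap index.
		return home_url( '/sitemap_index.xml' );
	}

	return get_sitemap_url( 'index' );
}
}}}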
--
Ticket URL: <https://core.trac.wordpress.org/ticket/51211#comment:12>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform