[wp-trac] [WordPress Trac] #43590: Search Engine Visibility option does not work as intended
WordPress Trac
noreply at wordpress.org
Wed Aug 21 08:28:54 UTC 2019
#43590: Search Engine Visibility option does not work as intended
--------------------------+------------------------------
Reporter: mamaedler | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version: 4.9.4
Severity: normal | Resolution:
Keywords: has-patch | Focuses: administration
--------------------------+------------------------------
Comment (by jonoaldersonwp):
Resurrecting this, as there's some nuance here.
1) As pointed out above, the Reading setting implies that it's intended to
prevent search engines from ''indexing'' the content, rather than from
''crawling'' it. However, **the presence of the `robots.txt` disallow rule
prevents search engines from ever discovering the `noindex` directive**,
and thus they may index 'fragments' (where the URL is indexed without its
content).
2) Google recently announced that they're making efforts to prevent
fragment indexing. However, until that change lands (and I'm not sure it
will; fragment indexing is still a necessary/correct solution sometimes),
we should solve for current behaviours. **Let's remove the `robots.txt`
disallow rule**, and allow Google (and others) to ''crawl'' the site.
3) **The output of the meta robots tag currently isn't in line with
[https://wordpress.org/support/article/settings-reading-screen/ the
documentation]**. It outputs a value of `noindex,follow`, which should be
altered to `noindex,nofollow` to match the documentation.
**Now, here's a challenge...**
Removing the `robots.txt` disallow rule opens up the whole site, including
images, files, bits of plugin folders, and other files which don't use
`X-Robots-Tag` headers (or server indexing options) to manage/prevent
exposure or indexation. That might result in, e.g., an `assets` folder in
a plugin being crawled and indexed. E.g., anything like this:
https://www.google.com/search?q=inurl%3Awp-content%2Fplugins&oq=inurl%3Awp-content%2Fplugins
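As a rough sketch of why those files are exposed: a non-HTML file (an image, a PDF, a plugin asset) can't carry a meta robots tag, so the HTTP response header is the only place a `noindex` can live. The helper name and the sample header dictionaries below are hypothetical, and this ignores the optional per-crawler prefix that the header allows.

```python
def response_blocks_indexing(headers):
    """Return True if the response headers carry a noindex directive.

    `headers` is a plain dict of response header names to values.
    Header names are matched case-insensitively. The value `none` is
    treated as equivalent to `noindex, nofollow`.
    """
    value = ""
    for name, header_value in headers.items():
        if name.lower() == "x-robots-tag":
            value = header_value
    directives = {d.strip().lower() for d in value.split(",")}
    return "noindex" in directives or "none" in directives


# Hypothetical responses for a plugin asset, with and without the header.
unprotected_asset = {"Content-Type": "image/png"}
protected_asset = {"Content-Type": "image/png", "X-Robots-Tag": "noindex, nofollow"}

asset_is_exposed = not response_blocks_indexing(unprotected_asset)
asset_is_protected = response_blocks_indexing(protected_asset)
```

Both `asset_is_exposed` and `asset_is_protected` come out `True` here: without the header, nothing tells a crawler to keep the file out of the index once `robots.txt` stops blocking it.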
This is already the case on many(!!) live sites, so, this isn't a ''new''
problem, but we'd be newly exposing it on sites which currently ''think''
that they're blocking search engines from indexing their content.
Are we comfortable with that impact?
If so, let's **remove the robots.txt disallow rule**, and **fix the meta
robots tag**.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/43590#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform