[wp-trac] [WordPress Trac] #43590: Search Engine Visibility option does not work as intended

WordPress Trac noreply at wordpress.org
Wed Aug 21 08:28:54 UTC 2019


#43590: Search Engine Visibility option does not work as intended
--------------------------+------------------------------
 Reporter:  mamaedler     |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  General       |     Version:  4.9.4
 Severity:  normal        |  Resolution:
 Keywords:  has-patch     |     Focuses:  administration
--------------------------+------------------------------

Comment (by jonoaldersonwp):

 Resurrecting this, as there's some nuance here.

 1) As pointed out above, the Reading setting implies that it's intended to
 prevent search engines from ''indexing'' the content, rather than from
 ''crawling'' it. However, **the presence of the `robots.txt` disallow rule
 prevents search engines from ever discovering the `noindex` directive**,
 and thus they may index 'fragments' (where the URL is indexed without
 its content).

 2) Google recently announced that they're making efforts to prevent
 fragment indexing. However, until this exists (and I'm not sure it will;
 it's still a necessary/correct solution sometimes), we should solve for
 current behaviours. **Let's remove the `robots.txt` disallow rule**, and
 allow Google (and others) to ''crawl'' the site.

 3) **The meta robots tag output currently isn't in line with
 [https://wordpress.org/support/article/settings-reading-screen/ the
 documentation]**. It outputs a value of `noindex,follow`, which should be
 changed to `noindex,nofollow` to match the documentation.
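
 To illustrate point 1, here is a minimal sketch (in Python, using the
 standard library's `urllib.robotparser`) of why a crawler that obeys
 `robots.txt` never gets to read a `noindex` tag on a blocked page. The
 two rule sets, the example URL and the helper name `can_see_noindex` are
 assumptions made for illustration; they are not taken from WordPress core.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule sets for illustration only: a blanket disallow (what a
# "discourage search engines" setting is assumed to emit) and a more
# permissive file that only blocks the admin area.
BLOCKED = "User-agent: *\nDisallow: /\n"
OPEN = "User-agent: *\nDisallow: /wp-admin/\n"


def can_see_noindex(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `url`.

    Fetching the page is a precondition for reading any meta robots tag
    in its HTML, so a False result means `noindex` is never seen.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/hello-world/"  # hypothetical URL
    print(can_see_noindex(BLOCKED, page))  # False: page is never fetched
    print(can_see_noindex(OPEN, page))     # True: crawler can read noindex
```

 With the blanket disallow, the page can still be indexed as a bare URL
 (e.g. via external links), because the crawler has no way to learn that
 it carries `noindex`.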

 **Now, here's a challenge...**
 Removing the `robots.txt` disallow rule opens up the whole site, including
 images, files, bits of plugin folders, and other files which don't use
 `X-Robots-Tag` headers (or server indexing options) to manage/prevent
 exposure or indexation. That might result in, e.g., an `assets` folder in
 a plugin being crawled and indexed. E.g., anything like this:
 https://www.google.com/search?q=inurl%3Awp-content%2Fplugins&oq=inurl%3Awp-content%2Fplugins
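
 The gap described above exists because a meta robots tag can only live
 inside an HTML document; images and other static files need an
 `X-Robots-Tag` response header instead. The sketch below (Python, purely
 illustrative) shows that decision. The function `robots_headers` and its
 parameters are hypothetical, and in practice such headers would more
 likely be set in web server configuration than in application code.

```python
def robots_headers(content_type: str, discourage_search_engines: bool) -> dict:
    """Hypothetical helper: extra response headers for one resource.

    HTML pages are assumed to carry their own meta robots tag, so only
    non-HTML resources get an X-Robots-Tag header here.
    """
    if not discourage_search_engines:
        # Site is public: nothing to add.
        return {}
    if content_type.lower().startswith("text/html"):
        # The meta robots tag in the markup is assumed to cover this case.
        return {}
    # Images, PDFs, scripts, etc. cannot contain a meta tag.
    return {"X-Robots-Tag": "noindex, nofollow"}


if __name__ == "__main__":
    print(robots_headers("image/png", True))
    print(robots_headers("text/html; charset=UTF-8", True))
```

 Without something along these lines, files under e.g. `wp-content/plugins`
 have no `noindex` signal at all once the disallow rule is gone.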

 This is already the case on many(!!) live sites, so, this isn't a ''new''
 problem, but we'd be newly exposing it on sites which currently ''think''
 that they're blocking search engines from indexing their content.

 Are we comfortable with that impact?

 If so, let's **remove the robots.txt disallow rule**, and **fix the meta
 robots tag**.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/43590#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

