[wp-trac] [WordPress Trac] #52900: Instantly index WordPress web sites content in Search Engines

WordPress Trac noreply at wordpress.org
Tue Oct 19 00:38:05 UTC 2021


#52900: Instantly index WordPress web sites content in Search Engines
-------------------------------------------------+-------------------------
 Reporter:  fabricecanel                         |       Owner:  (none)
     Type:  feature request                      |      Status:  new
 Priority:  normal                               |   Milestone:  Awaiting
                                                 |  Review
Component:  General                              |     Version:
 Severity:  normal                               |  Resolution:
 Keywords:  reporter-feedback has-patch has-     |     Focuses:
  unit-tests                                     |
-------------------------------------------------+-------------------------

Comment (by fabricecanel):

 Replying to [comment:22 dd32]:
 > Replying to [comment:21 fabricecanel]:
 > > We did our first pull request
 >
 > Hi @fabricecanel,
 >
 > To follow up on some earlier comments here - have you looked into
 integrating with either http://pingomatic.com/ or http://blo.gs/cloud.php
 ?
 >
 > They're admittedly not very modern APIs, but they benefit from millions
 of existing sites already making use of them. Combined with existing
 standards such as Sitemaps, they can provide what's needed without
 additional code on the client side.

 > > Replying to [comment:22 dd32]: As shared in this feature request,
 today Microsoft Bing and Yandex released a search-industry-wide
 specification, https://www.indexnow.org/, open to all major search
 engines and already supported by Microsoft Bing, Yandex and a few other
 actors in the industry. We need a service that is secure (the key is
 provided by the site), easy to integrate, able to scale to the whole
 industry and to all scenarios (web sites, CMSs, CDNs, SEO companies),
 targeted at search engines so that it supports add, update and delete,
 and that helps search engines minimize crawl load. So, a broader scope.
 One key scenario for WordPress sites is that most site owners expect to
 see their content indexed quickly (except where a noindex tag is present)
 without having to do anything. The ability to be indexed fast should be
 built into the search engines; not all webmasters want to adopt a ping
 service only to see their content stolen and duplicated all over the
 internet.
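
To make the "key provided by the site" idea concrete, here is a minimal
sketch of how a client might build an IndexNow notification. It assumes the
shared endpoint https://api.indexnow.org/indexnow and the parameter names
(url, key, host, urlList) as I understand them from the public IndexNow
documentation; the function names are hypothetical, and nothing here is
taken from the patch attached to this ticket. The functions only build the
request and do not send anything.

```python
import json
from urllib.parse import urlencode, urlsplit

# Assumed shared endpoint; participating search engines may expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def single_url_ping(url: str, key: str, endpoint: str = INDEXNOW_ENDPOINT) -> str:
    """Return the GET URL that reports one added, updated or deleted URL."""
    return endpoint + "?" + urlencode({"url": url, "key": key})


def bulk_payload(urls: list[str], key: str) -> str:
    """Return a JSON body for reporting several URLs of one host in one POST."""
    if not urls:
        raise ValueError("at least one URL is required")
    hosts = {urlsplit(u).netloc for u in urls}
    if len(hosts) != 1:
        # The key proves ownership of a single host, so do not mix hosts.
        raise ValueError("all URLs must belong to the same host")
    return json.dumps({"host": hosts.pop(), "key": key, "urlList": urls})
```

The same notification would be used for an add, an update or a delete: the
receiving search engine recrawls the URL and sees its current state.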

 >
 > There might also be room in the middle to act as a middleman - consuming
 those APIs and relaying the pings on to Bing and others using the API, or
 having Pingomatic or blo.gs relay them onwards to those too.
 >
 > Before this proposal is really viable to consider for WordPress
 inclusion (IMHO) there needs to be industry support on it being a
 generalised system that allows for all players (small and large) to be
 supported without additional need from site authors or software vendors. A
 standard is only truly open if multiple vendors support it, otherwise
 it's just a proprietary format that so happens to be documented publicly.
 >
 > To me, it seems that having client websites actively "pinging" select
 search engines added in WordPress core is not exactly open. I would want
 anyone interested in the data to be able to access a stream of the changes
 - and having them get their crawler added to WordPress seems like a high
 barrier to entry.
 >
 > This seems like one of the major benefits of centralised open relay
 services like those mentioned above.
 >
 > I'm assuming that one of the reasons for this approach, based on the
 inclusion of a per-site key that can be validated through an HTTP
 callback, is that the existing methods include a lot of spam and lack any
 way to verify that whoever sent the request is actually the author of it.
 Monitoring the Blo.gs feed definitely shows a LOT of spam. While the key
 verification will allow verifying it is who they say they are, it won't
 prevent spam being pushed into the system.
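
The per-site key check described above can be sketched roughly as follows.
This assumes the IndexNow convention, as I understand it, of hosting a text
file named after the key at the site root; the function names and the
injected fetch callable are hypothetical, chosen so the logic can be
exercised without network access.

```python
from typing import Callable
from urllib.parse import urlsplit


def key_location(submitted_url: str, key: str) -> str:
    """Return where a verifier would look for the key file (assumed: site root)."""
    parts = urlsplit(submitted_url)
    return f"{parts.scheme}://{parts.netloc}/{key}.txt"


def is_key_valid(submitted_url: str, key: str, fetch: Callable[[str], str]) -> bool:
    """Check that the host of submitted_url serves a file whose content is the key.

    `fetch` is a hypothetical callable that returns the body of a URL as text
    and raises OSError when the file cannot be retrieved.
    """
    try:
        body = fetch(key_location(submitted_url, key))
    except OSError:
        return False
    return body.strip() == key
```

As noted above, a check like this only shows that the sender controls the
host. A spammer who owns a site passes it just as well as anyone else.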
 >
 > ----
 >
 > To throw some ideas in here:
 >  - What would need to be done to improve the existing pingback services
 in place?
 >  - Do they ''need'' to be replaced?
 >  - Do they need to supply extra details to clients to improve the
 service?
 >
 > Looking at the output from blo.gs feed:
 > {{{
 > <weblog name="My Site" url="https://example.org/" service="ping"
 ts="20210928T08:00:00Z" />
 > }}}
 > > Replying to [comment:22 dd32]: Existing ping services are not open.
 Users of these ping systems generally ping only a few dominant players.
 https://www.indexnow.org/ is open: it shares submitted URLs among all
 search engines that have adopted it. Ping one and you have in fact pinged
 all.

 >
 > That's not super useful as-is, since it doesn't say what changed, but
 the addition of a link to a) the sitemap and b) the page changed would
 help greatly and provide a lot of what this proposal adds.
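
To illustrate the gap, here is a small sketch that reads the feed entry
quoted above and reports which details a consumer gets. The "sitemap" and
"changed" attributes are purely hypothetical stand-ins for the suggested
additions; they are not part of the blo.gs feed as quoted.

```python
import xml.etree.ElementTree as ET

# The entry quoted above, joined onto one line.
SAMPLE = (
    '<weblog name="My Site" url="https://example.org/" service="ping" '
    'ts="20210928T08:00:00Z" />'
)


def describe_entry(xml_text: str) -> dict:
    """Return the details available to a consumer of one feed entry."""
    element = ET.fromstring(xml_text)
    return {
        "name": element.get("name"),
        "url": element.get("url"),
        "timestamp": element.get("ts"),
        # Hypothetical attributes, absent from the feed quoted above:
        "sitemap": element.get("sitemap"),
        "changed": element.get("changed"),
    }
```

With the entry as it stands today, the last two values come back as None: a
consumer learns that the site changed, but not which page.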

 > > Replying to [comment:22 dd32]: a) Sitemaps are a great way to tell
 search engines about all the relevant URLs on your site, but search
 engines attempt to look at sitemaps only about once a day. Would you like
 to wait one or more days to see your content indexed? IndexNow
 https://www.indexnow.org/ lets you have your content indexed now, not in
 a few days. b) Detecting page changes is not a great solution: we would
 have to poll millions of sites often to discover whether their content
 has changed. The right model is IndexNow + Sitemaps: IndexNow to get
 indexing done fast, and sitemaps to catch up if a ping is missed.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/52900#comment:28>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

