[wp-trac] [WordPress Trac] #7394: Search: order results by relevance

WordPress Trac noreply at wordpress.org
Sun Oct 21 19:09:02 UTC 2012


#7394: Search: order results by relevance
-------------------------------------------------+-------------------------
 Reporter:  markjaquith                          |       Owner:
     Type:  enhancement                          |      Status:  assigned
 Priority:  normal                               |   Milestone:  Future Release
Component:  General                              |
 Severity:  normal                               |     Version:  2.6
 Keywords:  has-patch needs-refresh              |  Resolution:
            needs-unit-tests                     |
-------------------------------------------------+-------------------------

Comment (by azaozz):

 Replying to [comment:31 nacin]:
 >  * Split word cleaning (removal of short words, etc) into a separate
 >  patch. This should be considered separately. That probably means we can
 >  continue to use _search_terms_tidy() instead of _check_search_terms().

 It used to be #21688; however, sanity checks and removal of one-letter terms
 and stopwords are needed for implementing sorting by relevance.

 If you mean separating the "stopwords" functionality into another function,
 it used to be that way in a previous patch. There might be a possibility
 to use stopwords somewhere else, so not merging them into
 `_check_search_terms()` makes sense.

 `_search_terms_tidy()` was designed to be a callback for `array_filter()`
 and has limitations.
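
 As a rough illustration of the split discussed above, the sketch below
 keeps stopword removal in its own function and does the other checks on
 the whole list of terms. It is not the code from the patch: the names
 `remove_stopwords` and `check_search_terms`, the stopword list, the
 minimum length and the fallback to the whole string are all assumptions,
 and Python is used only to keep the example runnable.

```python
import re

# Hypothetical stopword list, for illustration only (not the list from the patch).
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "in", "to", "is"})


def remove_stopwords(terms, stopwords=STOPWORDS):
    """Drop stopwords. Kept separate so it could be reused elsewhere."""
    return [t for t in terms if t.lower() not in stopwords]


def check_search_terms(raw, min_length=2):
    """Split a search string into cleaned terms.

    Quoted phrases stay together as one term. Terms shorter than
    min_length and stopwords are dropped. If nothing survives, the
    whole search string is used as a single term (an assumed fallback).
    """
    # Either a "quoted phrase" or a run of non-separator characters.
    matches = re.findall(r'"([^"]+)"|([^\s",+]+)', raw)
    terms = [(phrase or word).strip() for phrase, word in matches]
    terms = [t for t in terms if len(t) >= min_length]
    terms = remove_stopwords(terms)
    if not terms:
        return [raw.strip()]
    return terms
```

 A function that receives the whole list can apply list-level rules such as
 the fallback above, which a per-element filter callback cannot do because
 it only ever sees one term at a time.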

 > There remain three distinct concerns:
 >
 > 1. Plugin compatibility: Does this have the potential to break plugins?

 Not plugins that implement a fulltext index on the posts table. Will look
 for other plugins that (perhaps) implement something similar.

 > 2. Performance: This worked well on WP.com, but they use SSDs, query
 > caching, and have mostly vanilla use cases (ties back into plugin
 > compatibility). Does this cause problems under strain?

 The results from WP.com show no change in MySQL load, whether it's
 on the same server or on a dedicated DB server with SSDs, etc. Also ran
 quite a few tests on my test server and didn't see any MySQL
 performance problems.

 In most cases the ORDER BY would run several more LIKE comparisons on the
 selected rows. While at first glance this seems slow, in reality it's very
 fast. Furthermore, if the search string is too specific (contains many
 search terms), the sorting uses only the whole search string, and it has
 some sensible "sanity limits".
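
 The general shape of such an ORDER BY can be sketched as a CASE expression
 built from LIKE comparisons. This is a minimal sketch under stated
 assumptions, not the SQL from the patch: the `posts` table with `title`,
 `content` and `post_date` columns, the tier numbers, the limit of 6 terms
 and all function names are hypothetical, and SQLite stands in for MySQL
 only so that the example runs with the Python standard library.

```python
import sqlite3

# Assumed "sanity limit": above this many terms, sort by the whole string only.
MAX_TERMS_FOR_TERM_SORT = 6

LIKE = "{col} LIKE ? ESCAPE '\\'"  # SQLite syntax; backslash escapes % and _


def like_pattern(text):
    """Escape LIKE wildcards and wrap the text in % for a substring match."""
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "%" + escaped + "%"


def build_relevance_order(search, terms):
    """Return (sql, params) for an ORDER BY CASE; lower numbers sort first.

    1 = whole string in title, 2 = every term in title,
    3 = any term in title, 4 = whole string in content, 5 = the rest.
    """
    whens = ["WHEN " + LIKE.format(col="title") + " THEN 1"]
    params = [like_pattern(search)]
    if len(terms) <= MAX_TERMS_FOR_TERM_SORT:
        title_likes = [LIKE.format(col="title") for _ in terms]
        patterns = [like_pattern(t) for t in terms]
        whens.append("WHEN " + " AND ".join(title_likes) + " THEN 2")
        params.extend(patterns)
        if len(terms) > 1:  # "any term" only differs from "every term" here
            whens.append("WHEN " + " OR ".join(title_likes) + " THEN 3")
            params.extend(patterns)
    whens.append("WHEN " + LIKE.format(col="content") + " THEN 4")
    params.append(like_pattern(search))
    return "(CASE " + " ".join(whens) + " ELSE 5 END)", params


def search_posts(conn, search):
    """Select posts containing every term, ordered by relevance, then date."""
    terms = search.split()
    if not terms:
        return []
    both = "(" + LIKE.format(col="title") + " OR " + LIKE.format(col="content") + ")"
    where_params = []
    for term in terms:
        where_params += [like_pattern(term)] * 2
    order_sql, order_params = build_relevance_order(search, terms)
    sql = (
        "SELECT title FROM posts WHERE " + " AND ".join([both] * len(terms))
        + " ORDER BY " + order_sql + ", post_date DESC"
    )
    return [row[0] for row in conn.execute(sql, where_params + order_params)]


def make_demo_db():
    """Build a small in-memory table with made-up posts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (title TEXT, content TEXT, post_date TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [
        ("Garden tips", "How to grow red tomatoes at home", "2012-01-01"),
        ("Red tomatoes for beginners", "A short guide", "2012-02-01"),
        ("Tomatoes: green or red", "Comparing varieties", "2012-03-01"),
        ("Tomatoes in winter", "Keep the red ones indoors", "2012-04-01"),
        ("Soup", "Tomatoes and red peppers", "2012-05-01"),
        ("Unrelated", "Nothing here", "2012-06-01"),
    ])
    return conn


if __name__ == "__main__":
    for title in search_posts(make_demo_db(), "red tomatoes"):
        print(title)
```

 The extra LIKE comparisons appear only in the ORDER BY, so they are
 evaluated against the rows the WHERE clause has already selected rather
 than against the whole table.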

 > 3. Results: Does this result in bad search results on occasion by
 > promoting the wrong things to the top? One example could include P2 auto
 > titles. Yes, there is a filter, but if there are concerns that were raised
 > by WP.com developers, I'd like to work them out here.

 Did quite a bit of research while working on this. The sorting was
 modelled to mimic how the search engines work. This improvement concerns
 mostly the front-end searches when a visitor to the site uses our search
 form. The results we return should be similar to the results Google, Bing,
 etc. return for the site.

 It heavily emphasizes term matches in the title, with full search string
 matches receiving the highest priority.

 In the particular case for P2s, the auto-generated title is the same as
 the first few words of the content and may not represent the post very
 well. For that case matches in the title are disabled but full search
 string matches in the content are still being used to improve the sorting.

 > Overall, not looking likely for 3.5. This is something that needs
 > further review and needs to land early. Also, unit tests...

 That's a pity. Our search has been pretty bad for a very long time; look at
 when this ticket was opened :)

 The proposed patch makes it many times better both for the site visitors
 and for the admin. I know the SQL may look scary at first but it's just
 simple MySQL functionality. It's no more complicated than a join or a
 subquery.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/7394#comment:32>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

