[wp-trac] [WordPress Trac] #7394: Search: order results by relevance
WordPress Trac
noreply at wordpress.org
Sun Oct 21 19:09:02 UTC 2012
#7394: Search: order results by relevance
-------------------------------------------------+-------------------------
Reporter: markjaquith | Owner:
Type: enhancement | Status: assigned
Priority: normal | Milestone: Future Release
Component: General |
Severity: normal | Version: 2.6
Keywords: has-patch needs-refresh needs-unit-tests | Resolution:
-------------------------------------------------+-------------------------
Comment (by azaozz):
Replying to [comment:31 nacin]:
> * Split word cleaning (removal of short words, etc) into a separate
> patch. This should be considered separately. That probably means we can
> continue to use _search_terms_tidy() instead of _check_search_terms().
That used to be #21688; however, sanity checks and the removal of
one-letter terms and stopwords are needed to implement sorting by relevance.
If you mean separating the "stopwords" functionality into another function,
it used to be that way in a previous patch. There might be a possibility
to use stopwords somewhere else, so not merging them into
`_check_search_terms()` makes sense.
`_search_terms_tidy()` was designed to be a callback for `array_filter()`
and has limitations.
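
A minimal sketch of the kind of term checking discussed here, written in
Python for illustration rather than in the patch's PHP. The function name
mirrors `_check_search_terms()`, but the body, the stopword list, and the
two-character minimum are assumptions; the thread does not show the actual
implementation.

```python
# Hypothetical stopword list; the list used by the patch is not shown here.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def check_search_terms(terms, stopwords=STOPWORDS):
    """Return the terms worth searching for, in their original order."""
    checked = []
    for term in terms:
        term = term.strip()
        # Drop empty strings and one-letter terms.
        if len(term) < 2:
            continue
        # Drop stopwords, comparing case-insensitively.
        if term.lower() in stopwords:
            continue
        checked.append(term)
    return checked
```

Because the function receives the whole list rather than one element at a
time, the stopword check could later be moved into its own helper without
changing the callers.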
> There remain three distinct concerns:
>
> 1. Plugin compatibility: Does this have the potential to break plugins?
Not plugins that implement a fulltext index on the posts table. I will look
for other plugins that (perhaps) implement something similar.
> 2. Performance: This worked well on WP.com, but they use SSDs, query
> caching, and have mostly vanilla use cases (ties back into plugin
> compatibility). Does this cause problems under strain?
The results from WP.com show no change to the load on MySQL, whether it's
on the same server or on a dedicated DB server with SSDs, etc. I also ran
quite a few tests on my test server and didn't see any MySQL
performance problems.
In most cases the ORDER BY would run several more LIKE comparisons on the
selected rows. While at first glance this seems slow, in reality it's very
fast. Further, the sorting uses only the whole search string when the search
is too specific (contains many search terms) and has some sensible "sanity
limits".
> 3. Results: Does this result in bad search results on occasion by
> promoting the wrong things to the top? One example could include P2 auto
> titles. Yes, there is a filter, but if there are concerns that were raised
> by WP.com developers, I'd like to work them out here.
I did quite a bit of research while working on this. The sorting was
modelled to mimic how search engines work. This improvement mostly concerns
front-end searches, when a visitor to the site uses our search
form. The results we return should be similar to the results Google, Bing,
etc. return for the site.
It heavily emphasizes term matches in the title, with full search string
matches receiving the highest priority.
In the particular case for P2s, the auto-generated title is the same as
the first few words of the content and may not represent the post very
well. For that case matches in the title are disabled but full search
string matches in the content are still being used to improve the sorting.
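
The sorting described in the replies to points 2 and 3 can be sketched
roughly as follows. This is an illustration in Python, not the patch's PHP:
the function names, the exact ranking order below the full-string title
match, and the limit of six terms are assumptions. `post_title` and
`post_content` are the columns of the posts table, and `%s` is a DB-API
`format` style placeholder to be filled in by the database driver.

```python
def like_pattern(text):
    """Wrap text in % wildcards, backslash-escaping LIKE metacharacters."""
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "%" + escaped + "%"


def relevance_order_by(search, terms, max_terms=6, match_titles=True):
    """Return (sql, params) for a CASE expression; lower values sort first.

    max_terms is an assumed "sanity limit": above it, only the whole
    search string is used. match_titles=False models the P2 case, where
    auto-generated titles are ignored.
    """
    whens, params = [], []

    def add(condition, values):
        # Each new condition gets the next rank: 1, 2, 3, ...
        whens.append(f"WHEN {condition} THEN {len(whens) + 1}")
        params.extend(values)

    title = "post_title LIKE %s"
    if match_titles:
        # Full search string in the title: highest priority.
        add(title, [like_pattern(search)])
        if 1 < len(terms) <= max_terms:
            patterns = [like_pattern(t) for t in terms]
            # All terms in the title, then any term in the title.
            add(" AND ".join([title] * len(terms)), patterns)
            add(" OR ".join([title] * len(terms)), patterns)
    # Full search string in the content.
    add("post_content LIKE %s", [like_pattern(search)])
    return f"(CASE {' '.join(whens)} ELSE {len(whens) + 1} END)", params
```

The returned expression would go into the ORDER BY clause of the search
query, so the extra LIKE comparisons are evaluated only for rows the WHERE
clause has already selected.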
> Overall, not looking likely for 3.5. This is something that needs
> further review and needs to land early. Also, unit tests...
That's a pity. Our search has been pretty bad for a very long time; look at
when this ticket was opened :)
The proposed patch makes it many times better, both for site visitors
and for the admin. I know the SQL may look scary at first, but it's just
simple MySQL functionality. It's no more complicated than a join or a
subquery.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/7394#comment:32>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software