[wp-trac] [WordPress Trac] #21688: Add sanity checks and improve performance when searching for posts
WordPress Trac
wp-trac at lists.automattic.com
Mon Aug 27 16:06:45 UTC 2012
#21688: Add sanity checks and improve performance when searching for posts
-------------------------+------------------------------
Reporter: azaozz | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Query | Version:
Severity: normal | Resolution:
Keywords: |
-------------------------+------------------------------
Comment (by azaozz):
Replying to [comment:13 johnbillion]:
> The string length has a very loose correlation to the relevance of the
> word in a search.
Right. However, our search doesn't use word boundaries: it uses `LIKE
'%term%'`, not `REGEXP '[[:<:]]term[[:>:]]'`, so it matches inside words
too. This has its advantages (considerably faster, matches different forms
of the same word, etc.) but it also makes most very short terms useless
for matching, as they will match all or nearly all posts.
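To illustrate the difference, here is a small Python sketch (not WordPress code; the sample posts and helper names are made up for this example) comparing substring matching, as `LIKE '%term%'` does, with whole-word matching:

```python
import re

# Hypothetical post contents, for illustration only.
POSTS = [
    "Music festival lineup",
    "Discussing the next release",
    "Customer support hours",
    "A trip to the US",
]


def substring_match(term, posts):
    """Mimic LIKE '%term%': case-insensitive match anywhere, even inside words."""
    needle = term.lower()
    return [p for p in posts if needle in p.lower()]


def whole_word_match(term, posts):
    """Mimic a word-boundary REGEXP: the term must appear as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [p for p in posts if pattern.search(p)]
```

With this sample data, `substring_match("us", POSTS)` returns every post, while `whole_word_match("us", POSTS)` returns only the last one. On the other hand, `substring_match("discuss", POSTS)` still finds "Discussing the next release", which the whole-word version misses.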
With the current patch and the patch from #7394, terms like TV, 3G, MD, UK,
US, etc. will not be used as separate terms in a multi-word search, but
will still be used for sorting the results as part of the whole search
string. Also, when a search consists of only one short word, the search
string is used literally.
> The stopword list is the best method and should also include all the
> one- and two-letter words in English that are to be filtered out. The
> Relevanssi search plugin, for example, includes quite an exhaustive list
> of stopwords.
Was thinking about that too, but it would make that array very long.
Removing one- and two-letter terms also acts as a sanity check. There is no
point in running searches like `q w e r t y u i o p` or
`qw er ty ui op as df gh jk` as separate terms (the whole string will still
run as a "sentence").
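A rough Python sketch of the behaviour described above, not the actual patch: the function name, the `min_length=3` cutoff and the fallback when every word is too short are assumptions made for the example.

```python
def split_search_terms(search, min_length=3):
    """Return (terms, sentence) for a raw search string.

    min_length=3 is an assumption standing in for "remove one- and
    two-letter terms"; the real cutoff may differ.
    """
    sentence = " ".join(search.split())  # normalize whitespace
    words = sentence.split(" ") if sentence else []

    if len(words) <= 1:
        # A single word, however short, is used literally.
        return words, sentence

    # In a multi-word search, drop the very short words as separate terms.
    terms = [w for w in words if len(w) >= min_length]
    if not terms:
        # Every word was too short: only the whole "sentence" is searched.
        terms = [sentence]
    return terms, sentence
```

For example, `split_search_terms("best TV shows")` gives the terms `["best", "shows"]` and keeps `"best TV shows"` as the sentence used for sorting, while `split_search_terms("q w e r t y u i o p")` ends up with the whole string as its only term.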
It's possible to improve the removal of one- and two-letter terms by
checking for capitalization and numbers and not removing terms that
contain them. It would slow down that bit of code though. Will look into it.
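One way this could look, as a Python sketch only: the helper names and the exact rule (keep a short term if it contains an uppercase letter or a digit) are assumptions, not something the patch does today.

```python
def looks_significant(word):
    """Hypothetical rule: a short term is kept if it has an uppercase letter or a digit."""
    return any(ch.isupper() or ch.isdigit() for ch in word)


def filter_terms(words, min_length=3):
    """Drop one- and two-letter terms unless they look like TV, 3G, UK, etc."""
    return [w for w in words if len(w) >= min_length or looks_significant(w)]
```

Here `filter_terms(["best", "TV", "in", "the", "UK"])` keeps "TV" and "UK" but drops "in". The obvious weakness is that it depends on how the search was typed: a lowercase "tv" would still be removed, and a capitalized "In" at the start of a search would be kept.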
--
Ticket URL: <http://core.trac.wordpress.org/ticket/21688#comment:14>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software