[wp-trac] [WordPress Trac] #21688: Add sanity checks and improve performance when searching for posts

WordPress Trac wp-trac at lists.automattic.com
Mon Aug 27 16:06:45 UTC 2012


#21688: Add sanity checks and improve performance when searching for posts
-------------------------+------------------------------
 Reporter:  azaozz       |       Owner:
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  Awaiting Review
Component:  Query        |     Version:
 Severity:  normal       |  Resolution:
 Keywords:               |
-------------------------+------------------------------

Comment (by azaozz):

 Replying to [comment:13 johnbillion]:
 > The string length has a very loose correlation to the relevance of the
 word in a search.

 Right. However, our search doesn't use word boundaries: it uses `LIKE
 '%term%'`, not `REGEXP '[[:<:]]term[[:>:]]'`, so it matches inside words
 too. This has its advantages (it is considerably faster, matches different
 forms of the same word, etc.) but also makes most very short terms
 useless for matching, as they will match all or nearly all posts.
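
 To illustrate the difference, here is a rough Python sketch (not the
 actual query code). `like_match` and `word_match` are hypothetical
 helpers that only approximate the two SQL forms; it assumes a
 case-insensitive collation, and Python's `\b` is only an approximation of
 `[[:<:]]` and `[[:>:]]`. The sample titles are made up.

```python
import re

# Made-up post titles, for illustration only.
POSTS = ["Watching TV tonight", "Best outvoted proposals", "HDTV reviews"]


def like_match(term, text):
    """Approximate SQL LIKE '%term%' (assuming a case-insensitive collation)."""
    return term.lower() in text.lower()


def word_match(term, text):
    """Approximate REGEXP '[[:<:]]term[[:>:]]' using Python's \\b boundary."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


# Substring matching hits every title; whole-word matching hits only one.
like_hits = [p for p in POSTS if like_match("tv", p)]
word_hits = [p for p in POSTS if word_match("tv", p)]
```

 The same substring behaviour is what lets `review` find "HDTV reviews",
 which the whole-word form would miss.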

 With the current patch and the patch from #7394, terms like TV, 3G, MD, UK,
 US, etc. will not be used as separate terms in a multi-word search, but will
 still be used for sorting the results as part of the whole search string.
 Also, when a search consists of only one short word, the search string is
 used literally.
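
 Roughly, the behaviour described above could be sketched like this in
 Python. This is not the patch itself: `select_search_terms`, its
 `min_length` parameter and the returned dictionary are made up for
 illustration, and the stopword list is left out.

```python
def select_search_terms(search, min_length=3):
    """Split a search string into the terms that would be matched separately.

    Hypothetical sketch: the whole string is always kept as the "sentence",
    and one- and two-letter words are dropped only in multi-word searches.
    """
    words = search.split()
    sentence = " ".join(words)

    if len(words) <= 1:
        # A single word, however short, is used literally.
        return {"sentence": sentence, "terms": words}

    # Multi-word search: drop terms shorter than min_length characters.
    terms = [w for w in words if len(w) >= min_length]
    return {"sentence": sentence, "terms": terms}


multi = select_search_terms("best 3G phones UK")
single = select_search_terms("TV")
```

 Here `multi["terms"]` keeps only `best` and `phones`, while the full
 string stays available in `multi["sentence"]` for sorting.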

 > The stopword list is the best method and should also include all the
 one- and two-letter words in English that are to be filtered out. The
 Relevanssi search plugin, for example, includes quite an exhaustive list
 of stopwords.

 Was thinking about that too, but it would make that array very long.
 Removing one- and two-letter terms also acts as a sanity check. There is no
 point in running searches like `q w e r t y u i o p` or `qw er ty ui op as
 df gh jk` as separate terms (they will still run as a "sentence").

 It's possible to improve the removal of one- and two-letter terms by
 looking for capitalization and numbers, and not removing terms that
 contain them. It would slow down that bit of code though. Will look into it.
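
 Something along these lines, as a Python sketch only: `keep_term` and
 `min_length` are hypothetical names, and it assumes the capitalization is
 taken from what the user typed, so `tv` in lower case would still be
 dropped.

```python
def keep_term(term, min_length=3):
    """Decide whether a term should be searched for separately.

    Hypothetical sketch: long enough terms are always kept; one- and
    two-letter terms are kept only if they contain an uppercase letter
    or a digit (e.g. "TV", "3G").
    """
    if len(term) >= min_length:
        return True
    return any(ch.isupper() or ch.isdigit() for ch in term)


kept = [t for t in "qw er TV 3G of phones".split() if keep_term(t)]
```

 The extra per-character check only runs for the short terms, so the cost
 should be limited to those.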

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/21688#comment:14>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

