[wp-trac] [WordPress Trac] #21688: Add sanity checks and improve performance when searching for posts

WordPress Trac wp-trac at lists.automattic.com
Fri Sep 7 18:28:53 UTC 2012


#21688: Add sanity checks and improve performance when searching for posts
-------------------------+------------------------------
 Reporter:  azaozz       |       Owner:
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  Awaiting Review
Component:  Query        |     Version:
 Severity:  normal       |  Resolution:
 Keywords:               |
-------------------------+------------------------------

Comment (by gibrown):

 Really like these improvements. This is a tough problem given MySQL's text
 search limitations.

 A few thoughts that came to mind. These probably don't all work together;
 I'm just throwing out ideas:
 - Removing all two-letter words seems like it will have a lot of
 implications for abbreviations and a few other English words ('id', 'ha',
 'ma').
 - To reduce the impact of matching sub-words with "%id%" while still
 matching "house" to both "house" and "houses", we could explicitly match
 against something like "%house[s $,.\"']". In most cases (for English) any
 ending besides a plural 's' will change the meaning of the word ("person"
 vs "personal"). We could define another filter that provides the endings
 to match against based on language.
 - Similarly, we could detect whether there is any whitespace in the query
 and, if there is, add whitespace (plus punctuation) around short terms
 (< 4 letters?), for example "%[^ ]cat[s $,.\"']%". The reason to condition
 this on the query having whitespace is to avoid breaking search in
 languages written without whitespace. Again, these patterns should
 probably be language specific and filterable. The reason for not doing
 this for all words is to improve matching against compound words ("house"
 can match "treehouse").
 - We could expand the stopwords list if we didn't have to convert it from
 a translated string. There are some pretty good stopword lists for
 multiple languages here: http://www.ranks.nl/resources/stopwords.html, and
 the filter mechanism allows people to modify them. Speed might be more
 important than the flexibility GlotPress provides. I think generally fewer
 stop words is better.
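
 A rough sketch of the two matching ideas above (allowed word endings, plus
 boundaries around short terms only when the query contains whitespace).
 This is illustration only, written in Python with its `re` module rather
 than as SQL: MySQL's LIKE has no character classes, so a real query would
 presumably need REGEXP or something similar. The names
 `build_term_pattern`, `ENGLISH_ENDINGS` and `SHORT_TERM_MAX_LEN` are made
 up for this example and are not part of the patch.

```python
import re

# Hypothetical defaults; the idea is that these would be filterable per language.
ENGLISH_ENDINGS = ("s",)
SHORT_TERM_MAX_LEN = 3  # "short" here means fewer than 4 letters


def build_term_pattern(term, query_has_whitespace, endings=ENGLISH_ENDINGS):
    """Build a regex for one search term.

    The term may be followed by one of the allowed endings, and must then be
    followed by a non-word character or the end of the text. Short terms also
    require a boundary in front, but only when the query contains whitespace,
    so that languages written without spaces are not affected.
    """
    suffix = ""
    if endings:
        suffix = "(?:" + "|".join(re.escape(e) for e in endings) + ")?"
    # No word character may follow the term plus its optional ending.
    tail = suffix + r"(?!\w)"
    head = ""
    if query_has_whitespace and len(term) <= SHORT_TERM_MAX_LEN:
        # No word character may precede a short term.
        head = r"(?<!\w)"
    return re.compile(head + re.escape(term) + tail, re.IGNORECASE)


house = build_term_pattern("house", query_has_whitespace=False)
cat = build_term_pattern("cat", query_has_whitespace=True)

print(bool(house.search("two houses.")))      # plural ending is allowed
print(bool(house.search("a treehouse")))      # compound words still match
print(bool(house.search("household items")))  # other endings are rejected
print(bool(cat.search("the cats, mostly")))   # short term with boundaries
print(bool(cat.search("concatenate")))        # sub-word match is rejected
```

 Leaving the leading boundary off for longer terms is what keeps "house"
 matching "treehouse", while short terms like "cat" or "id" stop matching
 inside unrelated words.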

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/21688#comment:16>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

