[wp-trac] [WordPress Trac] #21688: Add sanity checks and improve performance when searching for posts
WordPress Trac
wp-trac at lists.automattic.com
Fri Sep 7 18:28:53 UTC 2012
#21688: Add sanity checks and improve performance when searching for posts
-------------------------+------------------------------
Reporter: azaozz | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Query | Version:
Severity: normal | Resolution:
Keywords: |
-------------------------+------------------------------
Comment (by gibrown):
Really like these improvements. Tough problem given MySQL's text search
limitations.
A few thoughts that came to mind. These probably don't all work together,
just throwing out ideas:
- Removing all two-letter words seems like it will have a lot of
implications for abbreviations and a few other English words ('id', 'ha',
'ma').
- To reduce the impact of matching sub-words with "%id%" while still
matching "house" to both "house" and "houses", we could explicitly match
against something like "%house[s $,.\"']". In most cases (for English) any
ending besides a plural 's' will change the meaning of the word ("person"
vs "personal"). Could define another filter that provides the endings to
match against based on language.
- Similarly, could detect whether there is any whitespace in the query and,
if there is, add whitespace (plus punctuation) around short terms (< 4
letters?). For example: "%[^ ]cat[s $,.\"']%". The reason to condition this
on the query having whitespace is to not break languages that are written
without whitespace. Again, these patterns should probably be
language-specific and filterable. The reason for not doing this for all
words is to improve matching against compound words ("house" can match
"treehouse").
- Could expand the stopwords list if we didn't have to convert it from a
translated string. There are some pretty good stopword lists for multiple
languages at http://www.ranks.nl/resources/stopwords.html, and the filter
mechanism allows people to modify them. Speed might be more important than
the flexibility GlotPress provides. I think generally fewer stopwords are
better.
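
To make the two pattern ideas above concrete, here is a rough sketch in
Python rather than SQL. It is only an illustration under assumptions, not a
proposed patch: the names ENDINGS, TRAILING, build_term_pattern and matches
are hypothetical, the short-term cutoff of 4 is the guess from the bullet
above, and the left-hand side is read as "start of string or whitespace".
Note that MySQL's LIKE only understands the % and _ wildcards, so the
bracketed classes would need REGEXP (or similar) on the database side.

```python
import re

# Hypothetical per-language plural endings; a real version would be filterable.
ENDINGS = {"en": ["s"]}
# Characters allowed right after a term (end of string is handled separately).
TRAILING = " ,.\"'"


def build_term_pattern(term, query, lang="en", short_len=4):
    """Return a regex source string for a single search term."""
    endings = "|".join(re.escape(e) for e in ENDINGS.get(lang, []))
    suffix = f"(?:{endings})?" if endings else ""
    tail = f"(?:[{re.escape(TRAILING)}]|$)"
    body = re.escape(term) + suffix + tail
    # Anchor the left side only for short terms, and only when the query
    # itself contains whitespace, so languages without spaces still match.
    has_space = re.search(r"\s", query) is not None
    if has_space and len(term) < short_len:
        return r"(?:^|\s)" + body
    # Longer terms stay open on the left so "house" can match "treehouse".
    return body


def matches(term, query, text, lang="en"):
    """True if `text` contains `term` under the rules sketched above."""
    pattern = build_term_pattern(term, query, lang)
    return re.search(pattern, text, re.IGNORECASE) is not None
```

With this sketch, "house" still matches "treehouse," and "houses." but not
"household", and "cat" from the query "cat food" matches "two cats, one dog"
but not "concatenate".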
--
Ticket URL: <http://core.trac.wordpress.org/ticket/21688#comment:16>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software