[wp-trac] [WordPress Trac] #25585: Arabic stopwords comparison
WordPress Trac
noreply at wordpress.org
Tue Oct 15 20:31:59 UTC 2013
#25585: Arabic stopwords comparison
-------------------------------------------+--------------------
Reporter: alex-ye | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone: 3.7
Component: General | Version: trunk
Severity: normal | Resolution:
Keywords: needs-patch reporter-feedback |
-------------------------------------------+--------------------
Comment (by azaozz):
> I think the crazy regular expression in parse_search() should possibly
> be moved into it, and search_terms_count should be set after? Right now,
> it's possible for parse_search_terms() to return less terms than is
> specified in search_terms_count.
Yeah, it can be moved to the proposed filter so plugins could change the
pattern for specific languages.
The idea is to remove single-letter terms from the search. The pattern
`/^\p{L}$/u` is the safest way to match a single letter in any language.
It's not particularly fast, as it has to look up the Unicode character
properties. A more thorough (but considerably slower) pattern could be
`/^(?:\p{L}\p{M}*|\p{Z}|\p{P}|\p{C})$/u`, which also matches a letter
followed by combining marks, separators (any kind of whitespace or
invisible separators), punctuation, and invisible control characters and
unused code points. The alternation has to be grouped so that `^` and `$`
apply to every branch, not only the first and last.
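
To make the difference between the two patterns concrete, here is a minimal
sketch in plain PHP. The constant and function names are hypothetical and are
not taken from WordPress core. The broader pattern is written with a
non-capturing group around the alternation, on the assumption that the intent
is to match a term consisting of exactly one such character.

```php
<?php
// Hypothetical names for illustration only; none of this is core code.

// Matches a term that is exactly one letter in any script.
const SINGLE_LETTER = '/^\p{L}$/u';

// Broader variant: one letter plus any combining marks, or one separator,
// punctuation, or "other" (control / unassigned) character.
// Assumption: the alternation is grouped so ^ and $ apply to every branch.
const SINGLE_CHAR_BROAD = '/^(?:\p{L}\p{M}*|\p{Z}|\p{P}|\p{C})$/u';

function drop_single_char_terms(array $terms, string $pattern): array
{
    $kept = array();
    foreach ($terms as $term) {
        // preg_match() returns 1 on a match, 0 on no match, false on error
        // (for example invalid UTF-8). Only a clean match drops the term.
        if (preg_match($pattern, $term) === 1) {
            continue;
        }
        $kept[] = $term;
    }
    return $kept;
}

// "e\u{0301}" is a base letter followed by a combining acute accent.
$terms = array('a', 'ب', 'wp', "e\u{0301}", '-', 'كتاب');

// Keeps 'wp', the decomposed "é", '-', and 'كتاب'.
print_r(drop_single_char_terms($terms, SINGLE_LETTER));

// Keeps only 'wp' and 'كتاب'.
print_r(drop_single_char_terms($terms, SINGLE_CHAR_BROAD));
```

The simpler pattern leaves a decomposed accented letter and a lone hyphen in
the term list, which is the gap the broader pattern is meant to close.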
`search_terms_count` is the count before the terms were cleaned. It's used
to determine whether the sorting uses AND and OR matches for the title, or
just the sentence match. This part of parse_search_order() has gone through
quite a few changes; maybe there is a simpler way to do that now.
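
The mismatch between the stored count and the cleaned term list can be
sketched as follows. This is an illustration only: the function name is
hypothetical, the array keys simply mirror the names used in this ticket, and
the "more than one term" threshold is an assumption rather than a statement
of what core does.

```php
<?php
// Hypothetical sketch; not the core implementation.
function clean_terms_with_count(array $raw_terms): array
{
    // Count is taken before cleaning, as described in the comment above.
    $count_before = count($raw_terms);

    $cleaned = array();
    foreach ($raw_terms as $term) {
        // Drop terms that are a single letter in any script.
        if (preg_match('/^\p{L}$/u', $term) === 1) {
            continue;
        }
        $cleaned[] = $term;
    }

    return array(
        'search_terms_count' => $count_before,
        'search_terms'       => $cleaned,
    );
}

$result = clean_terms_with_count(array('a', 'red', 'b', 'car'));

// Assumed threshold: more than one original term enables AND/OR title
// ordering; otherwise only the sentence match is used.
$use_and_or = $result['search_terms_count'] > 1;

printf(
    "count before: %d, terms after: %d\n",
    $result['search_terms_count'],
    count($result['search_terms'])
);
```

Here the stored count is 4 while only 2 terms survive cleaning, which is the
situation the quoted comment points out.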
--
Ticket URL: <http://core.trac.wordpress.org/ticket/25585#comment:10>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software