[wp-trac] [WordPress Trac] #25585: Arabic stopwords comparison

WordPress Trac noreply at wordpress.org
Tue Oct 15 20:31:59 UTC 2013


#25585: Arabic stopwords comparison
-------------------------------------------+--------------------
 Reporter:  alex-ye                        |       Owner:
     Type:  enhancement                    |      Status:  new
 Priority:  normal                         |   Milestone:  3.7
Component:  General                        |     Version:  trunk
 Severity:  normal                         |  Resolution:
 Keywords:  needs-patch reporter-feedback  |
-------------------------------------------+--------------------

Comment (by azaozz):

 > I think the crazy regular expression in parse_search() should possibly
 be moved into it, and search_terms_count should be set after? Right now,
 it's possible for parse_search_terms() to return fewer terms than are
 specified in search_terms_count.

 Yeah, it can be moved to the proposed filter so plugins could change the
 pattern for specific languages.

 The idea is to remove single-letter terms from the search. The pattern
 `/^\p{L}$/u` is the safest way to match a single letter in any language.
 It's not particularly fast, as it has to look up the Unicode character
 properties. A more thorough (but considerably slower) pattern could be
 `/^(?:\p{L}\p{M}*|\p{Z}|\p{P}|\p{C})$/u`, which also matches a letter
 followed by combining marks, separators (any kind of whitespace or
 invisible separators), punctuation, and invisible control characters and
 unused code points. The alternation has to be grouped so that `^` and `$`
 apply to every branch and not only to the first and the last one.
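
 A minimal sketch of how the two patterns behave, assuming the terms have
 already been split into an array. The helper `remove_terms_matching()` and
 the sample terms are hypothetical and are not part of core. The second
 pattern is written with the alternation grouped, so that both anchors
 apply to every branch.

```php
<?php
// Hypothetical helper, not WordPress core: drops every term that the
// given pattern matches and returns the remaining terms re-indexed.
function remove_terms_matching( array $terms, $pattern ) {
    $kept = array();
    foreach ( $terms as $term ) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if ( 1 === preg_match( $pattern, $term ) ) {
            continue;
        }
        $kept[] = $term;
    }
    return $kept;
}

// A single Unicode letter; the /u modifier treats the subject as UTF-8.
$single_letter = '/^\p{L}$/u';

// One letter plus optional combining marks, or one separator,
// punctuation, or "other" (control, unassigned) character.
$extended = '/^(?:\p{L}\p{M}*|\p{Z}|\p{P}|\p{C})$/u';

// "e\xCC\x81" is the letter e followed by U+0301 COMBINING ACUTE ACCENT.
$terms = array( 'a', 'و', "e\xCC\x81", '-', 'كتاب', 'wp' );

$by_letter   = remove_terms_matching( $terms, $single_letter );
$by_extended = remove_terms_matching( $terms, $extended );
```

 With these sample terms the first pattern only removes `a` and `و`, while
 the extended one also removes the letter with a combining accent and the
 lone hyphen. Multi-letter terms are kept by both.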

 `search_terms_count` is the count before the terms are cleaned. It's used
 to determine whether the sorting uses the AND and OR matches on the title,
 or just the sentence match. This part of parse_search_order() has gone
 through quite a few changes; maybe there is a simpler way to do that now.
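
 To illustrate the mismatch between the two counts, here is a rough sketch.
 It assumes whitespace splitting and a `> 1` threshold for the ordering
 decision; both are simplifications for illustration, and
 `clean_terms_sketch()` is a hypothetical stand-in, not the core code.

```php
<?php
// Hypothetical stand-in for the clean-up step: removes single-letter terms.
function clean_terms_sketch( array $terms ) {
    $kept = array_filter(
        $terms,
        function ( $term ) {
            return 1 !== preg_match( '/^\p{L}$/u', $term );
        }
    );
    // array_filter() preserves keys, so re-index the result.
    return array_values( $kept );
}

// Assumption: terms are split on whitespace only.
$raw_terms = preg_split( '/\s+/u', 'x wordpress', -1, PREG_SPLIT_NO_EMPTY );

$count_before = count( $raw_terms );   // counted before cleaning
$clean_terms  = clean_terms_sketch( $raw_terms );
$count_after  = count( $clean_terms ); // counted after cleaning

// Assumed decision rule: more than one term enables per-term title ordering.
$per_term_ordering_before = $count_before > 1;
$per_term_ordering_after  = $count_after > 1;
```

 Here the count taken before cleaning is 2 and the count after cleaning is
 1, so the two values would lead to different ordering decisions under the
 assumed rule.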

--
Ticket URL: <http://core.trac.wordpress.org/ticket/25585#comment:10>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

