[wp-trac] [WordPress Trac] #25585: Arabic stopwords comparison

WordPress Trac noreply at wordpress.org
Thu Oct 17 14:41:48 UTC 2013


#25585: Arabic stopwords comparison
-------------------------+--------------------
 Reporter:  alex-ye      |       Owner:
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  3.7
Component:  General      |     Version:  trunk
 Severity:  normal       |  Resolution:
 Keywords:  needs-patch  |
-------------------------+--------------------

Comment (by alex-ye):

 Replying to [comment:10 azaozz]:
 > The idea is to remove single letter terms from the search. The pattern
 `/^\p{L}$/u` is the safest way to match a single letter in any language.
 It's not particularly fast as it looks through the Unicode character
 properties. A better (but quite slower) pattern could be
 `/^\p{L}\p{M}*|\p{Z}|\p{P}|\p{C}$/u` which also matches separators (any
 kind of whitespace or invisible separators), punctuation, and invisible
 control characters and unused code points
 ([http://www.regular-expressions.info/unicode.html#category more info]).
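
 A rough sketch of the single-letter idea quoted above, not code from the
 ticket or from core. The function name `example_drop_single_letter_terms`
 is hypothetical, and the sketch assumes the narrower pattern
 `/^\p{L}\p{M}*$/u` so that both anchors apply to the whole term. In the
 longer pattern as written, `^` binds only to the first alternative and `$`
 only to the last, so it would need a group around the alternation to match
 whole terms.

 {{{
 // Hypothetical helper: drop search terms that are a single letter,
 // optionally followed by combining marks (e.g. Arabic diacritics).
 function example_drop_single_letter_terms( array $terms ) {
         $kept = array();

         foreach ( $terms as $term ) {
                 // \p{L} is one letter in any script; \p{M}* allows
                 // trailing combining marks. preg_match() returns 1
                 // only when the whole term matches.
                 if ( 1 === preg_match( '/^\p{L}\p{M}*$/u', $term ) ) {
                         continue;
                 }
                 $kept[] = $term;
         }

         // Return the terms that survived the filter.
         return $kept;
 }
 }}}

 For example, `example_drop_single_letter_terms( array( 'a', 'wp' ) )`
 returns `array( 'wp' )`. Terms that are not valid UTF-8 make
 `preg_match()` return `false`, so this sketch keeps them unchanged.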

 See the example below. So far I have used the `str_replace` function, and
 I will try to apply your suggestion about the regular expression:


 {{{
 function ArWP_normalize( $str ) {

         // Normalize the Alef.
         $str = str_replace( array(
                 'أ','إ','آ'
         ), 'ا', $str );

         // Normalize the Diacritics.
         $str = str_replace( array(
                 'َ','ً','ُ','ٌ','ِ','ٍ','ْ','ّ'
         ), '', $str );

         // Return the new string.
         return $str;

 } // end ArWP_normalize()
 }}}
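
 A regular-expression variant of the same normalization might look like
 the sketch below. The name `example_arwp_normalize_regex` is hypothetical.
 It assumes the input is UTF-8 and that the eight diacritics listed above
 are exactly the range U+064B to U+0652 (fathatan through sukun), which is
 how they are laid out in the Unicode Arabic block.

 {{{
 // Hypothetical regex-based variant of ArWP_normalize().
 function example_arwp_normalize_regex( $str ) {

         // Alef with madda (U+0622), hamza above (U+0623) and
         // hamza below (U+0625) become a bare Alef.
         $out = preg_replace( '/[\x{0622}\x{0623}\x{0625}]/u', 'ا', $str );

         // Strip the diacritics U+064B..U+0652.
         if ( null !== $out ) {
                 $out = preg_replace( '/[\x{064B}-\x{0652}]/u', '', $out );
         }

         // preg_replace() returns null on error (e.g. invalid UTF-8);
         // fall back to the untouched input in that case.
         return ( null === $out ) ? $str : $out;

 } // end example_arwp_normalize_regex()
 }}}

 Whether this is faster than the two `str_replace` calls has not been
 measured here; the main difference is that the character ranges are stated
 by code point instead of as literal characters.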

--
Ticket URL: <http://core.trac.wordpress.org/ticket/25585#comment:12>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

