[wp-trac] [WordPress Trac] #25585: Arabic stopwords comparison
WordPress Trac
noreply at wordpress.org
Thu Oct 17 14:41:48 UTC 2013
#25585: Arabic stopwords comparison
-------------------------+--------------------
 Reporter:  alex-ye      |       Owner:
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  3.7
Component:  General      |     Version:  trunk
 Severity:  normal       |  Resolution:
 Keywords:  needs-patch  |
-------------------------+--------------------
Comment (by alex-ye):
Replying to [comment:10 azaozz]:
> The idea is to remove single letter terms from the search. The pattern
> `/^\p{L}$/u` is the safest way to match a single letter in any language.
> It's not particularly fast as it looks through the Unicode character
> properties. A better (but quite slower) pattern could be
> `/^\p{L}\p{M}*|\p{Z}|\p{P}|\p{C}$/u` which also matches separators (any
> kind of whitespace or invisible separators), punctuation, and invisible
> control characters and unused code points
> ([http://www.regular-expressions.info/unicode.html#category more info]).
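
For illustration, here is a minimal sketch of how the two quoted patterns might be used to filter search terms. The function names are hypothetical and not part of WordPress core, and the sketch assumes PCRE was built with Unicode property support. One caveat: as quoted, the anchors in the longer pattern bind only to the first and last alternatives, so it would also match a string such as 'ab'. The sketch assumes the grouped form `(?:...)` is what was intended.

```php
<?php
// Hypothetical helper: true when $term is exactly one Unicode letter.
function example_is_single_letter( $term ) {
    return 1 === preg_match( '/^\p{L}$/u', $term );
}

// Hypothetical helper: true when $term is one letter plus any combining
// marks, or a single separator, punctuation, or control character.
// The (?:...) group is an assumption so that ^ and $ apply to every
// alternative, not only the first and the last.
function example_is_single_unit( $term ) {
    return 1 === preg_match( '/^(?:\p{L}\p{M}*|\p{Z}|\p{P}|\p{C})$/u', $term );
}

// Hypothetical helper: drop single-letter terms from a list of search terms.
function example_remove_single_letter_terms( array $terms ) {
    $kept = array();
    foreach ( $terms as $term ) {
        if ( ! example_is_single_letter( $term ) ) {
            $kept[] = $term;
        }
    }
    return $kept;
}

$terms = example_remove_single_letter_terms( array( 'a', 'و', 'wp', 'كتاب' ) );
// $terms now holds array( 'wp', 'كتاب' ).
```

The simpler check treats a letter followed by a diacritic as two characters, which is why the longer pattern allows trailing `\p{M}` marks.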
See the example below. I have used the `str_replace` function, and I will
try to apply your suggestion about the regular expression:
{{{
function ArWP_normalize( $str ) {

    // Normalize the Alef.
    $str = str_replace( array(
        'أ','إ','آ'
    ), 'ا', $str );

    // Normalize the Diacritics.
    $str = str_replace( array(
        'َ','ً','ُ','ٌ','ِ','ٍ','ْ','ّ'
    ), '', $str );

    // Return the new string.
    return $str;

} // end ArWP_normalize()
}}}
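
A regular-expression version of the same normalization could look roughly like the sketch below. It is only an illustration, not a proposed patch: the function name is hypothetical, and the code points are the ones the `str_replace` lists above appear to contain (U+0622, U+0623 and U+0625 for the Alef forms, and the range U+064B to U+0652 for the eight diacritics).

```php
<?php
// Hypothetical regex-based variant of the str_replace normalization.
function example_arwp_normalize_regex( $str ) {
    // Alef with madda (U+0622), hamza above (U+0623) or hamza below
    // (U+0625) becomes a bare Alef (U+0627).
    $str = preg_replace( '/[\x{0622}\x{0623}\x{0625}]/u', 'ا', $str );

    // Strip the diacritics from Fathatan (U+064B) through Sukun (U+0652),
    // a range that also covers Shadda (U+0651).
    $str = preg_replace( '/[\x{064B}-\x{0652}]/u', '', $str );

    // Note: with the /u modifier, preg_replace() returns null when the
    // input is not valid UTF-8, so callers may want to check for that.
    return $str;
}
```

Using a character range keeps the diacritics list compact, at the cost of being less obvious to read than the explicit array.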
--
Ticket URL: <http://core.trac.wordpress.org/ticket/25585#comment:12>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software