[wp-trac] [WordPress Trac] #7394: Search: order results by relevance

WordPress Trac wp-trac at lists.automattic.com
Thu Aug 23 03:18:08 UTC 2012


#7394: Search: order results by relevance
-------------------------+-----------------------------
 Reporter:  markjaquith  |       Owner:
     Type:  enhancement  |      Status:  assigned
 Priority:  normal       |   Milestone:  Future Release
Component:  General      |     Version:  2.6
 Severity:  normal       |  Resolution:
 Keywords:  has-patch    |
-------------------------+-----------------------------

Comment (by tomauger):

 Well, it appears that using REGEXP is significantly slower than a brute-
 force WHERE and ORDER BY clause, though the SQL is arguably more elegant
 (but who cares, I guess).

 However, one takeaway from the SQL below is that we might want to be a bit
 more careful around word boundaries. I would argue that a post title
 called "Best Post Evah" matches the search term "Post" better than "Ten
 Composting Tricks". See below:


 {{{
 SELECT SQL_CALC_FOUND_ROWS wp_posts.ID, wp_posts.post_title
 FROM wp_posts
 WHERE 1=1
         AND (wp_posts.post_title REGEXP 'one|two|three' OR
 wp_posts.post_content REGEXP 'one|two|three')
         AND wp_posts.post_type IN ('post', 'page', 'attachment')
         AND (wp_posts.post_status = 'publish' OR wp_posts.post_author = 1
 AND wp_posts.post_status = 'private')
 ORDER BY
         wp_posts.post_title NOT REGEXP '[[:<:]]one two three[[:>:]]',
         wp_posts.post_content NOT REGEXP '[[:<:]]one two three[[:>:]]',
         wp_posts.post_title NOT REGEXP '[[:<:]]one two[[:>:]]|[[:<:]]two
 three[[:>:]]',
         wp_posts.post_content NOT REGEXP '[[:<:]]one two[[:>:]]|[[:<:]]two
 three[[:>:]]',
         wp_posts.post_title NOT REGEXP 'one|two|three',
         wp_posts.post_date DESC
 LIMIT 0, 10
 }}}

 Note that I'm unsure as to the weighting of the same search sequence
 within post_content as post_title. We may decide that two search terms
 with proper word boundaries in the title is still better than the full
 match in the content.

 Of course, this then excludes pluralizations and so forth, so "Post" would
 no longer have a high relevance rating with "Top Ten Posts of 2012"
 because of the "s".

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/7394#comment:20>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software


More information about the wp-trac mailing list