[wp-hackers] Overriding get_posts() behaviour
Denis de Bernardy
denis at semiologic.com
Tue Jul 5 12:59:46 GMT 2005
Hi Bill,
To be very frank, I hardly expect any of the results we'll get in PHP to
be close to what you can get from natural language search spin-offs
such as Delphes. Even with Nutch -- Lucene -- or one of MSN, Yahoo, Ask,
Vivisimo or Google around.
That said, feel free to send your mods, I imagine. :)
A few comments:
#1 may break the index itself. I think a more reasonable approach is some
kind of node_meta field. As far as I know, a match against (node_name,
node_content) with meta in node_content is not the same as a match against
(node_name, node_meta, node_content).
#2 Fetching "beans" on a "bean" search, as in a LIKE statement, is quite
simple: change the regex to \b\w*bean\w*\b rather than \bbean\b. Then again,
if the searcher wants "queries" on a "query" search, he won't get the
expected result, so I think soundex or a similar approach is a more
reasonable option. Alternatively, there are stemming libraries, but stemming
is top-down and, as Christian points out, language dependent.
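
To make the regex point concrete, here is a minimal sketch. It is written in
Python rather than PHP purely for illustration, and the sample text and
variable names are made up; the two "bean" patterns are the ones quoted above.

```python
import re

# Made-up sample text for illustration only.
text = "Beans and a bean; many queries, one query."

# Whole-word match: only the exact word "bean".
exact = re.compile(r"\bbean\b", re.IGNORECASE)
# Loosened match: any word that contains "bean".
loose = re.compile(r"\b\w*bean\w*\b", re.IGNORECASE)

exact_hits = exact.findall(text)
loose_hits = loose.findall(text)

# The same loosening applied to "query" still misses "queries",
# because "queries" does not contain the substring "query".
query_hits = re.findall(r"\b\w*query\w*\b", text, re.IGNORECASE)

print(exact_hits)  # ['bean']
print(loose_hits)  # ['Beans', 'bean']
print(query_hits)  # ['query']
```

The substring trick handles plain suffixes but not irregular plurals, which
is why a phonetic or stemming approach comes up next.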
#3 breaks three relevance criteria. The first is the MATCH ... AGAINST SQL
statement. The second is whether all terms are present, and the last is
whether they are in the same order.
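
A rough illustration of what removing duplicates throws away, again as a
Python sketch. The clean() helper is hypothetical and is not Bill's actual
code; it only mimics the cleanup he describes (strip odd characters, collapse
whitespace, drop repeated words). The sample sentence is made up.

```python
import re

def clean(text):
    """Hypothetical cleanup: strip punctuation, collapse whitespace,
    and keep only the first occurrence of each word."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    seen = []
    for word in words:
        if word not in seen:
            seen.append(word)
    return " ".join(seen)

doc = ("Search engines rank pages. A search engine that cannot rank "
       "is no search engine.")
cleaned = clean(doc)

# Term frequency is flattened, so a frequency-based score loses its signal.
count_before = doc.lower().count("search")
count_after = cleaned.count("search")

# Word order is disturbed, so a phrase that was present no longer matches.
phrase = "search engine that"
phrase_before = phrase in doc.lower()
phrase_after = phrase in cleaned

print(cleaned)
print(count_before, count_after)
print(phrase_before, phrase_after)
```

Here "search" drops from three occurrences to one, and the phrase "search
engine that" disappears from the cleaned text even though every word of it
is still present.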
#4 makes good sense, and I think it even makes sense to search on it
directly and to give it a high score. Then again, it is language dependent,
like stemming. Also, you have to be wary that metaphone("this is a test") !=
metaphone("this") . metaphone("is") . metaphone("a") . metaphone("test").
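
The same caveat can be shown with Soundex, used here as a stand-in because
Python has no built-in metaphone. The soundex() function below is a small
hand-written sketch of standard American Soundex, not PHP's implementation,
and it assumes that non-letters such as spaces are simply skipped.

```python
# Letter groups for American Soundex: bfpv=1, cgjkqsxz=2, dt=3, l=4, mn=5, r=6.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(text):
    """Return a 4-character Soundex code; non-letters are ignored (assumption)."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeated code counts again
        if len(out) == 4:
            break
    return out.ljust(4, "0")

phrase = "this is a test"
whole = soundex(phrase)
joined = "".join(soundex(word) for word in phrase.split())

print(whole)   # T223
print(joined)  # T200I200A000T230
```

Encoding the whole string gives one short code for the phrase, while
encoding word by word gives one code per word, so the index and the query
have to be encoded the same way.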
As a last remark, my original intent was not to serve the resulting spew
directly. Rather, it was to use it as a seed for a second query that
conducts a structural analysis of the node graph. I've yet to start working
on that part. :)
--
Denis de Bernardy
http://www.semiologic.com
> -----Original Message-----
> From: wp-hackers-bounces at lists.automattic.com
> [mailto:wp-hackers-bounces at lists.automattic.com] On Behalf Of
> ml_wordpress at copperleaf.org
> Sent: Sunday, July 03, 2005 2:48 PM
> To: wp-hackers at lists.automattic.com
> Subject: Re: [wp-hackers] Overriding get_posts() behaviour
>
>
> I've also taken a look at Denis' plugin and have a few ideas that maybe
> you guys could add. I've modified Denis' plugin on my testbed just for
> fun and could send you the code if you wish. Here is a list of ideas:
>
> 1) Probably the most useful function for me was that I added a filter to
> the sem_search_index function that allows additional plugins to add
> additional words to the node_content column. This could actually be a
> foundation filter for the plugin in that all search words could be added
> that way: the_content, the_title, post_tags, and, in my case, data from
> columns in new tables.
>
> 2) I found that by using a fulltext search, if you search on 'bean', it
> won't match 'beans'. I don't know if there is something in the
> fulltext search that can allow you to do 'like' queries. If not, maybe
> that could be either a global option or something selected from an
> advanced search page that would allow you to do fulltext searches or
> like searches. (BTW Denis, some hosts default to InnoDB tables, so in
> your create table statement you need to specify ENGINE=myisam so that
> fulltext indexes can be created.)
>
> 3) I added some code that would clean out all funky characters, remove
> all duplicates and collapse all whitespace in the node_content column.
> This can shorten the size of the field significantly and removing the
> dups is nice if you aren't doing weighted searches. Something else to
> consider would be to remove all stopwords. (Configurable from an admin
> page?)
>
> 4) One last idea is that perhaps an option could be to store the soundex
> (or some other algorithm) for the word list so that searches are done on
> that instead of the actual word.
>
> Anyway, like I said above, I'd be glad to send you guys my mods or I'd
> be glad to help with parts of this if you wish.
>
> Bill