[wp-hackers] [WPSoC] The search system improvements

Kodie kodieg at gmail.com
Wed Jul 9 11:42:06 GMT 2008


Hello everyone,

I'm a participant in Google Summer of Code and I'm working on a new 
search system for WordPress. I was asked to post some information about 
my project here before the mid-term evaluation. A description of the 
project follows, so if you're not interested in the background, you can 
skip this paragraph. My idea was to create a framework, integrated with 
WordPress, that manages searches. Firstly, it would allow new search 
engines to be added (for example, packaged as normal WordPress plugins). 
Thanks to this, users could simply switch to the Zend_Lucene search 
engine or to one based on the Google Search (now AJAX) API. Secondly, I 
wanted to create a new search engine for WordPress that searches posts, 
pages and comments. I've based my design loosely on Lucene; however, I 
made it much simpler and I use MySQL to store the index. I also wanted 
to create plugins for Google and Zend_Lucene. You can find some more 
information at: http://inzynieriawiedzy.org/blog/.

A test blog is at: http://inzynieriawiedzy.org/kodie/gsoc/ (login: 
admin, password: wpgsoc). Please don't destroy it, and please do no harm 
to the server. Its content is mostly material from some wikis and 
another blog. However, some of the posts there may help you understand 
how everything should work.

So, what is implemented? Basically, there is a search framework 
integrated with WordPress. You can reach it through the $wpsearch 
variable. Using this object you can register a new search engine, 
unregister one, etc. It also gives administrators a simple management 
page (Management/Search engines) that shows which plugins are active 
and links to their configuration pages (where available). The framework 
also handles switching the template used for search results.
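
The register/unregister idea above can be sketched roughly as follows. 
This is only an illustration in Python, not the plugin's code: the class 
name SearchRegistry and every method name here are hypothetical, and the 
real $wpsearch API is not shown in this message.

```python
class SearchRegistry:
    """Hypothetical stand-in for the $wpsearch object (names are assumptions)."""

    def __init__(self):
        self._engines = {}   # engine name -> callable(query) -> list of results
        self._active = None  # name of the engine currently used for searches

    def register(self, name, engine):
        if name in self._engines:
            raise ValueError(f"engine already registered: {name}")
        self._engines[name] = engine
        if self._active is None:
            self._active = name  # the first registered engine becomes active

    def unregister(self, name):
        del self._engines[name]  # raises KeyError for an unknown engine
        if self._active == name:
            # Fall back to any remaining engine, or to none at all.
            self._active = next(iter(self._engines), None)

    def activate(self, name):
        if name not in self._engines:
            raise KeyError(name)
        self._active = name

    def engines(self):
        # What a management page could list.
        return sorted(self._engines)

    def search(self, query):
        if self._active is None:
            return []
        return self._engines[self._active](query)


# Usage: two toy engines that only label the query they receive.
wpsearch = SearchRegistry()
wpsearch.register("builtin", lambda q: [f"builtin:{q}"])
wpsearch.register("lucene", lambda q: [f"lucene:{q}"])
wpsearch.activate("lucene")
results = wpsearch.search("hello")
```

Keeping engines behind one object like this is what would let a plugin 
swap the search backend without touching the templates.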

The new search engine is also working. It searches posts, pages, 
comments and even attachments. You can exclude words from a query, use 
"terms like this", search in titles (title: prefix), by author (author: 
prefix) or within a given date range (date_start: and date_end:). See 
this post for more details:  http://inzynieriawiedzy.org/kodie/gsoc/?p=22
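
To make the query syntax above concrete, here is a minimal parsing 
sketch in Python. It is an illustration under assumptions, not the 
engine's parser: the leading "-" for excluded words is assumed (the 
message does not say how exclusion is written), date values are kept as 
raw strings because their format is not specified, and parse_query and 
ParsedQuery are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# A token is either a quoted phrase or a run of non-space characters.
# The optional leading "-" marks exclusion (assumed syntax).
TOKEN = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')
FIELDS = ("title", "author", "date_start", "date_end")


@dataclass
class ParsedQuery:
    terms: list = field(default_factory=list)     # plain words
    phrases: list = field(default_factory=list)   # "terms like this"
    excluded: list = field(default_factory=list)  # words or phrases to exclude
    fields: dict = field(default_factory=dict)    # prefix -> raw value


def parse_query(query):
    parsed = ParsedQuery()
    for minus, phrase, word in TOKEN.findall(query):
        if phrase:
            target = parsed.excluded if minus else parsed.phrases
            target.append(phrase.lower())
            continue
        if not word:
            continue  # empty quotes, nothing to search for
        name, sep, value = word.partition(":")
        if sep and name in FIELDS and value and not minus:
            parsed.fields[name] = value  # e.g. title:lucene
        elif minus:
            parsed.excluded.append(word.lower())
        else:
            parsed.terms.append(word.lower())
    return parsed


q = parse_query('wordpress -spam "search engine" title:lucene author:kodie')
```

Quoted values after a prefix (such as title: followed by a phrase) are 
not handled in this sketch.
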
This engine is based on the idea of an inverted index (used, for 
example, in Lucene). It gives the administrator a page with options to 
clear and rebuild the index, and also to delete selected documents from it.
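
For readers unfamiliar with the term, an inverted index maps each word 
to the set of documents containing it. The Python sketch below shows the 
idea together with the clear, rebuild and delete operations mentioned 
above. It is a simplified assumption-laden illustration: it keeps the 
index in memory rather than in MySQL, and the class and method names are 
hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into word tokens.
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        # Remove one document from every posting list it appears in.
        for term in list(self.postings):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def clear(self):
        self.postings.clear()

    def rebuild(self, documents):
        # documents: dict of doc_id -> text (posts, pages, comments, ...)
        self.clear()
        for doc_id, text in documents.items():
            self.add(doc_id, text)

    def search(self, terms, excluded=()):
        # Documents containing all terms and none of the excluded words.
        if not terms:
            return set()
        hits = set.intersection(
            *(self.postings.get(t.lower(), set()) for t in terms)
        )
        for t in excluded:
            hits -= self.postings.get(t.lower(), set())
        return hits


docs = {
    1: "WordPress search plugin",
    2: "Lucene search engine",
    3: "WordPress comments",
}
index = InvertedIndex()
index.rebuild(docs)
found = index.search(["search"], excluded=["lucene"])
```

A lookup only touches the posting lists of the query words, which is why 
this structure is faster than scanning every post on each search.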

Known bugs:
  - date issue in admin panel
  - double entries for some attachments
  - no type: (post,page,comment,attachment) option in query

I've also moved the old search code into a plugin, so anyone who needs 
the old search engine will still be able to use it.

You can download the code here: 
http://inzynieriawiedzy.org/kodie/gsoc/wp-content/uploads/2008/07/wpgsoc.tgz 
I couldn't find where Google wants me to upload it (I only found the 
place for uploads after the project ends).
See this page for some notes:
http://inzynieriawiedzy.org/kodie/gsoc/?p=22

I would be very grateful for comments, ideas, testing, code review or 
anything else that might help me.

Best wishes,
Kodie

