[wp-hackers] Indexing documents for search
    John Blackbourn 
    johnbillion+wp at gmail.com
       
    Sun Oct 30 22:50:38 UTC 2011
    
    
  
On 30 October 2011 21:45, Eric Mann <eric at eam.me> wrote:
> Does anyone have any experience extending WP's search functionality to
> include the content of uploaded, non-DB housed documents?

Last year I wrote a plugin for a client that indexes the contents of
PDFs uploaded to WordPress. It uses one of the many PDF-to-text PHP
classes available [1]. We had mixed results with its reliability and
accuracy. For example, it's not always possible to extract text from
PDFs created in certain applications, and others yield text that is
squished together without the spaces you would expect. Different PHP
classes for the text extraction will probably give different results.

Integrating the PDF text extraction with WordPress was simply a case
of hooking into the 'update_attached_file' filter and creating a post
containing the PDF's text, which is then searchable as usual in
WordPress. If you're interested in the plugin, feel free to email me
off-list.

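
To make the approach above concrete, here is a minimal sketch of how
the hook could be wired up. It is not the client plugin itself: every
pdfidx_* name is hypothetical, pdfidx_extract_text() is a placeholder
for whichever PDF-to-text class you choose, and storing the text as a
regular published post is an assumption made for brevity.

```php
<?php
// Sketch only. All pdfidx_* names are hypothetical, not from the actual plugin.

/** Placeholder extractor: swap in a real PDF-to-text class. Returns '' on failure. */
function pdfidx_extract_text( $path ) {
    return ''; // Assumption: no extraction library is bundled with this sketch.
}

/** Build the wp_insert_post() arguments for the searchable copy of the PDF text. */
function pdfidx_build_post_args( $file, $text, $attachment_id ) {
    return array(
        // File name without directory or extension, e.g. "Annual Report".
        'post_title'   => pathinfo( $file, PATHINFO_FILENAME ),
        // Collapse runs of whitespace that extraction tends to leave behind.
        'post_content' => trim( preg_replace( '/\s+/', ' ', $text ) ),
        'post_status'  => 'publish', // only published posts appear in search
        'post_type'    => 'post',    // assumption: a custom post type may suit better
        'post_parent'  => (int) $attachment_id,
    );
}

/** Filter callback: index PDFs, and always hand $file back unchanged. */
function pdfidx_on_update_attached_file( $file, $attachment_id ) {
    if ( 'pdf' === strtolower( pathinfo( $file, PATHINFO_EXTENSION ) ) ) {
        $text = pdfidx_extract_text( $file );
        if ( '' !== $text ) {
            wp_insert_post( pdfidx_build_post_args( $file, $text, $attachment_id ) );
        }
    }
    return $file;
}

// Register only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'update_attached_file', 'pdfidx_on_update_attached_file', 10, 2 );
}
```

Because 'update_attached_file' is a filter, the callback has to return
the file path it was given. As written, the sketch creates a new post
every time the filter fires for a PDF, so a real plugin would probably
want to look up and update an existing post for the same attachment
instead.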
[1] http://www.google.com/search?q=pdf+to+text+php
    
    