Getting the ExtractingRequestHandler to work in Solr


I am attempting to get Solr to work with Tika so I can index Word and PDF documents in my Drupal web site.

I've looked at the Wiki page and this page and they indicate adding a requestHandler in solrconfig.xml.

I did that and now Solr throws an exception:

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler'

I have done some searching and see that others have had this problem, but I see no easy fix. I'm using Solr 3.4.0 on Windows Server 2003. Any ideas about how to resolve this?

As a side note, I've got Drupal using Solr for searching, and that is working. What I cannot get working is having Solr index PDF and Word documents. I'm sure this is a common need for most web sites, but I have spent days on this and I cannot believe it is this poorly documented and this hard to figure out.


There is 1 answer

Jayendra (best answer)

If you are running Solr from the example directory with the bundled Jetty setup, it should run as is, without any changes.

However, for a multicore setup you need to copy the jars into the core's lib directory.

If you check the solrconfig.xml in the example folders, you will see that it includes the jars for Solr Cell and the extraction libraries.

solrconfig.xml -

Uncomment this line to include all the lib jars -

<lib dir="./lib" />

Copy the jars from the following folders to your multicore lib folder. These jars are used for extraction (Apache PDFBox, POI, FontBox, etc.).

<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="../../contrib/extraction/lib" />

When you start Solr, you should see all the jars being loaded in the startup log. That should get you working.
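
For reference, here is a rough sketch of the handler entry that goes with the jars above, modeled on the one in the example solrconfig.xml that ships with Solr 3.x. Treat the field names (text, links, ignored_) as assumptions: they come from the example schema, and a Drupal-supplied schema.xml may name its fields differently, so adjust the mappings to fields that actually exist in your schema.

```xml
<!-- Sketch only: requires the Solr Cell and extraction jars to be on the core's classpath -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Map the extracted body text to a field in your schema (assumed here: "text") -->
    <str name="fmap.content">text</str>
    <!-- Lowercase the metadata field names produced by Tika -->
    <str name="lowernames">true</str>
    <!-- Prefix unknown fields so they match an "ignored_*" dynamic field (assumed to exist) -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

With startup="lazy" the handler class is only loaded on the first request to /update/extract, so a missing jar may not show up until you actually post a document.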