Upload a file to Solr with my own parameters added

I would like to upload a file (for instance, an MS Word document) to Solr, but I would like to add my own fields to this upload, such as the userId of the person who uploaded it or a number of tags. The content of the file must be parsed and searchable, and the extra parameters should be added as fields. Therefore, I have added the following definition in schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.1">
  <types>
   <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
   <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    <!-- A general text field that has reasonable, generic
         cross-language defaults: it tokenizes with StandardTokenizer,
     removes stop words from case-insensitive "stopwords.txt"
     (empty by default), and down cases.  At query time only, it
     also applies synonyms. -->
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
 </types>


 <fields>
   <field name="documentId" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
   <!-- text_general (not string) so the extracted content is tokenized and searchable -->
   <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
   <dynamicField name="metadata_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
 </fields>

 <uniqueKey>documentId</uniqueKey>
 <defaultSearchField>text</defaultSearchField>
 <solrQueryParser defaultOperator="AND"/>

</schema>

The relevant part of my solrconfig.xml now looks like this:

  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler">
 <lst name="defaults">
   <str name="fmap.content">text</str>
   <str name="lowernames">true</str>
   <str name="fmap.documentId">documentId</str>
   <!-- also tried with
   <str name="fmap.literal.documentId">documentId</str>
   and
   <str name="literal.documentId">documentId</str>
   -->
   <str name="uprefix">metadata_</str>

   <!-- capture link hrefs but ignore div attributes -->
   <str name="captureAttr">true</str>
   <str name="fmap.a">links</str>
   <str name="fmap.div">ignored_</str>
  </lst>
  </requestHandler>

However, no matter which combination I try, using this command:

java -Durl=http://localhost:9090/solr/update/extract?documentId=test -jar post.jar somedoc.pdf

or

java -Durl=http://localhost:9090/solr/update/extract?literal.documentId=test -jar post.jar somedoc.pdf

I keep getting a "missing required field" error for documentId.

Regards, Ronald

There are 2 answers

Accepted answer, by vinnie:

I had the same issue, and the problem was the name of my field, "documentId". It turns out there is a problem with the required-field check when the field name ends in "Id" (capital I).

See this other question, which helped me figure it out: Solr - Missing Required Field

I changed my field name to "id" and all is fine now. This really makes no sense and has probably driven a few people completely crazy.
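Applied to the question's schema.xml, the rename might look like the sketch below: only the unique-key field and the <uniqueKey> element change, and the upload would then pass literal.id=test instead of literal.documentId=test. As for why the capital "I" matters, one plausible explanation (an assumption, not something confirmed in this thread) is the question's <str name="lowernames">true</str> setting, which maps field names to lowercase. If it also applies to literal.* names, literal.documentId arrives as documentid and never fills the required documentId field. Under that assumption, an all-lowercase name avoids the mismatch, and so would setting lowernames to false.

```xml
<!-- Sketch of the accepted answer's fix: unique key renamed from "documentId" to "id" -->
<fields>
  <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <!-- the "text" field and the metadata_* dynamicField stay as in the question -->
</fields>

<uniqueKey>id</uniqueKey>
```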

Answer by Fuxi:

The reason you have 0 docs is probably that you are not specifying documentId (or any other required field, for that matter), so indexing fails on it (check the logs).

Just follow the example: http://wiki.apache.org/solr/ExtractingRequestHandler#Getting_Started_with_the_Solr_Example

To add any field to a document indexed with Tika, you have to use the literal parameter. In your case, it might be:

&literal.userId=123&literal.documentId=doc1
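The literal fields also need a matching declaration in schema.xml, or the question's uprefix setting will route them into metadata_* dynamic fields instead. Below is a minimal sketch for the two extras mentioned in the question. The field names are hypothetical, and the sketch assumes lowernames=true is kept and also lowercases literal names, which is why the names are declared in lowercase:

```xml
<!-- Hypothetical fields for the extra upload parameters; place inside <fields> -->
<field name="userid" type="string" indexed="true" stored="true" multiValued="false"/>
<!-- multiValued so that one document can carry several tags -->
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
```

A request would then add something like &literal.userid=123&literal.tags=draft&literal.tags=report (example values). This assumes that repeating a literal parameter adds one value per occurrence to a multiValued field.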

If you have any other questions, please ask (and possibly add some more details: what your command looks like, errors from the log).