I have the binary content of a PDF file, and I want to upload it to Solr and index its content:
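import org.apache.solr.client.solrj.request.AbstractUpdateRequest
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest('/update/extract')
up.setParam("literal.id", map.id)
// Write the PDF bytes to a temporary file, since addFile() needs a File.
// A fixed prefix is used because createTempFile() rejects prefixes shorter than 3 characters.
def tmpFile = File.createTempFile("solr-upload-", ".pdf")
try {
    tmpFile.append(binary)
    // The second argument is the MIME content type, not a file extension
    up.addFile(tmpFile, "application/pdf")
    // Commit immediately so the document is searchable once the request returns
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true)
    def solr = getSolrServer()
    return solr.request(up)
} finally {
    // Always remove the temporary file, even if the request fails
    tmpFile.delete()
}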
When I query Solr, I can retrieve the Solr document. How can I get the actual content of the file? Basically, I need to find the word count of the document I've uploaded, so I was planning to call size() on the returned string (if that's even possible).
I'm very new to Solr, so I'm probably on the wrong track. Any assistance is greatly appreciated :)
I am assuming you want to count the number of words in the PDF which you have indexed. Make sure that
Once you do this, you can find the number of words using either facets or the Term Vector Component. The SO answer below might be helpful:
https://stackoverflow.com/a/26933126/689625
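If you would rather count words on the client side, as the question suggests, here is a minimal sketch in Java. It assumes the extracted text ends up in a stored field and that you have already read it from the returned document as a String, for example with SolrJ's getFieldValue on a field called content. That field name is only an illustration; the real one depends on your schema and on how the extract handler maps fields. WordCountSketch and countWords are hypothetical names, not part of SolrJ.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of SolrJ: counts whitespace-separated
// words in text that was already retrieved from a stored Solr field.
class WordCountSketch {
    // One or more whitespace characters (spaces, tabs, newlines)
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static int countWords(String text) {
        // A missing field value counts as zero words
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        // Splitting an empty string would yield one empty token, so handle it here
        if (trimmed.isEmpty()) {
            return 0;
        }
        return WHITESPACE.split(trimmed).length;
    }
}
```

Note that a plain whitespace split will not necessarily match the token counts Solr's own analyzers produce (punctuation, stop words and stemming are handled differently), so treat the result as an approximation.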