I am using Apache Solr with Java and SolrJ to index some files, so far without success. I am on Solr 5.2 and have also tried 5.1, with the same result.
I can use curl to send a file for indexing and then I can successfully search for this file with Solr. This is the command I use:
curl "http://solraddress/solr/my_core/update/extract?literal.id=testdoc&commit=true" -F "testfile=@/Users/lesson2.pdf"
As I said, this works: I can then search for the file and find it.
Using SolrJ, I tried this code to send a file for indexing:
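For reference, the search step can also be reproduced without SolrJ. The following is only a sketch: it assumes the core's unique key field is `id` (as `literal.id` in the curl command suggests) and reuses the placeholder host `solraddress`. The class name `SelectUrl` and the helper `byId` are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SelectUrl {

    // Builds a /select URL that looks up a single document by its id field.
    // The id is quoted as a phrase so characters like '.' or '-' are taken literally.
    static String byId(String coreUrl, String id) throws UnsupportedEncodingException {
        String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
        String query = "id:\"" + escaped + "\"";
        return coreUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8") + "&wt=json";
    }

    public static void main(String[] args) throws Exception {
        // Placeholder host and core name taken from the curl command above.
        System.out.println(byId("http://solraddress/solr/my_core", "testdoc"));
    }
}
```

Opening the printed URL in a browser (or fetching it with curl) should show `numFound` of 1 if the document with that id was indexed and committed.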
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addFile(myFile, "application/octet-stream");
req.setParam("literal.id", "testfile1.pdf");
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList<Object> result = solr.request(req);
System.out.println("Result: " + result);
This yields this error:
Error adding field 'stream_size'='null' msg=For input string: "null" using ContentStreamUpdateRequest
I could not find a solution for that error, so I decided to write my own wrapper instead. I captured the headers from my curl request, which were:
> POST solr/my_core/update/extract?literal.id=testdoc&commit=true HTTP/1.1
> User-Agent: curl/7.37.1
> Host: MyHost
> Accept: */*
> Content-Length: 220
> Expect: 100-continue
> Content-Type: multipart/form-data; boundary=------------------------aad460cc324256ec
I then built a POST request with these headers and the file as a multipart body. It returns a 200 response with this body:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">137</int></lst>
</response>
This looks like a success, since it matches the response curl gives me, yet the file never seems to be indexed: I cannot find it in Solr.
Does anyone have any idea?
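For comparison, here is a minimal sketch of what a hand-built equivalent of the curl call could look like using only the JDK. It is not the wrapper described above, just an illustration under assumptions: the class name `ExtractPost` and its helpers are hypothetical, the form field name `testfile` and the URL are copied from the curl command, and the part's content type reuses the `application/octet-stream` value from the SolrJ attempt. A malformed body (a wrong boundary line or a missing closing boundary) is one thing worth ruling out when a hand-rolled request gets a 200 but nothing is indexed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExtractPost {

    // Builds a multipart/form-data body containing a single file part.
    // Each boundary line is "--" + boundary; the final one also ends with "--".
    static byte[] buildMultipartBody(String boundary, String fieldName,
                                     String fileName, byte[] fileBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String partHeader = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "\r\n";
        out.write(partHeader.getBytes(StandardCharsets.UTF_8));
        out.write(fileBytes);
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Sends the body as a POST and returns the HTTP status code.
    static int post(String url, String boundary, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // The boundary in this header must match the one used inside the body.
        conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Path, host, core and id are the placeholders from the curl command.
        String boundary = "----solrUpload" + System.nanoTime();
        byte[] pdf = Files.readAllBytes(Paths.get("/Users/lesson2.pdf"));
        byte[] body = buildMultipartBody(boundary, "testfile", "lesson2.pdf", pdf);
        int status = post(
                "http://solraddress/solr/my_core/update/extract?literal.id=testdoc&commit=true",
                boundary, body);
        System.out.println("HTTP status: " + status);
    }
}
```

Note that `literal.id` and `commit=true` travel in the query string, exactly as in the curl command, and that the `Content-Length` is derived from the actual body size rather than copied from a captured header.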
This is a bug in Solr 5. There is an open ticket in the Solr JIRA for this problem:
SOLR-7498: Error adding field 'stream_size'='null' msg=For input string: "null" using ContentStreamUpdateRequest