SOLR POST files with no extension


I am using Solr 5 and I want to index documents that have no file extension. Unfortunately, renaming the files to add an extension is not an option in my case.

The command I am using is simply:

$ bin/post -c mycore ../foldertobescaned -type application/pdf

The command works fine for documents that do have an extension, but for the files without one I am getting:

Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log

Answer by tkja:

If renaming the files is not an option, you can use the following script as a workaround until Solr improves its post method. It is a simple bash for loop that submits each file individually and works regardless of the file extension. Note that this script will be slower than using post on the whole folder, because each individual file transfer needs to be initialized.

Save the script below as postFolderToSolr.sh inside your Solr folder (so that Solr's bin/ folder is a subdirectory) and make it executable with chmod +x postFolderToSolr.sh. Then use it as follows: ./postFolderToSolr.sh mycore /home/user1/foldertobescaned/ application/pdf

Using no arguments or the wrong number of arguments prints a short usage message as help.

#!/bin/bash
# Post every entry in a folder to Solr, one bin/post call per entry.
set -o nounset

if [ "$#" -ne 3 ]
then
    echo "Post contents of a folder to Solr."
    echo
    echo "Usage: postFolderToSolr.sh <collectionName> </path/to/folder> <MIME>"
    echo
    exit 1
fi

collection=$1
inputPath=${2%/} # remove trailing / if it exists
mime=$3

# Quote the variables so that paths containing spaces are not split.
for element in "$inputPath"/*; do
    bin/post -c "$collection" -type "$mime" "$element"
done
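
If the folder also contains subfolders, the same idea can be extended with find. The following is only a sketch under a few assumptions: it must be run from the Solr folder like the script above, the function name post_tree is made up for this example, and POST_CMD is a hypothetical environment variable (not a Solr option) that only exists here so the loop can be tried out without a running Solr.

```bash
#!/bin/bash
# Sketch: post every regular file below a folder, including subfolders.
# POST_CMD is a hypothetical override introduced for this example only;
# it defaults to bin/post and can be set to "echo" for a dry run.
set -o nounset

post_tree() {
    local collection="$1"
    local inputPath="${2%/}"   # remove trailing / if it exists
    local mime="$3"
    local post_cmd="${POST_CMD:-bin/post}"

    # -print0 together with read -d '' keeps file names that contain
    # spaces or newlines intact.
    find "$inputPath" -type f -print0 |
        while IFS= read -r -d '' element; do
            # </dev/null stops the posting command from consuming the
            # file list that the loop is still reading from the pipe.
            "$post_cmd" -c "$collection" -type "$mime" "$element" </dev/null
        done
}
```

After sourcing the file, POST_CMD=echo post_tree mycore /home/user1/foldertobescaned/ application/pdf prints the arguments that would be passed for each file instead of posting anything. Dropping the POST_CMD=echo prefix runs bin/post for real. As with the loop above, this is slower than posting the whole folder at once, because bin/post is started once per file.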