Adding a custom detector class to Apache Tika

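import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.apache.tika.Tika;
import org.apache.tika.detect.Detector;
import org.apache.tika.io.LookaheadInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

public class CustomDetector implements Detector {

    @Override
    public MediaType detect(InputStream stream, Metadata metadata) throws IOException {
        // Fall back to application/octet-stream when nothing matches
        MediaType type = MediaType.OCTET_STREAM;

        // Wraps the incoming stream; the underlying stream is reset when this wrapper is closed
        InputStream lookahead = new LookaheadInputStream(stream, 1024);
        try {
            // Detect the file type of ToolConfig.properties on disk
            File file = new File("ToolConfig.properties");
            Tika tika = new Tika();
            String filetype = tika.detect(file);

            // Read the file content as key/value pairs
            Properties properties = new Properties();
            properties.load(new FileInputStream("ToolConfig.properties"));
            for (String key : properties.stringPropertyNames()) {
                String value = properties.getProperty(key);
                if (key instanceof String && value instanceof String && filetype.contains("text/plain")) {
                    type = MediaType.application("properties");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            lookahead.close();
        }
        return type;
    }
}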

Using Tika, I want to detect a .properties file as a properties file (text/properties) based on the key/value format of its contents, and otherwise as a plain text file (text/plain).

Above is a custom class I wrote that implements Tika's Detector interface. I also created a custom file for the MIME type:

<mime-info>
<mime-type type="text/properties">
<glob pattern="*.properties"/>
</mime-type>
</mime-info>

I added the custom class to a JAR file along with a META-INF/services/org.apache.tika.detect.Detector file, but when I run the program it reports a .properties file as text/plain rather than text/properties.

I am not sure what went wrong, and there is not much information about adding custom MIME types or customizing Tika's existing parsers.


There is 1 answer

Buhake Sindi On

It seems that your XML starts with one or more spaces. Try removing any whitespace at the very beginning of the XML file, like so:

<mime-info>
    <mime-type type="text/properties">
        <glob pattern="*.properties"/>
    </mime-type>
</mime-info>

I would also suggest adding an XML declaration as the very first line of the file, with the root element starting on the next line, like so:

<?xml version="1.0" encoding="utf-8"?>
<mime-info>
    <mime-type type="text/properties">
        <glob pattern="*.properties"/>
    </mime-type>
</mime-info>
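
One way to check whether leading whitespace is really a problem for a given file is to run it through the JDK's own XML parser. The sketch below is only an illustration and uses no Tika classes; the class name XmlWellFormedCheck and the method isWellFormed are hypothetical names made up for this example. Per the XML specification, whitespace before the root element is allowed, but an XML declaration must be the very first thing in the file, so whitespace in front of the declaration makes the document malformed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper, not part of Tika: checks XML well-formedness with the JDK parser.
final class XmlWellFormedCheck {

    // Returns true if the given text parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // SAXException for malformed input; other exceptions are treated the same way here
            return false;
        }
    }

    public static void main(String[] args) {
        String body = "<mime-info><mime-type type=\"text/properties\">"
                + "<glob pattern=\"*.properties\"/></mime-type></mime-info>";
        String decl = "<?xml version=\"1.0\" encoding=\"utf-8\"?>";

        System.out.println(isWellFormed(decl + "\n" + body));       // declaration on the first line
        System.out.println(isWellFormed("  " + body));              // leading spaces, no declaration
        System.out.println(isWellFormed("  " + decl + "\n" + body)); // leading spaces before the declaration
    }
}
```

If the file passes this check and the type is still not picked up, the cause is probably somewhere other than the XML syntax.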

I hope this helps.
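
As for deciding between text/properties and text/plain from the content itself, here is a minimal sketch of a key/value check, assuming the detector has already read a short prefix of the stream and decoded it to a String. The class PropertiesHeuristic and the method looksLikeProperties are hypothetical names for illustration, not Tika API, and the check is deliberately simple: it ignores backslash line continuations and keys separated from values only by whitespace, and a prefix cut off in the middle of a line can produce a false negative.

```java
// Hypothetical helper, not part of Tika: a rough test for .properties-style text.
final class PropertiesHeuristic {

    // Returns true if the text has at least one entry and every line that is not
    // blank or a comment has a non-empty key followed by '=' or ':'.
    static boolean looksLikeProperties(String text) {
        int entries = 0;
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith("!")) {
                continue; // blank lines and comment lines are allowed anywhere
            }
            if (indexOfSeparator(trimmed) <= 0) {
                return false; // no separator, or separator with no key in front of it
            }
            entries++;
        }
        return entries > 0;
    }

    // Index of the first '=' or ':' in the line, or -1 if there is none.
    private static int indexOfSeparator(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '=' || c == ':') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeProperties("# tool settings\nhost=localhost\nport: 8080\n"));
        System.out.println(looksLikeProperties("Just an ordinary sentence.\n"));
    }
}
```

Inside a detector, a check like this would be applied to the bytes read from the incoming stream rather than to a fixed file on disk, returning MediaType.text("properties") on a match so that the result agrees with the text/properties type declared in the XML.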