How to add a new MIME type to Apache Tika


This is my class for detecting MIME types. I am trying to add a new MIME type (for .properties files) and have Tika detect it.

This is my class file:

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package check_mime;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.tika.Tika;
import org.apache.tika.mime.MimeTypes;


public class TikaFileTypeDetector {

    private final Tika tika = new Tika();

    public TikaFileTypeDetector() {
        super();
    }

    public String probeContentType(Path path) throws IOException {

        // Check contents first
        String fileContentDetect = tika.detect(path.toFile());
        if (!fileContentDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileContentDetect;
        }

        // Try file name only if content search was not successful
        String fileNameDetect = tika.detect(path.toString());
        if (!fileNameDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileNameDetect;
        }

        return null;
    }

    public static void main(String[] args) throws IOException {

        if (args.length != 1) {
            printUsage();
            return;
        }
        Path path = Paths.get(args[0]);

        TikaFileTypeDetector detector = new TikaFileTypeDetector();

        String contentType = detector.probeContentType(path);

        System.out.println("File is of type - " + contentType);
    }

    public static void printUsage() {
        System.out.print("Usage: java -classpath ... "
                + TikaFileTypeDetector.class.getName()
                + " ");
    }
}

Following the docs, I have created a custom XML file:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <mime-type type="text/properties">
    <glob pattern="*.properties"/>
  </mime-type>
</mime-info>

Now how do I add it to my program and have Tika read it? Do I have to create a parser? I'm stuck here.

4

There are 4 answers

5
Gagravarr On

This is covered in the Apache Tika 5 minute parser instructions. To add support for Java .properties files, you should first create a file called custom-mimetypes.xml and populate it with something like:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <mime-type type="text/properties">
    <_comment>Java Properties</_comment>
    <glob pattern="*.properties"/>
    <sub-class-of type="text/plain"/>
  </mime-type>
</mime-info>

Next, you need to put that file somewhere Tika can find it, under the right name. It must be stored as org/apache/tika/mime/custom-mimetypes.xml on your classpath. The easiest thing to do is to create that directory structure, move the new file into it, then add the root directory to your classpath. For deployment, you should wrap that up into a jar and put the jar on the classpath.

You can use the Tika App to check that your mime type file was loaded, if you're careful. With your code packaged as a jar, run it as something like:

java -classpath tika-app-1.10-SNAPSHOT.jar:my-custom-mimetypes.jar org.apache.tika.cli.TikaCLI --list-supported-types | grep text/properties

Alternatively, if you have it in a local directory, try something like:

ls -l org/apache/tika/mime/custom-mimetypes.xml
# Check a file was found, with some content in it
java -classpath tika-app-1.10-SNAPSHOT.jar:. org.apache.tika.cli.TikaCLI --list-supported-types | grep text/properties

If that isn't showing your mime type, then you didn't get the path or filename correct. Double-check them.
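
If the grep comes back empty, one way to double-check the path is to ask the class loader for the resource directly. This is only a minimal sketch that assumes it is run with the same classpath as the Tika commands above. The class name CustomMimeTypesClasspathCheck is made up for this example, and it uses only the JDK, not Tika itself:

```java
import java.net.URL;

class CustomMimeTypesClasspathCheck {

    // Resource name taken from the answer above.
    static final String RESOURCE = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Returns the URL of the custom definitions visible to the loader, or null if there are none. */
    static URL locate(ClassLoader loader) {
        return loader.getResource(RESOURCE);
    }

    public static void main(String[] args) {
        URL url = locate(CustomMimeTypesClasspathCheck.class.getClassLoader());
        if (url == null) {
            System.out.println("Not found on classpath: " + RESOURCE);
        } else {
            System.out.println("Found: " + url);
        }
    }
}
```

If this prints "Not found", the directory or jar holding the file is probably not on the classpath, or the file sits at a different relative path than the one above.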

(Alternatively, upgrade to a newer version of Apache Tika: since r1686315, Tika has had a Java Properties mimetype built in!)

7
wero On

Tika will find your custom definition via Java resource loading and automatically add it to its own definitions. For that to work, you need to name the file custom-mimetypes.xml and put it into the package org.apache.tika.mime within your codebase.

If you create a jar file from your classes, you also need to include your custom-mimetypes.xml in the jar.
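
To confirm that the file really ended up in the jar, something like the following could help. It is only a sketch using the JDK's java.util.jar classes. The class name CustomMimeTypesJar and the write helper are hypothetical, and in a real build the jar would normally come from your build tool rather than from code like this:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

class CustomMimeTypesJar {

    // Entry name inside the jar; it mirrors the package org.apache.tika.mime.
    static final String ENTRY = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Writes a jar holding only the custom definitions (for illustration). */
    static void write(Path jarPath, byte[] xml) {
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jarPath))) {
            out.putNextEntry(new JarEntry(ENTRY));
            out.write(xml);
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the jar has an entry at the expected path. */
    static boolean containsCustomMimeTypes(Path jarPath) {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.getEntry(ENTRY) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java CustomMimeTypesJar <jar-file>");
            return;
        }
        Path jar = Paths.get(args[0]);
        System.out.println(ENTRY + (containsCustomMimeTypes(jar) ? " is" : " is NOT") + " in " + jar);
    }
}
```

Running main against your own jar shows whether the XML file was packaged at the expected path or left out of the build.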

2
Jagdishkumar Patel On

// Assumes config is a TikaConfig, detector is its Detector, and
// stream/metadata describe the file being checked.
MediaType mediaType = detector.detect(stream, metadata);
System.out.println("Detected Media Type: " + mediaType.toString());
MimeType mimeType = config.getMimeRepository().forName(mediaType.toString());
String extension = mimeType.getExtension();

0
mahfuj asif On

In your resources folder, add the package org\apache\tika\mime and create the file custom-mimetypes.xml in it.

Put the following in that file:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <mime-type type="custom-mime-type">
    <glob pattern="*.custom-extension"/>
  </mime-type>
</mime-info>

Replace custom-mime-type with your mime type and custom-extension with your extension. Please check the directory structure below.

By the way, you can also load the Tika mime types locally by downloading that file and placing it alongside custom-mimetypes.xml. This is helpful only when you need to change the standard Tika mime types. One thing to remember: you cannot have the same mime type or extension in both XML files.

[Image: directory structure, with custom-mimetypes.xml under org\apache\tika\mime in the resources folder]
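
If you do keep both XML files side by side, a quick way to spot a mime type or glob pattern declared in both is to compare them before deploying. This is a rough sketch using only the JDK's DOM parser. The class name MimeInfoOverlap is made up, it assumes both files are UTF-8, and it only lists overlaps; it does not say how Tika itself reacts to them:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class MimeInfoOverlap {

    /** Collects every mime-type name and glob pattern declared in one mime-info document. */
    static Set<String> declared(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Set<String> names = new TreeSet<>();
            NodeList types = doc.getElementsByTagName("mime-type");
            for (int i = 0; i < types.getLength(); i++) {
                names.add("type:" + ((Element) types.item(i)).getAttribute("type"));
            }
            NodeList globs = doc.getElementsByTagName("glob");
            for (int i = 0; i < globs.getLength(); i++) {
                names.add("glob:" + ((Element) globs.item(i)).getAttribute("pattern"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable mime-info document", e);
        }
    }

    /** Returns the mime types and glob patterns that appear in both documents. */
    static Set<String> overlap(String firstXml, String secondXml) {
        Set<String> common = declared(firstXml);
        common.retainAll(declared(secondXml));
        return common;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java MimeInfoOverlap <first.xml> <second.xml>");
            return;
        }
        String first = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String second = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println("Declared in both files: " + overlap(first, second));
    }
}
```

An empty result means no mime type or glob pattern is shared between the two files.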