Finding out a file's type based on characters inside the file


I want to detect the file type of a .properties file, which is basically a text file. Apache Tika and other MIME type detectors report a .properties file as "text/plain", because a text file and a .properties file cannot be told apart by magic number.

I want to somehow distinguish it based on a special character inside a .properties file, namely the = symbol (ASCII 61, binary 00111101) that separates key/value pairs, combined with validation of the file extension.

So the validation would be: if the file contains an = sign and the extension is .properties, report it as a .properties file. I am not sure whether this is a good approach. Also, if I can achieve it, how would I plug this into the other MIME type detectors so that I can still detect all other formats, instead of maintaining separate custom classes?

Note: I tried adding a custom type to Apache Tika, which didn't work at all. Maybe you can suggest some other library (for example, MimeUtils).


There are 2 answers

2
iullianr On

First of all, you have to know the general type of the file you are checking (text, binary, etc.), since that determines how to read it. So the first step is to detect that it is a text/plain file. Secondly, to determine whether it is a properties file, it is not enough to check for "=", because you might have this:

key1=val1=val3
key2=val4
key3

Of the three lines above, only the second is a strict single key=value pair (java.util.Properties itself is more lenient and would accept all three). So you need to check that each line of the file follows a pattern such as the one below (it is limited strictly to letters, digits and underscores, but you get the idea):

^[a-zA-Z0-9_]+=[a-zA-Z0-9_]+$
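
As a rough illustration of the per-line check, here is a minimal sketch. It assumes the pattern is meant to end with the `$` anchor, and the class and method names (`StrictLineCheck`, `lineMatches`, `allLinesMatch`) are made up for this example. Skipping blank lines is also an assumption; a real properties file may additionally contain comments and other characters that this strict pattern rejects.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from any library.
class StrictLineCheck {
    // Strict pattern: one key, exactly one '=', one value,
    // each made of letters, digits or underscores only.
    private static final Pattern KEY_VALUE =
            Pattern.compile("^[a-zA-Z0-9_]+=[a-zA-Z0-9_]+$");

    // True if a single line is a strict key=value pair.
    static boolean lineMatches(String line) {
        return KEY_VALUE.matcher(line).matches();
    }

    // True only if every non-blank line matches the strict pattern.
    // Skipping blank lines is an assumption made for this sketch.
    static boolean allLinesMatch(List<String> lines) {
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            if (!lineMatches(line)) {
                return false;
            }
        }
        return true;
    }
}
```

With the three sample lines, `lineMatches` accepts only `key2=val4`, so `allLinesMatch` returns false for the whole sample.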

I think one easy way to validate a properties file is to load its content into a Properties object (see java.util.Properties, which has load methods that read from an input stream or a reader).
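
A minimal sketch of that idea follows, with the sample lines held in a string instead of a file; the class name `LoadDemo` and its `parse` method are hypothetical. One caveat worth keeping in mind: `Properties.load` is lenient and accepts almost any text (it mainly rejects malformed `\uXXXX` escapes), so a successful load on its own is only weak evidence that the file was written as a properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; the names are illustrative only.
class LoadDemo {
    // Parses text using the standard properties syntax.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse("key1=val1=val3\nkey2=val4\nkey3\n");
        // The first '=' ends the key; the rest of the line is the value.
        System.out.println(props.getProperty("key1")); // val1=val3
        System.out.println(props.getProperty("key2")); // val4
        // A bare key is stored with an empty string as its value.
        System.out.println(props.getProperty("key3").isEmpty()); // true
    }
}
```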

0
VGR On

A Java properties file almost always has a ".properties" extension. Other than that, it has no identifiable signature. Most mechanisms that read such files expect them to be ISO-8859-1 text files (since that was required prior to Java 6), so even checking whether they contain only ASCII bytes isn't sufficient.

If you have some idea of which keys will be in the file, you should load it with Properties.load, and check for those keys in the Properties object. Otherwise, checking the extension in the filename is probably the most reliable thing you can do.
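
Both checks could be combined roughly as follows. This is only a sketch: the class `PropertiesFileCheck`, its method names, and the idea of passing the expected keys as a collection are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper; the names are illustrative, not from any library.
class PropertiesFileCheck {
    // Case-insensitive check of the file name's extension.
    static boolean hasPropertiesExtension(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".properties");
    }

    // True if every expected key is present in the loaded properties.
    static boolean hasExpectedKeys(Properties props, Collection<String> expectedKeys) {
        return props.stringPropertyNames().containsAll(expectedKeys);
    }

    // Extension check first, then load the file and look for the keys.
    static boolean isExpectedPropertiesFile(Path file, Collection<String> expectedKeys) {
        Path name = file.getFileName();
        if (name == null || !hasPropertiesExtension(name.toString())) {
            return false;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            // load(InputStream) decodes the bytes as ISO-8859-1.
            props.load(in);
        } catch (IOException | IllegalArgumentException e) {
            // Unreadable file or malformed \uXXXX escape.
            return false;
        }
        return hasExpectedKeys(props, expectedKeys);
    }
}
```

A call might look like `PropertiesFileCheck.isExpectedPropertiesFile(Paths.get("app.properties"), Arrays.asList("db.url"))`, where the file name and the key are placeholders.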