How to get the magic number from a File in Java


I have a file from an UploadedFile button, and I want to print the file's extension by using its magic number.

My code:

UploadedFile file = (UploadedFile)valueChangeEvent.getNewValue();
byte[] fileByteArray = IOUtils.toByteArray(file.getInputStream());

Note: I don't want the MIME type or content type (derived from the file or from the filename), because those are not the same as the magic number. The magic number comes from the first bytes of the InputStream.

How can I do it?
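For context, reading the magic number itself only needs the first few bytes of the stream, not the whole file. A minimal sketch, assuming Java 11+ (for InputStream.readNBytes(int)); MagicNumber, readHeader and toHex are hypothetical names, and a ByteArrayInputStream stands in for file.getInputStream():

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class MagicNumber {

    // Reads at most n bytes from the start of the stream (Java 11+).
    // The result is shorter than n only if the stream itself is shorter.
    static byte[] readHeader(InputStream in, int n) throws IOException {
        return in.readNBytes(n);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "89 50 4E 47"
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for file.getInputStream(): the first 8 bytes of a PNG file
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        try (InputStream in = new ByteArrayInputStream(png)) {
            System.out.println(toHex(readHeader(in, 4))); // prints 89 50 4E 47
        }
    }
}
```

The hex string can then be compared against a table of known signatures, which is what both answers do.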


There are 2 answers

Dewa Syahrizal MN

I know this is an old question; I'm putting my answer here in the hope that someone searching for the same solution finds it useful.

import java.io.IOException;
import java.io.InputStream;

import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.Part;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

@MultipartConfig(
    fileSizeThreshold = 0,
    maxFileSize = 1024 * 1024 * 50,       // 50MB
    maxRequestSize = 1024 * 1024 * 100)   // 100MB
public class FileUpload extends HttpServlet {

    private static final Logger logger = LogManager.getLogger(FileUpload.class);

    @Override
    public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws IOException, ServletException {

        response.setContentType("text/plain");
        response.setCharacterEncoding("UTF-8");

        // Local buffer: one servlet instance serves concurrent requests,
        // so a shared instance field would let requests overwrite each other.
        byte[] data = new byte[4];

        try {
            // getPart returns null when the request has no "image_file" part
            Part part = request.getPart("image_file");
            if (part != null) {
                try (InputStream is = part.getInputStream()) {
                    data = fileSignature(is);
                }
            }
        } catch (IOException ex) {
            logger.error("Could not read the uploaded file", ex);
        }

        String fileType = getFileType(data);

        // return the recognized type
        response.getWriter().write(fileType);
    }

    /**
     * Get the first 4 bytes of a file, i.e. its file signature.
     *
     * @param is Input stream of the uploaded part.
     * @return A 4-byte array; if the file is shorter, the remaining bytes stay 0.
     */
    private byte[] fileSignature(InputStream is) throws IOException {
        byte[] signature = new byte[4];
        // readNBytes (Java 9+) keeps reading until 4 bytes or end of stream,
        // whereas a single read() call may return fewer bytes than requested.
        is.readNBytes(signature, 0, signature.length);
        return signature;
    }

    /**
     * Get the file type based on the file signature.
     * Only jpeg/jpg, png and pdf are recognized here; jpg and jpeg files
     * share the same signature.
     *
     * @param fileData Byte array holding the start of the file.
     * @return String of the file type, or "undefined".
     */
    private String getFileType(byte[] fileData) {
        String type = "undefined";
        if (Byte.toUnsignedInt(fileData[0]) == 0x89 && Byte.toUnsignedInt(fileData[1]) == 0x50)
            type = "png";
        else if (Byte.toUnsignedInt(fileData[0]) == 0xFF && Byte.toUnsignedInt(fileData[1]) == 0xD8)
            type = "jpg";
        else if (Byte.toUnsignedInt(fileData[0]) == 0x25 && Byte.toUnsignedInt(fileData[1]) == 0x50)
            type = "pdf";

        return type;
    }
}
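The two-byte checks above can give false positives (for example, 0x25 0x50 is just "%P"). A sketch that compares longer, well-known signatures instead: the full 8-byte PNG signature, FF D8 FF for JPEG, and "%PDF" for PDF. SignatureCheck and startsWith are hypothetical names; it assumes Java 9+ for the ranged Arrays.equals, and the PNG check needs at least 8 header bytes, so the 4-byte buffer in fileSignature would have to grow to 8.

```java
import java.util.Arrays;

final class SignatureCheck {
    // Full PNG signature: \x89 P N G \r \n \x1A \n
    private static final byte[] PNG = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    // JPEG: SOI marker FF D8 followed by the start of the next marker (FF)
    private static final byte[] JPG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    // PDF: "%PDF"
    private static final byte[] PDF = {0x25, 0x50, 0x44, 0x46};

    // True if data begins with signature; short arrays never match
    static boolean startsWith(byte[] data, byte[] signature) {
        return data.length >= signature.length
            && Arrays.equals(data, 0, signature.length, signature, 0, signature.length);
    }

    static String getFileType(byte[] data) {
        if (startsWith(data, PNG)) return "png";
        if (startsWith(data, JPG)) return "jpg";
        if (startsWith(data, PDF)) return "pdf";
        return "undefined";
    }
}
```

With this version a header that is only "%P" comes back as "undefined" rather than "pdf".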

References for file magic numbers:

Lorenzo

I'm putting my solution here in case people want an alternative without Java Servlet-related code:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public enum MagicBytes {
    PNG(0x89, 0x50),  // Same signatures as the previous answer
    JPG(0xFF, 0xD8),
    PDF(0x25, 0x50);

    private final int[] magicBytes;

    MagicBytes(int... bytes) {
        magicBytes = bytes;
    }

    // Checks whether the bytes start with this magic bytes sequence.
    // Longer arrays (e.g. a whole file read into memory) are accepted.
    public boolean is(byte[] bytes) {
        if (bytes.length < magicBytes.length)
            throw new IllegalArgumentException("Expected at least " + magicBytes.length
                    + " bytes from the start of the input stream.");
        for (int i = 0; i < magicBytes.length; i++)
            if (Byte.toUnsignedInt(bytes[i]) != magicBytes[i])
                return false;
        return true;
    }

    // Extracts the head bytes from any stream, then closes it (Java 9+)
    public static byte[] extract(InputStream is, int length) throws IOException {
        try (is) {  // automatically close stream on return
            byte[] buffer = new byte[length];
            // readNBytes keeps reading until `length` bytes or end of stream;
            // a single read() call may return fewer bytes than requested.
            is.readNBytes(buffer, 0, length);
            return buffer;
        }
    }

    /* Convenience methods */
    public boolean is(File file) throws IOException {
        return is(new FileInputStream(file));  // stream is closed by extract()
    }

    // Note: this consumes and closes the given stream
    public boolean is(InputStream is) throws IOException {
        return is(extract(is, magicBytes.length));
    }
}

Then call it like this, depending on whether you have a File or an InputStream:

MagicBytes.PNG.is(new File("picture.png"));
MagicBytes.PNG.is(new FileInputStream("picture.png"));

Because it is an enum, we can also loop over every format with MagicBytes.values() if needed.
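As a sketch of that loop, here is a separate hypothetical enum FileFormat with the same two-byte signatures (so the snippet compiles on its own). Its detect method tries each constant from values() in declaration order and returns the first match, so if two signatures ever overlapped, the more specific one would need to be declared first.

```java
import java.util.Optional;

enum FileFormat {
    PNG(0x89, 0x50),
    JPG(0xFF, 0xD8),
    PDF(0x25, 0x50);

    private final int[] signature;

    FileFormat(int... signature) {
        this.signature = signature;
    }

    // Prefix check: the header may be longer than the signature
    boolean matches(byte[] header) {
        if (header.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if (Byte.toUnsignedInt(header[i]) != signature[i]) return false;
        }
        return true;
    }

    // Loops over every constant via values() and returns the first match
    static Optional<FileFormat> detect(byte[] header) {
        for (FileFormat format : values()) {
            if (format.matches(header)) return Optional.of(format);
        }
        return Optional.empty();
    }
}
```

For example, FileFormat.detect(fileByteArray).map(Enum::name).orElse("undefined") would give a name such as "PNG" for the question's byte array.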

EDIT: The code above is a simplified version of the enum I use in my own library, adapted using the previous answer so it is quicker to understand. However, some file formats can have several different headers; if that matters for your use case, this more complete class may be more appropriate: gist
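To illustrate the "different kinds of headers" case, here is a minimal sketch (a hypothetical enum MultiSignatureFormat, not the gist's code) in which each constant holds several alternative signatures and matches if any one of them is a prefix of the header. TIFF (little-endian "II*\0" and big-endian "MM\0*") and GIF ("GIF87a" and "GIF89a") are used because each genuinely has two valid headers; the ranged Arrays.equals assumes Java 9+.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

enum MultiSignatureFormat {
    // TIFF byte orders: "II" + 42 little-endian, or "MM" + 42 big-endian
    TIFF(new byte[] {0x49, 0x49, 0x2A, 0x00}, new byte[] {0x4D, 0x4D, 0x00, 0x2A}),
    // GIF versions: "GIF87a" or "GIF89a"
    GIF("GIF87a".getBytes(StandardCharsets.US_ASCII), "GIF89a".getBytes(StandardCharsets.US_ASCII));

    private final byte[][] signatures;

    MultiSignatureFormat(byte[]... signatures) {
        this.signatures = signatures;
    }

    // True if the header starts with any of this format's signatures
    boolean matches(byte[] header) {
        for (byte[] sig : signatures) {
            if (header.length >= sig.length
                    && Arrays.equals(header, 0, sig.length, sig, 0, sig.length)) {
                return true;
            }
        }
        return false;
    }
}
```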