Decompressing tar file with Apache Commons Compress


I'm using Apache Commons Compress to create tar archives and decompress them. My problems start with this method:

    private void decompressFile(File file) throws IOException {
        logger.info("Decompressing " + file.getName());

        BufferedOutputStream outputStream = null;
        TarArchiveInputStream tarInputStream = null;

        try {
            tarInputStream = new TarArchiveInputStream(
                    new FileInputStream(file));

            TarArchiveEntry entry;
            while ((entry = tarInputStream.getNextTarEntry()) != null) {
                if (!entry.isDirectory()) {
                    File compressedFile = entry.getFile();
                    File tempFile = File.createTempFile(
                            compressedFile.getName(), "");

                    byte[] buffer = new byte[BUFFER_MAX_SIZE];
                    outputStream = new BufferedOutputStream(
                            new FileOutputStream(tempFile), BUFFER_MAX_SIZE);

                    int count = 0;
                    while ((count = tarInputStream.read(buffer, 0, BUFFER_MAX_SIZE)) != -1) {
                        outputStream.write(buffer, 0, count);
                    }
                }

                deleteFile(file);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (outputStream != null) {
                outputStream.flush();
                outputStream.close();
            }
        }
    }

Every time I run the code, the compressedFile variable is null, even though the while loop iterates over all the entries in my test tar.

Could you help me to understand what I'm doing wrong?

There are 2 answers

O.C. (best answer)

Try using the getNextEntry() method instead of getNextTarEntry().

The second method returns a TarArchiveEntry, which is probably not what you want.

blackbird014

From the official documentation, on reading entries from a tar archive:

    TarArchiveEntry entry = tarInput.getNextTarEntry();
    byte[] content = new byte[entry.getSize()];
    LOOP UNTIL entry.getSize() HAS BEEN READ {
        tarInput.read(content, offset, content.length - offset);
    }
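
The "LOOP UNTIL" step in that pseudocode can be sketched in plain Java roughly as follows. This is only an illustration: `EntryReader` and `readFully` are hypothetical names, and a `ByteArrayInputStream` stands in for the tar stream so the snippet runs without the library on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons Compress.
class EntryReader {

    /**
     * Reads exactly size bytes from in. The loop is needed because a single
     * read() call may return fewer bytes than were requested.
     */
    static byte[] readFully(InputStream in, int size) throws IOException {
        byte[] content = new byte[size];
        int offset = 0;
        while (offset < size) {
            int read = in.read(content, offset, size - offset);
            if (read == -1) {
                // The stream ended before the announced size was reached.
                throw new EOFException(
                        "Stream ended after " + offset + " of " + size + " bytes");
            }
            offset += read;
        }
        return content;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a TarArchiveInputStream positioned at an entry.
        byte[] data = "hello tar".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        byte[] content = readFully(in, data.length);
        System.out.println(new String(content, StandardCharsets.UTF_8));
    }
}
```

With a real archive the call would look like `readFully(tarInput, (int) entry.getSize())`. The cast is needed because getSize() returns a long, so this approach only suits entries small enough to hold in memory.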

I have written an example based on your implementation and tested it with a very trivial .tar (just one text entry).
Since I don't know your exact requirements, I only addressed the problem of reading the archive without hitting the NullPointerException. When debugging, the entry itself is available, as you have also found.

    private static void decompressFile(File file) throws IOException {

        BufferedOutputStream outputStream = null;
        TarArchiveInputStream tarInputStream = null;

        try {
            tarInputStream = new TarArchiveInputStream(
                new FileInputStream(file));

            TarArchiveEntry entry;
            while ((entry = tarInputStream.getNextTarEntry()) != null) {
                if (!entry.isDirectory()) {
                    File compressedFile = entry.getFile();
                    String name = entry.getName();

                    int size = 0;
                    int c;
                    while (size < entry.getSize()) {
                        c = tarInputStream.read();
                        System.out.print((char) c);
                        size++;
                    }
    (.......)

As I said, I tested with a tar containing only one text entry (you can try the same approach to verify the code) to make sure the null is avoided.
You will need to adapt this to your real needs. In particular, you will have to handle the streams as in the pseudocode I posted at the top, which shows how to deal with a single entry.
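
For the original goal of writing each entry to a temp file, one possible sketch follows. It relies on the fact that entry.getFile() is only populated for entries that were created from a file on disk, so for entries read from an archive the name has to come from entry.getName() instead. `EntryExtractor`, `extractToTemp` and `BUFFER_SIZE` are hypothetical names, and a `ByteArrayInputStream` stands in for the tar stream so the snippet runs without the library.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons Compress.
class EntryExtractor {

    private static final int BUFFER_SIZE = 8192;

    /**
     * Copies the remaining bytes of entryData into a new temp file whose
     * name starts with the last path segment of entryName.
     */
    static File extractToTemp(String entryName, InputStream entryData) throws IOException {
        // Entry names may contain directories ("docs/readme.txt").
        String simpleName = new File(entryName).getName();
        // File.createTempFile rejects prefixes shorter than three characters.
        String prefix = simpleName.length() >= 3 ? simpleName : simpleName + "___";
        File tempFile = File.createTempFile(prefix, "");

        // try-with-resources flushes and closes the output for every entry.
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream(tempFile), BUFFER_SIZE)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int count;
            while ((count = entryData.read(buffer, 0, BUFFER_SIZE)) != -1) {
                out.write(buffer, 0, count);
            }
        }
        return tempFile;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a TarArchiveInputStream positioned at an entry.
        InputStream data = new ByteArrayInputStream(
                "some text".getBytes(StandardCharsets.UTF_8));
        File tempFile = extractToTemp("docs/readme.txt", data);
        tempFile.deleteOnExit();
        System.out.println(tempFile.getName() + ": " + tempFile.length() + " bytes");
    }
}
```

Inside the while loop of the question this would be called as `extractToTemp(entry.getName(), tarInputStream)` for each non-directory entry. The tar stream's read() returns -1 at the end of the current entry, so the copy stops there and the next call to getNextTarEntry() moves on. The tar stream itself should be closed only once, after the loop.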