Python zipfile, bizarre limit to number of files: "folder is invalid"


The computer is toying with me, I know it!

I am creating a zip folder in Python. The individual files are generated in memory and then the whole thing is zipped and saved to a file. I am allowed to add 9 files to the zip. I am allowed to add 11 files to the zip. But 10, no, not 10 files. The zip file IS saved to my computer, but I'm not allowed to open it; Windows says that the compressed zipped folder is invalid.

I use the code below, which I got from another Stack Overflow question. It appends 10 files and saves the zipped folder. When I click on the folder, I cannot extract it. BUT, remove one of the append() calls and it's fine. Or, add another append() and it works!

What am I missing here? How can I make this work every time?

imz = InMemoryZip() 
imz.append("1a.txt", "a").append("2a.txt", "a").append("3a.txt", "a").append("4a.txt", "a").append("5a.txt", "a").append("6a.txt", "a").append("7a.txt", "a").append("8a.txt", "a").append("9a.txt", "a").append("10a.txt", "a")
imz.writetofile("C:/path/test.zip") 


import zipfile
import StringIO
class InMemoryZip(object):
    def __init__(self):
        # Create the in-memory file-like object
        self.in_memory_zip = StringIO.StringIO()

    def append(self, filename_in_zip, file_contents):
        '''Appends a file with name filename_in_zip and contents of 
        file_contents to the in-memory zip.'''
        # Get a handle to the in-memory zip in append mode
        zf = zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED, False)

        # Write the file to the in-memory zip
        zf.writestr(filename_in_zip, file_contents)

        # Mark the files as having been created on Windows so that
        # Unix permissions are not inferred as 0000
        for zfile in zf.filelist:
            zfile.create_system = 0        

        return self

    def read(self):
        '''Returns a string with the contents of the in-memory zip.'''
        self.in_memory_zip.seek(0)
        return self.in_memory_zip.read()

    def writetofile(self, filename):
        '''Writes the in-memory zip to a file.'''
        f = file(filename, "w")
        f.write(self.read())
        f.close()

There is 1 answer

Sir Spock

You should use the 'wb' mode when creating the file you are saving to the file system. This will ensure that the file is written in binary.

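Otherwise, on Windows, every time a newline byte (\n, value 10) happens to occur in the zip data, Python will replace it with the Windows line ending (\r\n), which corrupts the archive. The reason 10 files is a problem is that the zip's end-of-central-directory record stores the number of entries as a raw integer, and 10 happens to be the code for \n. Any other byte in the archive that happens to equal 10 would be mangled the same way, so 9 or 11 files working is partly luck.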
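To see where the stray newline byte comes from, here is a minimal sketch. It assumes Python 3 (so io.BytesIO stands in for StringIO), and the helper names build_zip, entry_count_bytes and simulate_text_mode are made up for illustration. It builds archives like the ones in the question and prints the two bytes that hold the entry count:

```python
import io
import zipfile


def build_zip(n_files):
    """Return the bytes of an in-memory zip holding n_files tiny text files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(1, n_files + 1):
            zf.writestr("%da.txt" % i, "a")
    return buf.getvalue()


def entry_count_bytes(data):
    """Return the two raw bytes that store the total number of entries."""
    # With no archive comment, the end-of-central-directory record
    # is exactly the last 22 bytes of the file.
    eocd = data[-22:]
    assert eocd[:4] == b"PK\x05\x06"
    # Bytes 10-11 of the record: total entry count, little-endian.
    return eocd[10:12]


def simulate_text_mode(data):
    """Approximate what a text-mode write on Windows does to the bytes."""
    return data.replace(b"\n", b"\r\n")


print(entry_count_bytes(build_zip(9)))   # b'\t\x00'
print(entry_count_bytes(build_zip(10)))  # b'\n\x00'
print(entry_count_bytes(build_zip(11)))  # b'\x0b\x00'

good = build_zip(10)
mangled = simulate_text_mode(good)
print(len(mangled) > len(good))  # True
```

With 10 entries the count field is literally a newline byte, so a text-mode write inserts extra \r bytes and shifts everything after them.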

So your write function should look like this:

def writetofile(self, filename):
    '''Writes the in-memory zip to a file.'''
    f = file(filename, 'wb')
    f.write(self.read())
    f.close()
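For readers on Python 3, where file() and the StringIO module no longer exist, a rough equivalent might look like the sketch below. The function name write_zip_bytes and the temporary path are assumptions for illustration, not part of the original code. It writes a 10-file archive in binary mode and then reopens it to check that it is readable:

```python
import io
import os
import tempfile
import zipfile


def write_zip_bytes(path, data):
    """Write raw zip bytes in binary mode so no newline translation occurs."""
    with open(path, "wb") as f:
        f.write(data)


# Build a 10-file archive in memory, as in the question.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(1, 11):
        zf.writestr("%da.txt" % i, "a")

# Hypothetical temporary location, standing in for C:/path/test.zip.
path = os.path.join(tempfile.mkdtemp(), "test.zip")
write_zip_bytes(path, buf.getvalue())

# Reopen the file from disk and verify its contents.
with zipfile.ZipFile(path) as zf:
    names = zf.namelist()
    bad = zf.testzip()  # None means every member passed its CRC check

print(len(names), bad)
```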

This should fix your problem and work for the files in your example. In your case, though, you might find it easier to write the zip file directly to the file system, like the code below, which incorporates some of the comments from above:

import StringIO
import zipfile

class ZipCreator:
    buffer = None

    def __init__(self, fileName=None):
        if fileName:
            self.zipFile = zipfile.ZipFile(fileName, 'w', zipfile.ZIP_DEFLATED, False)
            return

        self.buffer = StringIO.StringIO()
        self.zipFile = zipfile.ZipFile(self.buffer, 'w', zipfile.ZIP_DEFLATED, False)

    def addToZipFromFileSystem(self, filePath, filenameInZip):
        self.zipFile.write(filePath, filenameInZip)

    def addToZipFromMemory(self, filenameInZip, fileContents):
        self.zipFile.writestr(filenameInZip, fileContents)

        for zipFile in self.zipFile.filelist:
            zipFile.create_system = 0

    def write(self, fileName):
        # Closing the ZipFile writes the central directory;
        # without this the archive is incomplete in either mode
        self.zipFile.close()

        if self.buffer is None:  # No buffer: the ZipFile already wrote straight to disk
            return

        f = file(fileName, 'wb')
        f.write(self.buffer.getvalue())
        f.close()

# Use File Handle
zipCreator = ZipCreator('C:/path/test.zip')

# Use Memory Buffer
# zipCreator = ZipCreator()

# Add 10 files, matching the case in the question
for i in range(1, 11):
    zipCreator.addToZipFromMemory('test/%sa.txt' % i, 'a')

zipCreator.write('C:/path/test.zip')

Ideally, you would probably use separate classes for an in-memory zip and a zip that is tied to the file system from the beginning. I have also seen some issues with the in-memory zip when folders are added; they are difficult to reproduce and I am still trying to track them down.
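As a rough idea of what that split could look like, here is a minimal sketch assuming Python 3. The class names InMemoryZip and OnDiskZip and their methods are hypothetical, and the create_system tweak from the code above is left out for brevity:

```python
import io
import os
import tempfile
import zipfile


class InMemoryZip:
    """Hypothetical helper: builds the archive in a BytesIO buffer."""

    def __init__(self):
        self._buffer = io.BytesIO()
        self._zip = zipfile.ZipFile(self._buffer, "w", zipfile.ZIP_DEFLATED)

    def add(self, name, contents):
        self._zip.writestr(name, contents)
        return self

    def getvalue(self):
        # Closing writes the central directory. No files can be added
        # afterwards, but calling close() again is harmless.
        self._zip.close()
        return self._buffer.getvalue()


class OnDiskZip:
    """Hypothetical helper: lets ZipFile write straight to a path."""

    def __init__(self, path):
        self._zip = zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED)

    def add(self, name, contents):
        self._zip.writestr(name, contents)
        return self

    def close(self):
        self._zip.close()


# In-memory use: get the finished archive as bytes.
mem = InMemoryZip()
for i in range(1, 11):
    mem.add("%da.txt" % i, "a")
data = mem.getvalue()

# On-disk use: the path here is a temporary stand-in.
path = os.path.join(tempfile.mkdtemp(), "test.zip")
disk = OnDiskZip(path)
for i in range(1, 11):
    disk.add("%da.txt" % i, "a")
disk.close()
```

Keeping the two cases apart means neither class needs to check which mode it is in, and the on-disk variant never handles the raw bytes itself, so the text-mode pitfall cannot arise there.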