I have an application that occasionally needs to be able to read improperly closed gzip files. The files behave like this:
>>> import gzip
>>> f = gzip.open("path/to/file.gz", 'rb')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.8/gzip.py", line 498, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
I wrote a function to handle this by reading the file line by line and catching the EOFError, and now I want to test it.
The input to my test should be a gz file that behaves in the same way as demonstrated. How do I make this happen in a controlled testing environment?
I would strongly prefer not to make copies of the improperly closed files that I get in production.
Very simple: do the compression, then snip the result.
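A minimal in-memory sketch of that idea, assuming any compressible payload will do. The payload, the cut point (half of the compressed bytes), and the helper name `read_all` are illustrative choices, not something taken from the question:

```python
import gzip
import io

# Compress a known payload entirely in memory.
payload = b"".join(b"line %d\n" % i for i in range(1000))
complete = gzip.compress(payload)

# Snip the result: keep only the first half of the compressed bytes,
# so the stream ends before the end-of-stream marker.
truncated = complete[: len(complete) // 2]


def read_all(data):
    """Decompress gzip bytes through a file-like object, as gzip.open would."""
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
        return f.read()


try:
    read_all(truncated)
    raised_eof = False
except EOFError:
    # Same exception type as in the traceback from the question.
    raised_eof = True
```

The intact bytes in `complete` still decompress normally, so the same payload can serve as the control case in a test.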
Even easier, just two bytes are enough for the gzip module to recognise the gzip format, but that is obviously not a complete compressed file.

This is in-memory for simplicity of demonstration; it would work the same if you manipulated the file instead of the string. For example:
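A file-based variant might look like the following sketch. It writes both kinds of broken input into a temporary directory: a real gzip file cut short on disk, and a file holding only the two gzip magic bytes. The helper names (`write_truncated_gzip`, `write_magic_only`, `raises_eof`), the file names, and the `keep_fraction` parameter are hypothetical and only serve the illustration:

```python
import gzip
import os
import tempfile


def write_truncated_gzip(path, payload, keep_fraction=0.5):
    """Write a valid gzip file to `path`, then cut it short on disk."""
    with gzip.open(path, "wb") as f:
        f.write(payload)
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.truncate(int(size * keep_fraction))


def write_magic_only(path):
    """Write only the two gzip magic bytes: recognisable, but incomplete."""
    with open(path, "wb") as f:
        f.write(b"\x1f\x8b")


def raises_eof(path):
    """Return True if reading the whole file raises EOFError."""
    try:
        with gzip.open(path, "rb") as f:
            f.read()
    except EOFError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    truncated_path = os.path.join(tmp, "truncated.gz")
    magic_path = os.path.join(tmp, "magic_only.gz")

    write_truncated_gzip(truncated_path, b"some line\n" * 1000)
    write_magic_only(magic_path)

    truncated_raises = raises_eof(truncated_path)
    magic_raises = raises_eof(magic_path)
```

In a test suite, the two writer helpers could be called from a fixture that hands the resulting path to the function under test, so no production file ever needs to be copied.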