I'm working with some legacy code and getting some build errors. I have a zip file called vocab100k.zip, and the code says that it should unzip to two files: vocab.100k.utf8 and vectors.100k.utf8.
When I try to run System.IO.Compression.ZipFile.OpenRead(zipFileFullPath), I get System.IO.InvalidDataException: 'End of Central Directory record could not be found.' When I try to manually unzip through the File Explorer using WinRAR, I get "Unexpected end of archive".
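That exception fits a truncated file. A zip reader such as ZipFile.OpenRead finds the archive's table of contents through the End of Central Directory record (signature PK\x05\x06). That record is the last structure in the archive: 22 bytes, plus an optional comment of up to 65,535 bytes. If the tail of the file is missing, so is the record. The Python sketch below does the same backward search so you can check a file directly. find_eocd is a hypothetical helper, not a library function.

```python
import io
import zipfile

EOCD_SIG = b"PK\x05\x06"   # End of Central Directory signature (0x06054b50)
EOCD_FIXED = 22            # size of the record without its comment
MAX_COMMENT = 0xFFFF       # the archive comment can be up to 65,535 bytes


def find_eocd(data: bytes) -> int:
    """Return the offset of the EOCD record, or -1 if it can't be found.

    The record sits at the end of the archive (only the comment may follow
    it), so only the last 22 + 65,535 bytes need to be searched.
    """
    start = max(0, len(data) - EOCD_FIXED - MAX_COMMENT)
    return data.rfind(EOCD_SIG, start)


# Demo: build a small archive in memory, then chop off its second half.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("vectors.txt", "".join(f"w{i} {i / 7:.5f}\n" for i in range(2000)))
full = buf.getvalue()
truncated = full[: len(full) // 2]

print(find_eocd(full) == len(full) - EOCD_FIXED)  # True: record at the very end
print(find_eocd(truncated))                       # -1: record was cut off
```

To check the real file, pass it the bytes from open("vocab100k.zip", "rb").read(). A result of -1 means the record, and therefore the end of the file, is missing.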
Double-clicking to preview the contents shows that one of the two files is present inside.
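Seeing a member in the preview is consistent with the same picture. Each member is also preceded by a local file header (signature PK\x03\x04, 30 fixed bytes followed by the file name and an extra field) that repeats the name and compressed size. A tool can therefore still list members by reading forward from offset 0 after the central directory is lost. The Python sketch below performs that forward scan and reports how much of each member's compressed data is actually present.

scan_local_headers is a hypothetical helper. It assumes the sizes are recorded in the local header. That is not true when a writer uses a trailing data descriptor (flag bit 3), so the scan stops at that point.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"
LOCAL_FMT = "<4sHHHHHIIIHH"              # fixed part of a local file header
LOCAL_SIZE = struct.calcsize(LOCAL_FMT)  # 30 bytes


def scan_local_headers(data: bytes):
    """Walk forward from offset 0 and yield (name, compressed_size, bytes_present)
    for every member whose local header is intact."""
    offset = 0
    while offset + LOCAL_SIZE <= len(data):
        (sig, _version, flags, _method, _time, _date, _crc,
         comp_size, _size, name_len, extra_len) = struct.unpack_from(LOCAL_FMT, data, offset)
        if sig != LOCAL_SIG:
            break                                  # central directory (or garbage) reached
        name_start = offset + LOCAL_SIZE
        payload = name_start + name_len + extra_len
        if payload > len(data):
            break                                  # the header itself is cut off
        name = data[name_start:name_start + name_len].decode("utf-8", "replace")
        if flags & 0x08:                           # size is in a trailing data descriptor
            yield name, None, len(data) - payload
            break                                  # can't tell where the next member starts
        yield name, comp_size, min(comp_size, len(data) - payload)
        offset = payload + comp_size


# Demo: two members, cut somewhere inside the second one's compressed data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("vocab.txt", "".join(f"w{i}\n" for i in range(500)))
    zf.writestr("vectors.txt", "".join(f"w{i} {i / 7:.5f}\n" for i in range(2000)))
full = buf.getvalue()
partial = list(scan_local_headers(full[: len(full) // 2]))
for name, size, present in partial:
    print(f"{name}: {present} of {size} compressed bytes present")
```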

I used WinRAR's repair function, but extracting from the repaired zip gets to about 90% before it throws the following errors.
I suspect that this may once have been part of a multi-part zip, and the later parts have been lost. Is there any way to extract even part of the vectors.100k.utf8 file that I can see in there? Or are there other ways the zip could have been corrupted?
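On the multi-part theory: PKWARE's zip specification (APPNOTE) describes a spanning signature, 0x08074b50 (the bytes PK\x07\x08). It is written as the first four bytes of the first segment of a split or spanned archive and is followed immediately by the first local file header. An ordinary single-file zip starts directly with a local file header (PK\x03\x04), and a later segment of a split set can start mid-record.

Not every splitting tool is guaranteed to write the marker, so treat the result as a hint rather than proof. describe_start is a hypothetical helper.

```python
SPAN_SIG = b"PK\x07\x08"    # spanning/split marker 0x08074b50
LOCAL_SIG = b"PK\x03\x04"   # local file header 0x04034b50


def describe_start(first_bytes: bytes) -> str:
    """Rough classification of a zip file from its first 8 bytes."""
    if first_bytes.startswith(SPAN_SIG + LOCAL_SIG):
        return "first segment of a split/spanned archive"
    if first_bytes.startswith(LOCAL_SIG):
        return "ordinary zip start (or a split segment written without the marker)"
    return "no zip signature at offset 0"


# Usage on the real file would be:
#   with open("vocab100k.zip", "rb") as f:
#       print(describe_start(f.read(8)))
print(describe_start(b"PK\x07\x08PK\x03\x04"))
print(describe_start(b"PK\x03\x04\x14\x00\x00\x00"))
```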

Recovering Data from a Truncated Zip File
Assuming the file is simply truncated in the middle of vectors.100k.utf8 and the corruption isn't more serious, you should be able to recover part of the data. The output you've shown does suggest that this is a truncation issue, but I won't know for sure without the zipdetails output I requested.
If this is just a truncation issue, you may be able to uncompress what is present with the perl script recoverzip below. This should work on Windows, macOS or Linux; the only prerequisite is that you have perl installed. The script takes three parameters.
This script isn't guaranteed to get any data from a truncated zip file, but it can in some cases. It just depends on where the truncation is.
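Whether anything comes back depends on how the member was compressed. Zip members are usually stored with Deflate (method 8), which is a streaming format. The decompressor emits output as it decodes, so everything before the cut can still be produced, and a later cut yields more. If the cut falls very early, before the first block's header is complete, nothing may come back. A one-shot decoder, by contrast, reports the missing end as an error.

This Python sketch shows both behaviours with zlib's raw-deflate mode (negative wbits, the headerless form used inside zip files). The sample text is arbitrary.

```python
import zlib

# Arbitrary line-oriented sample text standing in for a file like vectors.100k.utf8.
original = "".join(f"word{i} {i * 0.001:.3f}\n" for i in range(5000)).encode()

# Raw deflate (no zlib header or checksum), the form stored inside a zip member.
comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
stream = comp.compress(original) + comp.flush()

for fraction in (0.25, 0.5, 0.75, 1.0):
    cut = stream[: int(len(stream) * fraction)]
    d = zlib.decompressobj(-zlib.MAX_WBITS)
    out = d.decompress(cut)                # whatever can be decoded from the prefix
    assert original.startswith(out)        # recovered bytes are a prefix of the original
    print(f"{fraction:.0%} of the stream -> {len(out)} of {len(original)} bytes, complete: {d.eof}")

# A one-shot decoder treats the same kind of truncation as a hard error.
try:
    zlib.decompress(stream[: len(stream) // 2], -zlib.MAX_WBITS)
    one_shot_failed = False
except zlib.error as exc:
    one_shot_failed = True
    print("one-shot decode:", exc)
```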
Create a truncated zip file
Here is a worked example to show how it works. Note that I'm using Linux tools to generate the truncated zip file. The recovery part is not dependent on Linux; all you need is to have perl installed on your system.
First, pick an input file to add to a zip file.
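If the Linux tools used in this walkthrough aren't available, a comparable sample can be made with Python's standard library. This is a sketch rather than part of the original procedure:

- make_truncated_zip is a hypothetical helper.
- The lorem text is a stand-in.
- It cuts half-way through the compressed data instead of at a fixed offset such as 0x100.

It locates the start of the compressed data from the local file header: 30 fixed bytes, then the file name, then the extra field. Python's zipfile normally writes no extra field, so here the payload starts at 30 + 9 = 39 (0x27).

```python
import io
import struct
import zipfile


def make_truncated_zip(name: str, data: bytes, keep: float = 0.5):
    """Build a one-member deflated zip in memory and cut it part-way through the
    member's compressed data. Returns (full_zip, truncated_zip, payload_offset)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    full = buf.getvalue()

    # Local file header: compressed size at offset 18, name/extra lengths at 26/28.
    (comp_size,) = struct.unpack_from("<I", full, 18)
    name_len, extra_len = struct.unpack_from("<HH", full, 26)
    payload = 30 + name_len + extra_len
    return full, full[: payload + int(comp_size * keep)], payload


lorem = "".join(
    f"Lorem ipsum dolor sit amet {i}, consectetur adipiscing elit.\n" for i in range(50)
).encode()
full, truncated, payload = make_truncated_zip("lorem.txt", lorem)
print(f"payload starts at {payload:#x}; kept {len(truncated)} of {len(full)} bytes")

# The stdlib reader rejects the truncated archive, much as unzip does.
try:
    zipfile.ZipFile(io.BytesIO(truncated))
    rejected = False
except zipfile.BadZipFile as exc:
    rejected = True
    print("zipfile:", exc)
```

Writing truncated to disk (for example with pathlib.Path("trunc.zip").write_bytes(truncated)) gives a file you can feed to unzip or to the recovery script.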
Add lorem.txt to a zip file called try.zip.
Now we need to truncate try.zip in the middle of the lorem.txt member. To do that, we need to know where the compressed data lives in the zip file, and zipdetails can give us that information.
There is quite a lot of output from zipdetails, but for our purposes we need to look at the PAYLOAD line, which shows the offset where the compressed data for lorem.txt starts. In this case it is hex 043. The next field is the CENTRAL HEADER at offset hex 0151, so the compressed payload starts at offset 0x43 and ends at 0x150.
Now truncate the zip file in the middle of the lorem.txt compressed data at offset 0x100, and write the truncated zip file to trunc.zip.
We now have a sample truncated zip file to test. First, check what unzip thinks of the truncated file; it shows a very similar error to yours.
Recover data from the truncated zip file
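Before running recoverzip, here is the general idea sketched in Python. It is an illustration, not a port of that script. It reads the first local file header, skips to the payload and feeds whatever compressed bytes remain to a raw-deflate decoder.

For the example above, the arithmetic works out as 0x43 = 67 = 30 + 9 (the length of "lorem.txt") + 28, which implies 28 bytes of extra fields. The payload runs from 0x43 to 0x150, which is 0x10E (270) bytes. Cutting at 0x100 leaves 0x100 - 0x43 = 0xBD (189) of them.

recover_first_member is a hypothetical name. It handles only the first member, only stored (0) or deflated (8) data, and no encryption.

```python
import io
import struct
import zipfile
import zlib


def recover_first_member(data: bytes):
    """Return (name, recovered_bytes, complete) for the first member of a
    possibly truncated zip archive."""
    if len(data) < 30 or data[:4] != b"PK\x03\x04":
        raise ValueError("no complete local file header at offset 0")
    flags, method = struct.unpack_from("<HH", data, 6)
    (comp_size,) = struct.unpack_from("<I", data, 18)
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    if flags & 0x01:
        raise ValueError("encrypted members are not handled")
    name = data[30:30 + name_len].decode("utf-8", "replace")
    start = 30 + name_len + extra_len
    # comp_size can be 0 when the real size is in a trailing data descriptor.
    payload = data[start:start + comp_size] if comp_size else data[start:]

    if method == 0:                           # stored: payload bytes are the file bytes
        return name, payload, comp_size > 0 and len(payload) == comp_size
    if method != 8:
        raise ValueError(f"compression method {method} not handled")
    d = zlib.decompressobj(-zlib.MAX_WBITS)   # raw deflate, as stored in zip
    out = d.decompress(payload)               # everything decodable before the cut
    return name, out, d.eof                   # eof is False when the stream was truncated


# Demo on an in-memory archive cut in half.
original = "".join(f"line {i}: {i * i}\n" for i in range(3000)).encode()
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("lorem.txt", original)
full = buf.getvalue()

name, recovered, complete = recover_first_member(full[: len(full) // 2])
print(name, len(recovered), "of", len(original), "bytes; complete:", complete)
```

To try it on a real file, pass in open(path, "rb").read(). Because it only looks at the first member, it would need extending if vectors.100k.utf8 is not the first entry in vocab100k.zip.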
Now run the recoverzip script to see if we can get any data from the zip file.
The unexpected end of file error is to be expected in this use case.
Finally, let's see what data was recovered.
Success! In this instance we have recovered some of the data from lorem.txt.
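With the real file, the last recovered line of vectors.100k.utf8 will almost certainly be cut off part-way, possibly in the middle of a multi-byte UTF-8 character. This sketch assumes the file is newline-delimited UTF-8 text, which the .utf8 name suggests but nothing here confirms. Under that assumption, a simple clean-up is to keep everything up to the last newline. The byte 0x0A never appears inside a multi-byte UTF-8 sequence, so that cut can't split a character. complete_lines is a hypothetical helper.

```python
def complete_lines(recovered: bytes) -> list[str]:
    """Return only the fully recovered lines, dropping a partial last line."""
    end = recovered.rfind(b"\n")
    if end < 0:
        return []                              # not even one complete line
    text = recovered[:end].decode("utf-8")     # safe: 0x0A never splits a UTF-8 character
    return [line.rstrip("\r") for line in text.split("\n")]


# Demo: the last line is cut in the middle of the two-byte character "å".
sample = "abc 0.1 0.2\ndéf 0.3 0.4\nxyå 0.5".encode("utf-8")
cut = sample[: sample.index("å".encode("utf-8")) + 1]
print(complete_lines(cut))   # ['abc 0.1 0.2', 'déf 0.3 0.4']
```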