I am creating a zip file using ZipArchive + FileStream. When a new item is added to the zip file, I would like to flush/write the newly added item to the underlying zip stream.
The code below does not flush the individual zip items. The whole zip is written to output.zip only when the FileStream is disposed.
using System.IO;
using System.IO.Compression;

// (runs inside an async method)
var files = Directory.GetFiles("C:\\Temp", "*.pdf");
using (var output = new FileStream("c:\\temp\\output.zip", FileMode.Create, FileAccess.Write))
{
    using (ZipArchive zip = new ZipArchive(output, ZipArchiveMode.Create, true))
    {
        foreach (var file in files)
        {
            // open the source read-only (FileMode.Open alone defaults to read/write access)
            using (var internalFile = File.OpenRead(file))
            {
                var zipItem = zip.CreateEntry(Path.GetFileName(file));
                using (var entryStream = zipItem.Open())
                {
                    await internalFile.CopyToAsync(entryStream).ConfigureAwait(false);
                }
            }
            // after each file, flush the output stream.
            await output.FlushAsync();
            // expectation at this point: the individual zip item will be written to the physical file.
            // however, I don't see the file size change in Windows Explorer.
        } // put breakpoint here
    }
} // The whole output gets flushed at this point, when the FileStream is disposed
I'm going to say "this is by design".
It certainly looks like it will be hard to get any different behaviour.
The reason this might be of value from a design point of view relates to how zip compression works. It identifies repeated sequences of bytes: rather than writing such a sequence out several times, it writes it once, and whenever that sequence is needed again it writes a short reference back to it instead of the entire sequence. That's how the zip file ends up smaller than the original file. (Caveat: that's my understanding, in lay terms, and it's been a long time since I looked at the zip algorithm.)
So it's 'of value' to have the whole file available before it writes, to optimise the identification of duplicate sequences of bytes.
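To see the "repeated bytes compress well" effect in isolation, here is a small sketch using DeflateStream (Deflate is the compression method ZipArchive uses for its entries by default). It compresses 64 KB of a repeated 16-byte pattern and 64 KB of seeded pseudo-random bytes and prints both compressed sizes. The buffer size, the pattern and the seed are arbitrary choices for illustration, and the exact byte counts will vary with the runtime's zlib version; the point is only that the repetitive buffer shrinks dramatically while the random one essentially does not. It is written as C# 9+ top-level statements.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Compress a buffer with Deflate and return the size of the compressed output.
static long DeflatedLength(byte[] data)
{
    using var compressed = new MemoryStream();
    using (var deflate = new DeflateStream(compressed, CompressionLevel.Optimal, leaveOpen: true))
    {
        deflate.Write(data, 0, data.Length);
    } // disposing the DeflateStream writes the final block into 'compressed'
    return compressed.Length;
}

const int size = 64 * 1024;

// Highly repetitive input: the same 16-byte pattern over and over.
byte[] pattern = System.Text.Encoding.ASCII.GetBytes("ABCDEFGHIJKLMNOP");
byte[] repetitive = new byte[size];
for (int i = 0; i < size; i++) repetitive[i] = pattern[i % pattern.Length];

// Pseudo-random input (fixed seed): almost no repeated sequences to refer back to.
byte[] random = new byte[size];
new Random(42).NextBytes(random);

long repetitiveSize = DeflatedLength(repetitive);
long randomSize = DeflatedLength(random);

Console.WriteLine($"repetitive: {size} -> {repetitiveSize} bytes");
Console.WriteLine($"random:     {size} -> {randomSize} bytes");
```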
Here is what looks like the ZipArchive source from the dotnet/runtime GitHub repo:
https://github.com/dotnet/runtime/blob/6072e4d3a7a2a1493f514cdf4be75a3d56580e84/src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs
(It might not be the latest version, or the version you're actually running.)
It looks like compression is done from the private void WriteFile() method. Certainly that's where the Seek(0) happens. This method is private, and it's only referenced from the Dispose() method.

Your code is calling FlushAsync() on your output stream. This is a standard FileStream. When you call FlushAsync(), it will write all of the bytes that the ZipArchive object has given it. Unfortunately, that will be zero bytes.

You could try disposing the ZipArchive after each file is written, but I think that would not be a very happy experiment. I suspect it would rewrite the entire stream each time, rather than individually adding new entries (but I'm not sure).
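If you do want to try the dispose-after-each-file experiment, it could look roughly like the sketch below. Reopening the archive in ZipArchiveMode.Update for the second and later files is my assumption about how you'd continue after disposing; Update mode needs a stream that is readable, writable and seekable. To keep it runnable anywhere, it uses a MemoryStream instead of output.zip and three hypothetical in-memory "files" (a.pdf, b.pdf, c.pdf) instead of real PDFs, and prints the underlying stream's length after each round. After every round the stream holds a complete zip, but the sketch on its own doesn't tell you whether the earlier entries are being rewritten each time. It uses C# 9+ top-level statements.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical stand-ins for the PDFs in the question.
var files = new (string Name, string Content)[]
{
    ("a.pdf", "first file"),
    ("b.pdf", "second file"),
    ("c.pdf", "third file"),
};

// A MemoryStream stands in for output.zip (assumption: in-memory so the sketch runs anywhere).
using var output = new MemoryStream();

for (int i = 0; i < files.Length; i++)
{
    // First pass creates the archive; later passes reopen the same stream in Update mode.
    var mode = i == 0 ? ZipArchiveMode.Create : ZipArchiveMode.Update;
    using (var zip = new ZipArchive(output, mode, leaveOpen: true))
    {
        var entry = zip.CreateEntry(files[i].Name);
        using (var entryStream = entry.Open())
        {
            var bytes = Encoding.UTF8.GetBytes(files[i].Content);
            entryStream.Write(bytes, 0, bytes.Length);
        }
    } // disposing the ZipArchive writes the central directory, so the stream is a complete zip here

    Console.WriteLine($"after {files[i].Name}: underlying stream is {output.Length} bytes");
}
```

With a real file, you would open the output once outside the loop with new FileStream("c:\\temp\\output.zip", FileMode.Create, FileAccess.ReadWrite), since FileAccess.Write alone is not enough for Update mode to read the existing archive back.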