How to report progress of data read on a QuaGzipFile (QuaZIP library)


I am using QuaZIP 0.5.1 with Qt 5.1.1 for C++ on Ubuntu 12.04 x86_64.

My program reads a large gzipped binary file, usually 1 GB of uncompressed data or more, and performs some computations on it. It is not computationally intensive, and most of the time is spent on I/O. So if I can find a way to tell how much of the file has been read, I can show it on a progress bar and even estimate the remaining time.

I open the file with:

QuaGzipFile gzip(fileName);
if (!gzip.open(QIODevice::ReadOnly))
{
    // report error
    return;
}

But QuaGzipFile provides no way to find either the file size or the current position.

I do not need the size and position in the uncompressed stream; the size and position in the compressed stream are fine, because a rough estimate of progress is enough.

Currently, I can find the size of the compressed file using QFile(fileName).size(). I can also easily track the current position in the uncompressed stream by summing the return values of gzip.read(). But these two numbers are not comparable: one counts compressed bytes and the other counts uncompressed bytes.

I can alter the QuaZIP library, and access internal zlib-related stuff, if it helps.


There are 3 answers

Pavel Strakhov On BEST ANSWER

There is no reliable way to determine the total size of the uncompressed stream. See this answer for details and possible workarounds.

However, there is a way to get the position in the compressed stream:

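QFile file(fileName);
if (!file.open(QFile::ReadOnly)) {
  // report error
  return;
}
QuaGzipFile gzip;
// Let QuaGzipFile read through the descriptor that QFile already opened.
if (!gzip.open(file.handle(), QuaGzipFile::ReadOnly)) {
  // report error
  return;
}
while(true) {
  QByteArray buf = gzip.read(1000);
  if (buf.isEmpty()) { break; } // end of data (or read error)
  //process buf
  // A fresh QFile on the same descriptor has to query the real position.
  QFile temp_file_object;
  temp_file_object.open(file.handle(), QFile::ReadOnly);
  double progress = 100.0 * temp_file_object.pos() / file.size();
  qDebug() << qRound(progress) << "%";
}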

The idea is to open the file manually and use its file descriptor to get the position. QFile cannot track position changes made outside of it, so file.pos() will always be 0. We therefore create temp_file_object from the file descriptor, which forces QFile to query the actual file position. I could use a lower-level API (such as lseek()) to get the file position, but I think this approach is more cross-platform.

Note that this method is not very accurate and can report progress values larger than the real ones, because zlib can internally read and decode more data than you have read so far.
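Because the reported position can run ahead of the data actually consumed, it may help to clamp the value before displaying it. The following is a minimal sketch in plain C++ with no Qt dependency. The helper names compressedFraction and etaSeconds are hypothetical, and the remaining-time estimate assumes that throughput and compression ratio stay roughly constant over the file:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: fraction of the compressed file consumed so far,
// clamped to [0, 1] so that zlib's read-ahead never shows more than 100%.
double compressedFraction(std::int64_t pos, std::int64_t size)
{
    if (size <= 0)
        return 0.0;
    double f = static_cast<double>(pos) / static_cast<double>(size);
    return std::min(1.0, std::max(0.0, f));
}

// Hypothetical helper: estimated seconds remaining, extrapolated linearly
// from the time elapsed so far. Returns -1 when no estimate is possible yet.
double etaSeconds(double fraction, double elapsedSeconds)
{
    if (fraction <= 0.0)
        return -1.0;
    return elapsedSeconds * (1.0 - fraction) / fraction;
}
```

In the loop above, temp_file_object.pos() and file.size() would be the two arguments of compressedFraction, and the elapsed time could come from a QElapsedTimer started before the first read.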

Ali On

Using an ugly hack on zlib internals, I was able to find the position in the compressed stream.

First, I copied the definition of gz_stream from gzio.c (from the zlib-1.2.3.4 source) to the end of quagzipfile.cpp. Then I reimplemented the virtual function qint64 QIODevice::pos() const:

qint64 QuaGzipFile::pos() const
{
    gz_stream *s = (gz_stream *)d->gzd;
    return ftello64(s->file);
}

Since quagzipfile.cpp and quagzipfile.h seem to be independent of the other QuaZIP library files, maybe it is better to copy the functionality I need from these files and avoid this hack?

The current version of the program looks something like this:

QFile infile(fileName);
if (!infile.open(QIODevice::ReadOnly))
    return;
qint64 fileSize = infile.size();
infile.close();

QuaGzipFile gzip(fileName);
if (!gzip.open(QIODevice::ReadOnly))
    return;
qint64 nread;
char buffer[bufferSize];
while ((nread = gzip.read(buffer, bufferSize)) > 0)
{
    // use buffer
    int percent = 100.0 * gzip.pos() / fileSize;
    // report percent
}
gzip.close();
Mark Adler On

In zlib 1.2.4 and later, you can use the gzoffset() function to get the current position in the compressed file. The current version of zlib is 1.2.8.