(Python/tqdm) Getting all zeros in timer when working with Pandas


I'm trying to display a progress bar whenever a file is loaded into pandas, but all I get back is this:

0it [00:00, ?it/s]

Here is the code I'm working with.
I'm importing tqdm based on some examples that I've found.

import pandas as pd
from tqdm import tqdm

def function(self):
    params = self.getGuiParams()
    filename = params['fileNameLineEdit']
    keyname = params['dataSetNameLineEdit']
    try:
        print('Loading data file: ' + str(filename))
        chunksize = 50000
        df = tqdm(pd.read_csv(filename, header=[0, 1], chunksize=chunksize, iterator=True))
        self.data[keyname] = spectral_data(df)
    except Exception as e:
        print('Problem reading data: {}'.format(e))

There are 2 answers

Danielle M. (best answer)

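tqdm only makes progress while you iterate over it: it wraps an iterable and advances one step per item you pull out of it. While you are using the iterator=True option for read_csv, you are assigning the resulting TextFileReader object (wrapped in tqdm) to df without ever iterating over it, so the bar never leaves 0it.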
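A small self-contained sketch of why the question's code prints 0it. It uses a made-up in-memory CSV in place of the real file: wrapping the reader in tqdm reads nothing, and the counter only moves once the tqdm object itself is iterated, one step per chunk.

```python
import io

import pandas as pd
from tqdm import tqdm

# Hypothetical in-memory CSV: one header row and 10 data rows.
csv_text = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(10))
reader = pd.read_csv(io.StringIO(csv_text), chunksize=3)

bar = tqdm(reader, unit="chunk")
print(bar.n)  # 0: nothing has been iterated yet (the "0it" state)

# Iterating over the tqdm object is what pulls chunks out of read_csv
# and advances the counter.
chunks = [chunk for chunk in bar]
print(bar.n)                     # 4
print([len(c) for c in chunks])  # [3, 3, 3, 1]
```

Without a total, tqdm can only count chunks, which is why the bar shows "Nit" rather than a percentage.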

Try something like:

tfr = pd.read_csv(filename, header=[0, 1], chunksize=chunksize, iterator=True)
with tqdm() as pbar:
    for chunk in tfr:
        # do something with the chunk
        pbar.update(1)  # advance the bar by one chunk

I've never used tqdm, so that may not work out of the box; to get a bar that actually fills up, you might need to work out the file size and how many chunks it will take, etc.
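One way to give tqdm a total, sketched under the assumption that every CSV record sits on its own physical line (no quoted fields containing newlines, no blank lines): count the lines in a cheap pass before parsing and turn that into a chunk count. count_chunks and load_with_progress are hypothetical helper names, and collecting the chunks with pd.concat stands in for whatever per-chunk work you actually do.

```python
import math
import os
import tempfile

import pandas as pd
from tqdm import tqdm


def count_chunks(path, chunksize, header_rows):
    """Number of chunks read_csv(chunksize=...) will yield.

    Assumes one CSV record per physical line (no quoted newlines,
    no blank lines) and `header_rows` header lines at the top.
    """
    with open(path, "rb") as f:
        n_lines = sum(1 for _ in f)  # fast pass, no CSV parsing
    n_rows = max(n_lines - header_rows, 0)
    return math.ceil(n_rows / chunksize)


def load_with_progress(path, chunksize=50000, header=(0, 1)):
    """Read `path` chunk by chunk with a bar that fills up to 100%."""
    total = count_chunks(path, chunksize, header_rows=len(header))
    chunks = []
    # Using the reader as a context manager needs pandas >= 1.2.
    with pd.read_csv(path, header=list(header), chunksize=chunksize) as reader:
        for chunk in tqdm(reader, total=total, unit="chunk"):
            chunks.append(chunk)  # stand-in for real per-chunk work
    return pd.concat(chunks) if chunks else pd.DataFrame()


# Usage: a throwaway file with two header rows and 10 data rows.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w") as f:
    f.write("a,b\nx,y\n")
    f.writelines(f"{i},{i * 2}\n" for i in range(10))

n_chunks = count_chunks(demo_path, 4, header_rows=2)  # ceil(10 / 4) = 3
df = load_with_progress(demo_path, chunksize=4)
```

The extra pass reads the whole file once more, but it only splits bytes into lines, which is usually much cheaper than parsing the CSV.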

Christian Steinmeyer

In addition to the other answer, which manually updates the tqdm progress bar, I'd like to suggest an alternative that might be a little more intuitive:

text_file_reader = pd.read_csv(filename, chunksize=chunksize, iterator=True)
for chunk in tqdm(text_file_reader):
    # chunk is a pd.DataFrame with *chunksize* rows of pd.read_csv(filename)
    # (the last chunk might have fewer rows)

    # do something with the chunk
    pass

This will not give you a standard progress bar slowly filling up to 100%. Instead, you will see how many chunks have already been processed and their average processing time per chunk, like so:

18/? [00:22<00:00, 1.29s/it]

One might be able to fill the progress bar with meaningful data; however, as I see it, this would require some kind of estimate of the number of rows from the file size, which doesn't seem trivial to me.
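A rough version of that estimate is possible if you assume rows have similar byte lengths: read only the first few hundred data lines, take their average size, and divide the remaining file size by it. estimate_chunks and its sample_lines parameter are made up for illustration, and because the total is only an estimate, the bar may finish somewhat short of or past 100% on real files.

```python
import math
import os
import tempfile

import pandas as pd
from tqdm import tqdm


def estimate_chunks(path, chunksize, header_rows=1, sample_lines=500):
    """Rough chunk count from file size and the average length of a few rows.

    Reads only the header plus up to `sample_lines` data lines. Assumes
    rows have similar byte lengths; the result is an estimate, not exact.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        header_bytes = sum(len(f.readline()) for _ in range(header_rows))
        # zip stops after sample_lines lines without reading further.
        sample = [line for _, line in zip(range(sample_lines), f)]
    if not sample:
        return 0
    avg_row_bytes = sum(len(line) for line in sample) / len(sample)
    est_rows = (file_size - header_bytes) / avg_row_bytes
    return math.ceil(est_rows / chunksize)


# Usage: a throwaway file with two header rows and 10 equally long rows,
# so the estimate happens to be exact here.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w") as f:
    f.write("a,b\nx,y\n")
    f.writelines(f"{i},{i}\n" for i in range(10))

chunksize = 4
total = estimate_chunks(demo_path, chunksize, header_rows=2)
seen = 0
with pd.read_csv(demo_path, header=[0, 1], chunksize=chunksize) as reader:
    for chunk in tqdm(reader, total=total, unit="chunk"):
        seen += 1  # stand-in for real per-chunk work
```

Unlike counting every line, this only touches the start of the file, so it stays cheap for very large files at the cost of accuracy when row lengths vary a lot.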