(Python/tqdm) Getting all zeros in timer when working with Pandas

I'm trying to display a progress bar whenever a file is loaded into pandas. But all I get back is this.

0it [00:00, ?it/s]

Here is the code I'm working with.
I'm importing tqdm based on some examples that I've found.

from tqdm import tqdm
...

def function(self):
    params = self.getGuiParams()
    filename = params['fileNameLineEdit']
    keyname = params['dataSetNameLineEdit']
    try:
        print('Loading data file: ' + str(filename))
        self.datakeys.append(keyname)
        chunksize = 50000
        df = tqdm(pd.read_csv(filename, header=[0, 1], chunksize=chunksize, iterator=True))
        self.data[keyname] = spectral_data(df)
    except Exception as e:
        print('Problem reading data: {}'.format(e))

There are 2 answers

Danielle M. (best answer)

tqdm only advances when the iterable it wraps is actually consumed. Although you pass chunksize and iterator=True to read_csv, you wrap the resulting TextFileReader object in tqdm and assign it to df without ever iterating over it, so the bar stays at 0.
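A minimal sketch of that behaviour, using a plain range in place of the TextFileReader (the names bar, count_before, consumed and count_after are only for illustration). The counter should stay at zero until something loops over the wrapped object:

```python
from tqdm import tqdm

# Wrapping an iterable only sets up the bar; nothing is consumed yet.
bar = tqdm(iter(range(5)))
count_before = bar.n  # still 0 at this point

# Iterating over the tqdm object is what pulls items and advances the bar.
consumed = [item for item in bar]
count_after = bar.n
```

The same applies to the object returned by read_csv with chunksize set: the bar only moves while a loop is pulling chunks through it.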

Try something like:

tfr = pd.read_csv(filename, header=[0, 1], chunksize=chunksize, iterator=True)
with tqdm() as pbar:
    for chunk in tfr:
        # do something with the chunk
        pbar.update()

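I've never used tqdm, so that may not work out of the box. To get a bar that fills up to 100%, you would need to work out in advance how many chunks the file will produce (for example from its size or row count) and pass that number to tqdm as total.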
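One way to get that chunk count, sketched under some assumptions: the file has one record per line (no quoted fields containing newlines, no blank lines), and an extra pass over the file is acceptable. The helper name read_csv_with_progress and its n_header_rows parameter are hypothetical, not part of pandas or tqdm.

```python
import math

import pandas as pd
from tqdm import tqdm


def read_csv_with_progress(path, chunksize=50000, n_header_rows=1, **read_csv_kwargs):
    """Hypothetical helper: chunked read_csv with a tqdm bar that has a known total."""
    # First pass: count lines to estimate the number of data rows.
    # Assumes one record per line (no multi-line quoted fields, no blank lines).
    with open(path, "rb") as f:
        n_lines = sum(1 for _ in f)
    n_rows = max(n_lines - n_header_rows, 0)
    total_chunks = math.ceil(n_rows / chunksize)

    # Second pass: read in chunks and advance the bar once per chunk.
    chunks = []
    reader = pd.read_csv(path, chunksize=chunksize, **read_csv_kwargs)
    with tqdm(total=total_chunks, unit="chunk") as pbar:
        for chunk in reader:
            chunks.append(chunk)
            pbar.update(1)

    df = pd.concat(chunks) if chunks else pd.DataFrame()
    return df, total_chunks
```

For the file in the question, which has a two-row header, the call might look like read_csv_with_progress(filename, chunksize=50000, n_header_rows=2, header=[0, 1]). The cost is reading the file twice, which may or may not matter for your data sizes.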

Christian Steinmeyer

In addition to the other answer, which manually updates the tqdm progress bar, I'd like to suggest an alternative that might be a little more intuitive:

text_file_reader = pd.read_csv(filename, chunksize=chunksize, iterator=True)
for chunk in tqdm(text_file_reader):
    # chunk is a pd.DataFrame with *chunksize* rows of pd.read_csv(filename)
    # (the last chunk might have fewer rows)

    pass  # do something with the chunk

This will not give you a standard progress bar that slowly fills up to 100%. Instead, you will see how many chunks have been processed so far and the average processing time per chunk, like so:

18/? [00:22<00:00, 1.29s/it]

One might be able to fill the progress bar with meaningful data. However, as I see it, this would require some kind of estimate of the number of rows from the file size, which doesn't seem trivial to me.
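A possible way around the row estimate is to measure progress in bytes rather than rows: open the file yourself, hand the handle to read_csv, and move the bar to the handle's read position after each chunk. This is only a sketch. The helper name read_csv_with_byte_progress is hypothetical, and because pandas reads ahead in buffered blocks the bar is approximate and tends to jump, especially on small files.

```python
import os

import pandas as pd
from tqdm import tqdm


def read_csv_with_byte_progress(path, chunksize=50000, **read_csv_kwargs):
    """Hypothetical helper: drive the bar by bytes read instead of rows."""
    total_bytes = os.path.getsize(path)
    chunks = []
    with open(path, "rb") as f, tqdm(total=total_bytes, unit="B", unit_scale=True) as pbar:
        reader = pd.read_csv(f, chunksize=chunksize, **read_csv_kwargs)
        for chunk in reader:
            chunks.append(chunk)
            # f.tell() reports how far the underlying file has been read.
            # pandas buffers ahead of the parsed rows, so this is approximate.
            position = min(f.tell(), total_bytes)
            pbar.update(position - pbar.n)
        # Top the bar up in case buffering left it short of the end.
        pbar.update(total_bytes - pbar.n)
    return pd.concat(chunks) if chunks else pd.DataFrame()
```

This avoids a second pass over the file, at the price of a bar that reflects how much has been read from disk rather than how many rows have been processed.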