Keras - with regards to performance - is bcolz better than using datagenerator?


I am struggling with the following points:

  1. When should bcolz be used instead of Keras' data generator? It looks like a Keras model has APIs that accept either an array (split into batches) or a data generator.
  2. Is there a performance improvement when using bcolz with the fit() API over using a data generator with fit_generator()?

Finally, there's a fastai post that mentions dask:

  1. Is dask better than bcolz?

Thanks!

There is 1 answer

Littleone On
  1. Keras' ImageDataGenerator.flow_from_directory(directory) only reads PNG, JPG, BMP or PPM images. You could of course extend it, but bcolz is a quick fix for anything else, which is why it suits pre-computed convolution features: save those features as bcolz arrays, then load them in batches for fit_generator().
  2. fit_generator() with a data generator (which could itself read from bcolz arrays) would be quicker than calling fit() on the bcolz arrays alone.
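To make the "load them in batches" idea concrete, here is a minimal sketch of a batch generator. It is not taken from Keras or bcolz; `batch_generator` and the toy data are hypothetical names made up for illustration. It only assumes the containers support `len()` and slice indexing, which a bcolz carray does (slicing one returns a NumPy array). Plain Python lists stand in for the stored features so the snippet runs without bcolz installed.

```python
from itertools import islice


def batch_generator(features, labels, batch_size):
    """Yield (features, labels) slices forever.

    Keras' fit_generator() expects a generator that never runs out,
    so the loop restarts from the beginning after each pass.
    """
    n = len(features)
    if n != len(labels):
        raise ValueError("features and labels must have the same length")
    while True:
        for start in range(0, n, batch_size):
            stop = min(start + batch_size, n)
            # With a bcolz carray, only this slice is decompressed
            # into memory, not the whole array.
            yield features[start:stop], labels[start:stop]


# Stand-in data (assumption): in practice these would be the
# pre-computed features and labels stored as bcolz arrays.
features = [[float(i), float(i) * 2] for i in range(5)]
labels = [i % 2 for i in range(5)]
batch_size = 2

# Number of batches that make up one pass over the data (ceiling division).
steps_per_epoch = -(-len(features) // batch_size)

# Pull exactly one epoch's worth of batches to inspect them.
first_epoch = list(islice(batch_generator(features, labels, batch_size),
                          steps_per_epoch))
```

Under these assumptions, the generator and `steps_per_epoch` would be passed along the lines of `model.fit_generator(batch_generator(x, y, 64), steps_per_epoch=...)`, where `model`, `x` and `y` are your own compiled model and stored arrays.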

Is Dask better than bcolz? Dask isn't strictly an alternative to bcolz; it can work with bcolz arrays. On tasks with huge datasets, it can provide a speedup because it has great support for parallelism. Bcolz is a nice compressed data container, and I'd suggest using Dask on top of bcolz if you need that speedup.