What is the best way to feed training data from a parquet file to a TensorFlow/Keras model?

I have a training dataset stored on S3 in parquet format. I wish to load this data into a notebook (on a Databricks cluster) and train a Keras model on it. There are a few ways I can think of to train a Keras model on this dataset (rough sketches of each option follow the list):

  • read the parquet file from S3 in batches (e.g. with Pandas or pyarrow) and feed these batches to the model
  • use the TensorFlow I/O APIs (this might require copying the parquet from S3 to the notebook's local environment)
  • use the Petastorm package (from Uber) - this also might require copying the parquet to the notebook's local environment
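
For the first option, this is roughly what I have in mind: stream record batches with pyarrow/s3fs and wrap them in a tf.data generator. The bucket path, the "label" column, and NUM_FEATURES below are placeholders for my actual data:

```python
import pyarrow.parquet as pq
import s3fs
import tensorflow as tf

NUM_FEATURES = 10  # placeholder for the real feature count

def parquet_batches(path, batch_size=1024):
    fs = s3fs.S3FileSystem()
    with fs.open(path, "rb") as f:
        pf = pq.ParquetFile(f)
        # iter_batches streams the file one record batch at a time,
        # so the whole dataset never has to fit in memory
        for batch in pf.iter_batches(batch_size=batch_size):
            df = batch.to_pandas()
            features = df.drop(columns=["label"]).to_numpy("float32")
            labels = df["label"].to_numpy("float32")
            yield features, labels

dataset = tf.data.Dataset.from_generator(
    lambda: parquet_batches("my-bucket/train.parquet"),
    output_signature=(
        tf.TensorSpec(shape=(None, NUM_FEATURES), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float32),
    ),
).prefetch(tf.data.AUTOTUNE)

# model.fit(dataset, epochs=10)
```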
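
For the second option, a sketch using tensorflow-io's parquet reader. As far as I can tell, from_parquet yields one row at a time as a dict of column name to scalar tensor; the /dbfs path and the column names here are assumptions:

```python
import tensorflow as tf
import tensorflow_io as tfio

rows = tfio.IODataset.from_parquet("/dbfs/tmp/train.parquet")

def to_xy(row):
    # "label", "f1", "f2" are placeholder column names; depending on
    # the tensorflow-io version the dict keys may be bytes (row[b"label"])
    label = tf.cast(row["label"], tf.float32)
    features = tf.stack([tf.cast(row["f1"], tf.float32),
                         tf.cast(row["f2"], tf.float32)])
    return features, label

dataset = rows.map(to_xy).batch(1024).prefetch(tf.data.AUTOTUNE)
# model.fit(dataset, epochs=10)
```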
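
For the third option, Petastorm's make_batch_reader seems to accept an s3:// URL for plain parquet directly, so the local copy may not actually be needed; the URL and the "features"/"label" column names are placeholders:

```python
import tensorflow as tf
from petastorm import make_batch_reader
from petastorm.tf_utils import make_petastorm_dataset

# make_batch_reader reads plain (non-Petastorm) parquet stores
with make_batch_reader("s3://my-bucket/train.parquet") as reader:
    # each element is a namedtuple of arrays, one field per parquet column;
    # Keras wants plain tuples, hence the map
    dataset = make_petastorm_dataset(reader)
    dataset = dataset.map(lambda batch: (batch.features, batch.label))
    # model.fit(dataset, epochs=1)  # train while the reader is still open
```

On Databricks there is also petastorm.spark.make_spark_converter, which caches a Spark DataFrame and hands it back as a tf.data dataset; that might be the more natural fit for a Databricks cluster.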

What is the best way to train a model in such a case, so that the training can later scale to larger training datasets?
