Preprocessing large data in Databricks Community Edition


I have a 16 GB dataset and want to use it in Databricks. However, in Community Edition the DBFS limit is 10 GB. Could you please help me preprocess the data so that I can move it from the driver to DBFS?


1 answer

Alex Ott

The simplest approach is not to use DBFS at all (it's designed only for temporary data), but to host the data and results in your own environment, such as an AWS S3 bucket or ADLS (although this could incur higher transfer costs).
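If you go the S3 route, a minimal sketch of accessing the bucket directly from a Databricks notebook might look like the following. The bucket name, secret scope, and key names are placeholders, not real settings:

```python
# Sketch only: read data directly from S3 instead of copying it to DBFS.
# Secret scope ("aws") and key names are hypothetical -- adjust to your setup.
access_key = dbutils.secrets.get(scope="aws", key="access-key")
secret_key = dbutils.secrets.get(scope="aws", key="secret-key")

spark.conf.set("fs.s3a.access.key", access_key)
spark.conf.set("fs.s3a.secret.key", secret_key)

# The 16 GB dataset never touches DBFS; Spark reads it straight from S3
df = spark.read.csv("s3a://my-bucket/big-dataset/", header=True)
df.write.format("parquet").save("s3a://my-bucket/processed/")
```

With this pattern the DBFS quota is irrelevant, since both input and output live in your own storage account.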

If you can't use that, then the solution depends on other factors, such as the input file format and whether it is compressed or uncompressed, etc.
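For example, if the input is an uncompressed text/CSV file, one option is to split and gzip it on the driver so each piece fits comfortably under the DBFS quota, then upload the pieces individually. A minimal sketch using only the standard library (function name and chunk size are illustrative):

```python
import gzip
import os

def split_and_gzip(src_path, out_dir, lines_per_chunk=1_000_000):
    """Stream src_path line by line, writing gzip-compressed chunks of
    lines_per_chunk lines each. Returns the list of chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    chunk_index = 0
    writer = None
    with open(src_path, "rt") as src:
        for i, line in enumerate(src):
            # Start a new gzip chunk every lines_per_chunk lines
            if i % lines_per_chunk == 0:
                if writer is not None:
                    writer.close()
                path = os.path.join(out_dir, f"part-{chunk_index:05d}.csv.gz")
                writer = gzip.open(path, "wt")
                chunk_paths.append(path)
                chunk_index += 1
            writer.write(line)
    if writer is not None:
        writer.close()
    return chunk_paths
```

After copying the `part-*.csv.gz` files to DBFS, Spark can read the gzipped chunks directly, e.g. `spark.read.csv("dbfs:/data/part-*.csv.gz", header=True)`, since gzip is a codec Spark decompresses natively.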