I have multiple .csv.gz files that I'm trying to read into a Dask DataFrame. I was able to achieve this using the following code:
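```python
import glob
import dask.dataframe as dd
import pandas as pd
from dask import delayed

file_paths = glob.glob(file_pattern)
@delayed
def read_csv(file_paths):  # note: this helper is never called below
    return dd.read_csv(file_paths, compression='gzip', blocksize=None, dtype=None)
# One lazy pd.read_csv call per file, stitched into one Dask DataFrame
dfs = [delayed(pd.read_csv)(fn) for fn in file_paths]
df = dd.from_delayed(dfs)
```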
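The `delayed(pd.read_csv)(fn)` calls pass no `compression` argument. That works because `pandas.read_csv` defaults to `compression='infer'` and selects gzip from the `.gz` suffix. A minimal pandas-only check, using a hypothetical file written to a temporary directory:

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a tiny gzip-compressed CSV (hypothetical sample data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "part-0.csv.gz")
with gzip.open(path, "wt") as f:
    f.write("a,b\n1,2\n3,4\n")

# No compression argument: the default compression="infer" picks gzip
# because the file name ends in ".gz".
df = pd.read_csv(path)
print(df)
```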
The problem is that when I try to convert the Dask DataFrame into a pandas DataFrame using

```python
df = df.compute()
```

I get the error `EmptyDataError: No columns to parse from file`. I would really appreciate any help with this.
The below worked for me: