Combining multiple CSV files into one dataframe

I am trying to concatenate CSV files (all with the same headers) from a folder into one DataFrame, but I get an empty list when reading the files:

from pathlib import Path
import pandas as pd

path = '//Users//Directory//'
files = Path(path + "dirtest//").glob('*.csv')

If I print the file names, they come out right:

for f in files:
     print(f.name)

filename1.csv
filename2.csv
filename3.csv

But when I try to read them and append them to a list, I get an empty list:

dfs = []
dfs = [pd.read_csv(f) for f in files]
dfs

[]

The next step would be to concatenate the list into a single DataFrame, but I can't get that far because the list is empty:

base_renov = pd.concat(dfs, ignore_index=True)

Can someone help me out?

There is 1 answer

Answered by mozway

Path.glob returns a generator, not a list. A generator can only be iterated once, so the loop that prints the file names exhausts it and leaves nothing for the list comprehension that follows:

for f in files:
     print(f.name)
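
The exhaustion is easy to reproduce in isolation. Below is a minimal sketch that uses a temporary directory; the file names a.csv and b.csv and the helper names_twice are made up for this illustration and are not part of the question:

```python
import tempfile
from pathlib import Path


def names_twice(folder):
    """Iterate over one glob generator twice and return both passes."""
    files = Path(folder).glob("*.csv")      # a generator, not a list
    first = sorted(f.name for f in files)   # this pass consumes the generator
    second = sorted(f.name for f in files)  # nothing is left to yield
    return first, second


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files, created only for this demonstration.
    for name in ("a.csv", "b.csv"):
        (Path(tmp) / name).write_text("x,y\n1,2\n")
    first, second = names_twice(tmp)

print(first)   # ['a.csv', 'b.csv']
print(second)  # []
```

The second pass is empty for the same reason dfs was empty in the question: the print loop had already used up the generator.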

Instead, create the generator again and pass it directly to read_csv without looping over it first:

files = Path(path + "dirtest//").glob('*.csv')

base_renov = pd.concat(map(pd.read_csv, files), ignore_index=True)

Or, as a one-liner:

base_renov = pd.concat(map(pd.read_csv, Path('//Users//Directory//dirtest//').glob('*.csv')), ignore_index=True)

If you really want to use a loop (for example, if you have other operations to perform), first convert the generator to a list:

files = list(Path(path + "dirtest//").glob('*.csv'))

for f in files:
     print(f.name)
dfs = [pd.read_csv(f) for f in files]

base_renov = pd.concat(dfs, ignore_index=True)

Or build dfs inside your loop:

files = list(Path(path + "dirtest//").glob('*.csv'))

dfs = []
for f in files:
     print(f.name)
     dfs.append(pd.read_csv(f))

base_renov = pd.concat(dfs, ignore_index=True)
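
If this pattern comes up often, it could be wrapped in a small helper. The sketch below is one possible shape, not the only way to do it: the function name combine_csvs and the source_file column are assumptions added for illustration. sorted() turns the generator into a list, so it can be checked and reused, and it also makes the file order deterministic. The explicit check is there because pd.concat raises a ValueError when given no frames.

```python
from pathlib import Path

import pandas as pd


def combine_csvs(folder, pattern="*.csv"):
    """Read every file matching `pattern` in `folder` into one DataFrame."""
    # sorted() materializes the generator into a list in a stable order.
    files = sorted(Path(folder).glob(pattern))
    if not files:
        # Fail with a clearer message than pd.concat would give on an empty list.
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    # source_file is a hypothetical extra column recording each row's origin.
    frames = [pd.read_csv(f).assign(source_file=f.name) for f in files]
    return pd.concat(frames, ignore_index=True)
```

With the paths from the question, this would be called as base_renov = combine_csvs(path + "dirtest//").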