How to make a CSV file with file names in one column and folder names in the other


I am building a dataset organized so that each kind of item has its own folder: for example, a folder named "apples" inside the root folder contains only images of apples, and so on. The root folder contains multiple such folders.

I want to create a CSV file with all the file names in one column and the corresponding folder names in the other.

I tried this, but it writes the data row-wise:

from PIL import Image
import csv
import os
subdirs = [x[0] for x in os.walk('Training images')]
print(subdirs)
data=[]
with open('images.csv', 'w', newline='') as writeFile:
    writer = csv.writer(writeFile)
    for i in range(len(subdirs)):
        for filename in os.listdir(subdirs[i]):
            data.append(filename)
            writer.writerow(data)
            data=[]
writeFile.close()

There are 2 answers

Evgeny Nozdrev (best answer)

The writerow() function accepts a list, and each element of the list becomes one column of the row. In your example, data is that list, and it is passed to writerow().

You append only one item: data.append(filename). Just append another: data.append(dirname), where dirname is the folder name (for example, os.path.basename(subdirs[i]) in your loop).

Or skip the temporary variable data entirely (recommended: less code is simpler to understand):

    writer.writerow([filename, dirname])
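
For reference, here is a minimal sketch of how the whole loop might look with that change applied. The function name write_file_folder_csv, the header row, and the use of os.path.basename to get the folder name are assumptions added for illustration; they are not part of the original question.

```python
import csv
import os


def write_file_folder_csv(root, csv_path):
    """Write one row per file: file name in column 1, folder name in column 2.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    rows = 0
    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        # Header row is an addition, not something the question asked for.
        writer.writerow(["filename", "folder"])
        for dirpath, _dirnames, filenames in os.walk(root):
            # Last path component of the directory, e.g. "apples".
            dirname = os.path.basename(dirpath)
            for filename in sorted(filenames):
                writer.writerow([filename, dirname])
                rows += 1
    return rows  # the with-block has already closed the file
```

Calling write_file_folder_csv("Training images", "images.csv") would then produce rows such as a file name followed by "apples". Note that the with statement closes the file on its own, so no separate close() call is needed.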

Sabito stands with Ukraine

The following code simply creates a directory structure for testing:

import os

os.mkdir("root")
os.mkdir("root/apples")
os.mkdir("root/oranges")
os.mkdir("root/bananas")

for foldername in ["apples","oranges","bananas"]:
    for i in range(0,10):
        with open(os.path.join("root",foldername,f"{i}.txt"),'w') as f:
            f.write("test")

Now I loop through all folders in the root directory and append the names of the files within them along with their folder names into a list:

list_ = []
for folder in os.listdir("root"):
    list_of_file_names = os.listdir(os.path.join("root",folder))
    list_ = list_ + list(zip([folder]*len(list_of_file_names), list_of_file_names))

This is what list_ looks like:

[('apples', '0.txt'),
 ('apples', '1.txt'),
 ('apples', '2.txt'),
 ('apples', '3.txt'),
 ('apples', '4.txt'),
 ('apples', '5.txt'),
 ('apples', '6.txt'),
 ('apples', '7.txt'),
 ('apples', '8.txt'),
 ('apples', '9.txt'),
 ('bananas', '0.txt'),
 ('bananas', '1.txt'),
 ('bananas', '2.txt'),
 ('bananas', '3.txt'),
 ('bananas', '4.txt'),
 ('bananas', '5.txt'),
 ('bananas', '6.txt'),
 ('bananas', '7.txt'),
 ('bananas', '8.txt'),
 ('bananas', '9.txt'),
 ('oranges', '0.txt'),
 ('oranges', '1.txt'),
 ('oranges', '2.txt'),
 ('oranges', '3.txt'),
 ('oranges', '4.txt'),
 ('oranges', '5.txt'),
 ('oranges', '6.txt'),
 ('oranges', '7.txt'),
 ('oranges', '8.txt'),
 ('oranges', '9.txt')]

Finally, I convert the above list to a pandas DataFrame and save it as a CSV file:

import pandas as pd
df = pd.DataFrame(list_)
df.to_csv("test.csv",index=False)

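The resulting CSV has the folder name in the first column and the file name in the second, one row per file. Because no column names were given, the header row is 0,1.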
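
A variation on the same idea, sketched under a few assumptions: the file name goes in the first column (the order the question asks for), the columns get explicit names, and stray files sitting directly in the root folder are skipped. The function names collect_rows and save_rows and the column names "filename" and "folder" are hypothetical, chosen only for this example.

```python
import os


def collect_rows(root):
    """Return (filename, folder) pairs for every entry inside each folder of root."""
    rows = []
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):
            continue  # skip stray files sitting directly in root
        for filename in sorted(os.listdir(folder_path)):
            rows.append((filename, folder))
    return rows


def save_rows(rows, csv_path):
    """Write the pairs to a CSV file with named columns (requires pandas)."""
    import pandas as pd  # imported here so collect_rows works without pandas

    df = pd.DataFrame(rows, columns=["filename", "folder"])
    df.to_csv(csv_path, index=False)
```

With the test directory created earlier, save_rows(collect_rows("root"), "test.csv") would write a header row of filename,folder instead of 0,1.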