I'm trying to create a TensorFlow dataset to train my model. I have a folder full of tagged photos; the tag is part of each file name.
Is there a reasonable way to load the dataset for training without splitting it into different directories?
Example, for the files:
- ./dataset/path/img0_cat.bmp
- ./dataset/path/img1_dog.bmp
- ./dataset/path/img2_horse.bmp
- ./dataset/path/img3_cat.bmp
- ./dataset/path/img4_dog.bmp
- ./dataset/path/img5_horse.bmp
- ./dataset/path/img6_dog.bmp
- ./dataset/path/img7_cat.bmp
- ./dataset/path/img8_horse.bmp
- ./dataset/path/img9_cat.bmp
- ./dataset/path/img10_dog.bmp
Expected output: a tf.data.Dataset with one-hot labels for (cat, dog, horse)
One option is to assign a class ID to each path and group the paths by that ID when building your training set.
If you're using TensorFlow, the tf.data.Dataset documentation describes several ways to load data. Specifically, list_files accepts a glob pattern, so you can select the files of one class at a time:
dataset_dog = tf.data.Dataset.list_files("./dataset/path/*dog.bmp")
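
If you would rather get a single dataset with one-hot labels instead of one dataset per class, the ID idea above could be sketched roughly like this. It is only a sketch under some assumptions: every file name ends in `_<tag>.bmp`, the class order is (cat, dog, horse), and the helper names `label_index` and `build_dataset` are made up for illustration rather than part of TensorFlow.

```python
import glob
import os

# Assumed class order; the position in this list becomes the one-hot index.
CLASS_NAMES = ["cat", "dog", "horse"]


def label_index(path, class_names=CLASS_NAMES):
    """Return the class index encoded in a file name such as 'img3_cat.bmp'."""
    stem = os.path.splitext(os.path.basename(path))[0]  # e.g. 'img3_cat'
    tag = stem.rsplit("_", 1)[-1]                       # e.g. 'cat'
    return class_names.index(tag)  # raises ValueError for an unknown tag


def build_dataset(pattern, class_names=CLASS_NAMES):
    """Build a tf.data.Dataset of (image, one-hot label) pairs (hypothetical helper)."""
    # Imported here so that label_index can be used without TensorFlow installed.
    import tensorflow as tf

    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    labels = [label_index(p, class_names) for p in paths]

    def load(path, label):
        # Read and decode one BMP file, then scale pixel values to [0, 1].
        image = tf.io.decode_bmp(tf.io.read_file(path), channels=3)
        image = tf.image.convert_image_dtype(image, tf.float32)
        return image, tf.one_hot(label, depth=len(class_names))

    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    ds = ds.shuffle(buffer_size=len(paths))
    return ds.map(load, num_parallel_calls=tf.data.AUTOTUNE)
```

Usage would be something like `ds = build_dataset("./dataset/path/*.bmp").batch(32)`. Batching this way assumes all images have the same height and width; if they don't, you would probably need a resize step inside `load` first.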