I have a subset of ImageNet data stored locally in sub-folders, where each sub-folder represents a class of images. There are potentially hundreds of classes (and therefore sub-folders), and each sub-folder can contain hundreds of images. Here is an example of this structure with a subset of folders. I want to train a classification model in TensorFlow, but I am not sure how to format and load the data, given that the image classes live in different folders and the class label is the name of the folder. Normally I've just used datasets that already exist in TensorFlow, like MNIST or CIFAR-10, which are already formatted and easy to use.
How to load data in tensorflow from subdirectories
2.8k views Asked by Jane Sully At
There are 2 answers
Gerry P
You can use ImageDataGenerator.flow_from_directory. Documentation is here. Assume your sub-directories reside in a directory called main_dir. Set the size of the images you want to process; below I used 224 x 224 and specified color images. class_mode is set to 'categorical', so when you compile your model use categorical cross-entropy as the loss. Then use the code below.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# main_dir is the directory that contains one sub-folder per class
# hold out 20% of the images for validation and rescale pixels to [0, 1]
datagen=ImageDataGenerator(validation_split=.2, rescale=1/255)
train_gen=datagen.flow_from_directory(main_dir, target_size=(224, 224),
    color_mode="rgb", class_mode="categorical", batch_size=32, shuffle=True,
    seed=123, subset='training')
# build the validation generator from the same ImageDataGenerator
valid_gen=datagen.flow_from_directory(main_dir, target_size=(224, 224),
    color_mode="rgb", class_mode="categorical", batch_size=32, shuffle=False,
    seed=123, subset='validation')
# make and compile your model then fit the model per below
history=model.fit(x=train_gen, epochs=20, verbose=1, validation_data=valid_gen,
    shuffle=True, initial_epoch=0)

You can use tf.keras.preprocessing.image_dataset_from_directory(). Your directory structure would be something like this but with many more classes:
I would suggest splitting the dataset before this step, because I think the data is split randomly here rather than by stratified sampling. If your dataset is imbalanced, do the split yourself first and do not rely on the validation split to do it for you: I am not sure how the splitting is done, as the documentation does not mention it.
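One way to do such a split yourself is sketched below, using only the standard library. This is a minimal sketch, not part of TensorFlow: the function name `stratified_split`, the `IMAGE_SUFFIXES` set, and the default 20% validation fraction are all assumptions made for illustration. It shuffles the files of each class separately, so every class contributes roughly the same fraction to the validation set.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def stratified_split(main_dir, val_fraction=0.2, seed=123):
    """Split image paths class by class so each class keeps the same ratio.

    Returns two lists of (path, label) tuples: (train, val).
    The label is the name of the class sub-folder.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, val = [], []
    class_dirs = sorted(p for p in Path(main_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        files = sorted(
            f for f in class_dir.iterdir()
            if f.suffix.lower() in IMAGE_SUFFIXES
        )
        rng.shuffle(files)
        n_val = int(round(len(files) * val_fraction))
        label = class_dir.name
        val += [(f, label) for f in files[:n_val]]
        train += [(f, label) for f in files[n_val:]]
    return train, val
```

The two lists could then be used to copy the files into separate training and validation directories, each with the same one-folder-per-class layout, before loading them.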
Example:
Important things you have to set:
- Labels: must be "inferred", so that the labels of the images are generated from the directory structure and follow the order of the classes.
- Label mode: has to be set to "categorical", which encodes the labels as a categorical (one-hot) vector.
- Class names: you can set this yourself by listing the folders in the order you want; otherwise the order is alphanumeric. As you have lots of folders, you can use os.walk(directory) to get the list of directory names (os.walk does not guarantee any particular order, so sort the list if the order matters).
- Image size: you can resize the images so they all have the same size. Choose it according to the model you are using, e.g. MobileNet takes in (224,224), so you can set this to (224,224).
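The settings listed above can be put together roughly as follows. This is a minimal sketch rather than a definitive recipe: the helper names `list_class_names` and `load_dataset` are hypothetical, and `os.scandir` with an explicit sort is used in place of `os.walk` so that the class order is predictable. Newer TensorFlow releases also expose the same loader as `tf.keras.utils.image_dataset_from_directory`.

```python
import os


def list_class_names(directory):
    """Return the names of the class sub-folders in sorted order."""
    return sorted(e.name for e in os.scandir(directory) if e.is_dir())


def load_dataset(directory, image_size=(224, 224), batch_size=32):
    """Build a tf.data.Dataset of (images, one-hot labels) batches."""
    # Imported here so list_class_names can be used without TensorFlow.
    import tensorflow as tf

    return tf.keras.preprocessing.image_dataset_from_directory(
        directory,
        labels="inferred",          # labels come from the folder names
        label_mode="categorical",   # one-hot vectors, for categorical cross-entropy
        class_names=list_class_names(directory),  # fixes the label order
        image_size=image_size,      # every image is resized to this size
        batch_size=batch_size,
    )
```

Calling `load_dataset(main_dir)`, where `main_dir` is the folder that holds the class sub-folders, should give a dataset that can be passed straight to `model.fit`.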
More information here.