I'm having trouble loading the files that make up my dataset. My previous (working) approach was to use a pandas DataFrame, but for larger datasets the training process gets killed because the data takes up too much memory. So I decided to use TensorFlow's `tf.data.Dataset` class to work around this, but I cannot load the individual files.
Specifically, I load the paths of the individual `.npz` files as samples and then use the `map` method of the `Dataset` class to load the contents of each `.npz` file. Each `.npz` file contains a NumPy array of shape (1, x, x, z) and is stored in a folder named after its label.
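To reproduce the setup, a small directory with the layout described above can be generated. This is a minimal sketch: the label names (`cat`, `dog`), the sizes `x=8` and `z=4`, and the helper `make_toy_dataset` are all hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def make_toy_dataset(root, labels=("cat", "dog"), files_per_label=3, x=8, z=4):
    """Write random arrays of shape (1, x, x, z) as .npz files, one folder per label."""
    root = Path(root)
    for label in labels:
        folder = root / label
        folder.mkdir(parents=True, exist_ok=True)
        for i in range(files_per_label):
            array = np.random.rand(1, x, x, z).astype(np.float32)
            # np.savez stores a positional (unnamed) array under the key "arr_0".
            np.savez(folder / f"sample_{i}.npz", array)
    return root


root = make_toy_dataset(tempfile.mkdtemp())
for path in sorted(root.glob("*/*.npz")):
    print(path.parent.name, path.name)
```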
This is the method I use for loading the dataset:
```python
import tensorflow as tf

IMAGE_SUPPORTED_EXTENSIONS = ('.jpg', '.jpeg', '.png')
TENSOR_SUPPORTED_EXTENSIONS = ('.npy', '.npz')


class DatasetLoader:
    # self.main_folder (a pathlib.Path) and self.labels are set elsewhere in the class.

    def load_dataset(self):
        data = []
        for label in self.labels:
            folder = self.main_folder / label
            # Collect only the file paths; the arrays are meant to be loaded lazily in map_function.
            file_paths = [str(file_path) for file_path in folder.glob('*')
                          if file_path.suffix in TENSOR_SUPPORTED_EXTENSIONS]
            dataset = tf.data.Dataset.from_tensor_slices(file_paths)
            # Pair each path with its label
            dataset = dataset.map(lambda x: (x, label))
            dataset = dataset.map(map_function)
            data.append(dataset)
        # Concatenate datasets from different labels
        dataset = data[0]
        for i in range(1, len(data)):
            dataset = dataset.concatenate(data[i])
        return dataset
```
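As an alternative to building one dataset per label and concatenating them, all paths and labels can be collected first and turned into a single dataset of `(path, label)` pairs. A minimal sketch, assuming labels may be encoded as integer indices; `list_files_and_labels` and `paths_dataset` are hypothetical names:

```python
from pathlib import Path

import tensorflow as tf

TENSOR_SUPPORTED_EXTENSIONS = ('.npy', '.npz')


def list_files_and_labels(main_folder, labels):
    """Collect (file path, label index) pairs without loading any array data."""
    paths, label_ids = [], []
    for label_id, label in enumerate(labels):
        for file_path in sorted(Path(main_folder, label).glob('*')):
            if file_path.suffix in TENSOR_SUPPORTED_EXTENSIONS:
                paths.append(str(file_path))
                label_ids.append(label_id)
    return paths, label_ids


def paths_dataset(main_folder, labels):
    """Build one dataset of (path, label) pairs; only strings and ints are held in memory."""
    paths, label_ids = list_files_and_labels(main_folder, labels)
    return tf.data.Dataset.from_tensor_slices((paths, label_ids))
```

Because the files are collected label by label, the resulting dataset is ordered by label, so a `.shuffle(...)` call before training is usually needed.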
And this is the function passed to the map method:
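```python
from pathlib import Path

import numpy as np
import tensorflow as tf


def map_function(file_path, label):
    # Dataset.map unpacks each (path, label) tuple into two separate arguments.
    npz_data = DatasetLoader.load_tensor(file_path)
    return (npz_data, label)


class DatasetLoader:
    # load_tensor is a static method of the same DatasetLoader class.

    @staticmethod
    def load_tensor(file_path):
        # Returns None if file_path is a symbolic tensor (e.g. while Dataset.map traces map_function).
        file_path = tf.get_static_value(file_path)
        file_path = Path(file_path.decode() if isinstance(file_path, bytes) else file_path)
        if file_path.suffix not in ('.npy', '.npz'):
            raise ValueError(f"Extension {file_path.suffix} not supported.")
        try:
            loaded = np.load(file_path)
            if file_path.suffix == ".npz":
                # Keep the archive (the context-manager target) bound to its own name.
                with loaded as npz:
                    for _, item in npz.items():
                        tensor = item  # ends up holding the last array in the archive
            else:
                tensor = loaded  # a .npy file loads directly as an ndarray
            return np.array(tensor).squeeze()
        except Exception as e:
            print(f"Error loading {file_path.stem} file: {str(e)}.", "\nFile path: ", file_path)
            raise RuntimeError(f"Error loading {file_path.stem} file: {str(e)}.") from e
```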
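The function passed to `Dataset.map` is traced into a TensorFlow graph, so `file_path` arrives as a symbolic string tensor that NumPy cannot open, and `tf.get_static_value` returns `None` for it. One common workaround (a swapped-in technique, not the code above) is to wrap the NumPy loading in `tf.numpy_function`, which calls plain Python on concrete values. A sketch under assumptions: each archive holds a single array of shape (1, x, x, z) that can be cast to `float32`, and `load_npz` is a hypothetical helper.

```python
from pathlib import Path

import numpy as np
import tensorflow as tf


def load_npz(path_bytes):
    """Runs as plain Python: receives the path as bytes, returns a float32 array."""
    path = Path(path_bytes.decode())
    with np.load(path) as npz:
        # Read the first (assumed only) array while the archive is still open.
        array = npz[npz.files[0]]
    # Drop only the leading axis: (1, x, x, z) -> (x, x, z).
    return np.squeeze(array, axis=0).astype(np.float32)


def map_function(file_path, label):
    # tf.numpy_function calls load_npz with concrete NumPy values,
    # so the path is a real bytes object instead of a symbolic tensor.
    tensor = tf.numpy_function(load_npz, [file_path], tf.float32)
    # numpy_function loses static shape information; restore the known rank.
    tensor.set_shape([None, None, None])
    return tensor, label
```

With this version, `dataset.map(map_function, num_parallel_calls=tf.data.AUTOTUNE)` loads each file only when the element is requested. Note that `tf.numpy_function` runs under the Python GIL, so heavy loading may still be the bottleneck.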
Here's an example of creating an `.npz` file with 2 arrays and then loading them:
And the load:

What's your error?

If I try your `tensor = item`, I get an error. I get this error even when I save only one array to the `.npz` file. Changing a variable inside a context manager or a `for` loop is dangerous: it either doesn't work or produces an error.
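A minimal sketch, assuming the archive's arrays fit in memory, that reads every array from a multi-array `.npz` without ever rebinding the `with` target; the helper `load_all_arrays` and the array names `first` and `second` are hypothetical:

```python
import tempfile
from pathlib import Path

import numpy as np


def load_all_arrays(path):
    """Return every array in an .npz archive as a dict keyed by array name."""
    with np.load(path) as npz:
        # Copy the arrays out while the file is open; `npz` itself is never rebound.
        return {name: npz[name] for name in npz.files}


path = Path(tempfile.mkdtemp()) / "two_arrays.npz"
np.savez(path, first=np.arange(3), second=np.zeros((2, 2)))
arrays = load_all_arrays(path)
print(sorted(arrays))  # ['first', 'second']
```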