I am using
- NVIDIA GeForce GTX 780 (Kepler)
- Driver Version: 470.223.02
- CUDA Toolkit v11.4.0
- cuDNN v8.2.4
- TensorFlow and Keras v2.8.0
- AutoKeras v1.0.17
- Ubuntu 20.04
=======================
I have two directories, train_data_npy and valid_data_npy, which contain 3013 and 1506 *.npy files, respectively.
Each *.npy file has 12 columns of floats: the first nine columns are features and the last three are one-hot-encoded labels for three classes.
The following Python script is meant to load those *.npy files in chunks so that memory does not overflow while searching for a neural network model.
However, the script is failing.
What exactly is the issue with the given script? Why is it failing?
Or is the problem not the script but the installation of CUDA, TF, or AutoKeras?
# File: cnn_search_by_chunk.py
import numpy as np
import tensorflow as tf
import os
import autokeras as ak

N_FEATURES = 9
BATCH_SIZE = 100

def get_data_generator(folder_path, batch_size, n_features):
    """Get a generator returning batches of data from .npy files in the specified folder.
    The shape of the features is (batch_size, n_features).
    """
    def data_generator():
        files = os.listdir(folder_path)
        npy_files = [f for f in files if f.endswith('.npy')]
        for npy_file in npy_files:
            data = np.load(os.path.join(folder_path, npy_file))
            x = data[:, :n_features]
            y = data[:, n_features:]
            y = np.argmax(y, axis=1)  # Convert one-hot-encoded labels back to integers
            for i in range(0, len(x), batch_size):
                yield x[i:i+batch_size], y[i:i+batch_size]
    return data_generator

train_data_folder = '/home/my_user_name/original_data/train_data_npy'
validation_data_folder = '/home/my_user_name/original_data/valid_data_npy'

train_dataset = tf.data.Dataset.from_generator(
    get_data_generator(train_data_folder, BATCH_SIZE, N_FEATURES),
    output_signature=(
        tf.TensorSpec(shape=(None, N_FEATURES), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.int32)  # Labels are now 1D integers
    )
)
validation_dataset = tf.data.Dataset.from_generator(
    get_data_generator(validation_data_folder, BATCH_SIZE, N_FEATURES),
    output_signature=(
        tf.TensorSpec(shape=(None, N_FEATURES), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.int32)  # Labels are now 1D integers
    )
)

clf = ak.StructuredDataClassifier(overwrite=True, max_trials=1, seed=5)
clf.fit(x=train_dataset, validation_data=validation_dataset, batch_size=BATCH_SIZE)
print(clf.evaluate(validation_dataset))
my_user_name@192:~/my_project_name_v2$ python3 cnn_search_by_chunk.py
2023-11-29 20:05:53.532005: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Using TensorFlow backend
2023-11-29 20:05:55.467804: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Search: Running Trial #1
Hyperparameter |Value |Best Value So Far
structured_data...|True |?
structured_data...|2 |?
structured_data...|False |?
structured_data...|0 |?
structured_data...|32 |?
structured_data...|32 |?
classification_...|0 |?
optimizer |adam |?
learning_rate |0.001 |?
Epoch 1/1000
33143/33143 [==============================] - 149s 4ms/step - loss: 0.0670 - accuracy: 0.9677 - val_loss: 0.0612 - val_accuracy: 0.9708
Epoch 2/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0625 - accuracy: 0.9697 - val_loss: 0.0598 - val_accuracy: 0.9715
Epoch 3/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0617 - accuracy: 0.9702 - val_loss: 0.0593 - val_accuracy: 0.9717
Epoch 4/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0614 - accuracy: 0.9703 - val_loss: 0.0591 - val_accuracy: 0.9718
Epoch 5/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0612 - accuracy: 0.9705 - val_loss: 0.0590 - val_accuracy: 0.9719
Epoch 6/1000
33143/33143 [==============================] - 145s 4ms/step - loss: 0.0610 - accuracy: 0.9707 - val_loss: 0.0588 - val_accuracy: 0.9721
Epoch 7/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0608 - accuracy: 0.9707 - val_loss: 0.0586 - val_accuracy: 0.9721
Epoch 8/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0607 - accuracy: 0.9709 - val_loss: 0.0585 - val_accuracy: 0.9723
Epoch 9/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0605 - accuracy: 0.9710 - val_loss: 0.0584 - val_accuracy: 0.9723
Epoch 10/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0604 - accuracy: 0.9710 - val_loss: 0.0583 - val_accuracy: 0.9724
Epoch 11/1000
33143/33143 [==============================] - 148s 4ms/step - loss: 0.0603 - accuracy: 0.9711 - val_loss: 0.0583 - val_accuracy: 0.9724
Epoch 12/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0602 - accuracy: 0.9712 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 13/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0601 - accuracy: 0.9712 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 14/1000
33143/33143 [==============================] - 148s 4ms/step - loss: 0.0601 - accuracy: 0.9712 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 15/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0600 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 16/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0600 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9725
Epoch 17/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0600 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9725
Epoch 18/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 19/1000
33143/33143 [==============================] - 145s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9724
Epoch 20/1000
33143/33143 [==============================] - 144s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 21/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 22/1000
33143/33143 [==============================] - 144s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9724
Epoch 23/1000
33143/33143 [==============================] - 146s 4ms/step - loss: 0.0600 - accuracy: 0.9713 - val_loss: 0.0582 - val_accuracy: 0.9724
Epoch 24/1000
33143/33143 [==============================] - 145s 4ms/step - loss: 0.0599 - accuracy: 0.9714 - val_loss: 0.0581 - val_accuracy: 0.9725
Epoch 25/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0599 - accuracy: 0.9714 - val_loss: 0.0581 - val_accuracy: 0.9724
Epoch 26/1000
33143/33143 [==============================] - 147s 4ms/step - loss: 0.0599 - accuracy: 0.9713 - val_loss: 0.0581 - val_accuracy: 0.9724
Trial 1 Complete [01h 16m 38s]
val_accuracy: 0.9724819660186768
Best val_accuracy So Far: 0.9724819660186768
Total elapsed time: 01h 16m 38s
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.3
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.3
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.4
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.4
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.5
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.5
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.6
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.6
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.7
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.7
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.8
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.8
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.9
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.9
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.10
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.10
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.11
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.11
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.12
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.12
2023-11-29 21:23:57.450991: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451029: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451059: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451091: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451123: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451157: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451185: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451213: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
2023-11-29 21:23:57.451250: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
Traceback (most recent call last):
File "cnn_search_by_chunk.py", line 50, in <module>
print(clf.evaluate(validation_dataset))
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/tasks/structured_data.py", line 187, in evaluate
return super().evaluate(x=x, y=y, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/auto_model.py", line 492, in evaluate
return utils.evaluate_with_adaptive_batch_size(
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 68, in evaluate_with_adaptive_batch_size
return run_with_adaptive_batch_size(
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 101, in run_with_adaptive_batch_size
history = func(x=x, validation_data=validation_data, **fit_kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 70, in <lambda>
lambda x, validation_data, **kwargs: model.evaluate(
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/my_user_name/.local/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.FailedPreconditionError: Graph execution error:
Detected at node 'model/multi_category_encoding/string_lookup_15/None_Lookup/LookupTableFindV2' defined at (most recent call last):
File "cnn_search_by_chunk.py", line 50, in <module>
print(clf.evaluate(validation_dataset))
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/tasks/structured_data.py", line 187, in evaluate
return super().evaluate(x=x, y=y, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/auto_model.py", line 492, in evaluate
return utils.evaluate_with_adaptive_batch_size(
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 68, in evaluate_with_adaptive_batch_size
return run_with_adaptive_batch_size(
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 101, in run_with_adaptive_batch_size
history = func(x=x, validation_data=validation_data, **fit_kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/utils/utils.py", line 70, in <lambda>
lambda x, validation_data, **kwargs: model.evaluate(
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 2200, in evaluate
logs = test_function_runner.run_step(
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 4000, in run_step
tmp_logs = self._function(dataset_or_iterator)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 1972, in test_function
return step_function(self, iterator)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 1956, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 1944, in run_step
outputs = model.test_step(data)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 1850, in test_step
y_pred = self(x, training=False)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/training.py", line 569, in __call__
return super().__call__(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/base_layer.py", line 1150, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/functional.py", line 512, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/functional.py", line 669, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/base_layer.py", line 1150, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/keras_layers.py", line 91, in call
for input_node, encoding_layer in zip(split_inputs, self.encoding_layers):
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/keras_layers.py", line 92, in call
if encoding_layer is None:
File "/home/my_user_name/.local/lib/python3.8/site-packages/autokeras/keras_layers.py", line 100, in call
output_nodes.append(
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/engine/base_layer.py", line 1150, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/layers/preprocessing/index_lookup.py", line 756, in call
lookups = self._lookup_dense(inputs)
File "/home/my_user_name/.local/lib/python3.8/site-packages/keras/src/layers/preprocessing/index_lookup.py", line 792, in _lookup_dense
lookups = self.lookup_table.lookup(inputs)
Node: 'model/multi_category_encoding/string_lookup_15/None_Lookup/LookupTableFindV2'
Table not initialized.
[[{{node model/multi_category_encoding/string_lookup_15/None_Lookup/LookupTableFindV2}}]] [Op:__inference_test_function_5785123]
2023-11-29 21:23:57.618149: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
2023-11-29 21:23:57.618266: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
2023-11-29 21:23:57.618360: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
2023-11-29 21:23:57.618434: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
my_user_name@192:~/my_project_name_v2$
Going through your error log, the cause is either your data or your GPU setup and installation.
GPU Libraries Warning: Cannot dlopen some GPU libraries
2023-11-29 20:05:55.467804: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries.
This warning means that TensorFlow could not load some of the GPU libraries, so it skipped registering GPU devices (see "Skipping registering GPU devices..." in your log). Make sure the necessary GPU drivers and libraries are installed. The official TensorFlow GPU installation guide, linked in the warning itself (https://www.tensorflow.org/install/gpu), describes how to set up GPU support.
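One way to narrow down which libraries are missing is to ask the dynamic loader directly. The following is only a sketch: the function name `probe_libraries` is made up for illustration, and the library file names are an assumption based on a typical CUDA 11.x / cuDNN 8 installation, so they may need adjusting for your system.

```python
import ctypes

# ASSUMPTION: typical shared-library names for CUDA 11.x and cuDNN 8.
# They are not taken from the log above; adjust them to your installation.
CANDIDATE_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def probe_libraries(names=CANDIDATE_LIBS):
    """Map each library name to True if the dynamic loader can open it."""
    found = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be loaded
            found[name] = True
        except OSError:
            found[name] = False
    return found


if __name__ == "__main__":
    for name, ok in probe_libraries().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any library reported as missing is a candidate for the dlopen warning; typically it is either not installed or not on the loader's search path (for example LD_LIBRARY_PATH). From inside Python, `tf.config.list_physical_devices('GPU')` returns an empty list when TensorFlow has not registered a GPU.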
Lookup Table Error: Table not initialized
2023-11-29 21:23:57.451123: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at lookup_table_op.cc:929 : FAILED_PRECONDITION: Table not initialized.
There might be an issue with how the model is defined or how the data is processed. Judging from the comments, the model is defined correctly; however, without access to your data we can only scaffold a test with a substitute dataset (as has been done successfully with MNIST). A possible step to resolve this: check that the input data is correctly formatted for the model.
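The data check suggested above could be sketched as follows. It assumes the layout described in the question (nine float feature columns followed by three one-hot label columns); the helper names `find_problems` and `check_folder` are hypothetical and not part of NumPy, TensorFlow, or AutoKeras.

```python
import os

import numpy as np

N_FEATURES = 9  # from the question: first nine columns are features
N_CLASSES = 3   # from the question: last three columns are one-hot labels


def find_problems(data, n_features=N_FEATURES, n_classes=N_CLASSES):
    """Return a list of human-readable problems found in one loaded array."""
    if data.ndim != 2 or data.shape[1] != n_features + n_classes:
        # With the wrong layout, the remaining checks are meaningless.
        return [f"unexpected shape {data.shape}"]
    if not np.issubdtype(data.dtype, np.floating):
        return [f"non-float dtype {data.dtype}"]
    problems = []
    if not np.isfinite(data).all():
        problems.append("contains NaN or infinite values")
    labels = data[:, n_features:]
    is_binary = np.isin(labels, (0.0, 1.0)).all()
    if not (is_binary and (labels.sum(axis=1) == 1).all()):
        problems.append("label columns are not one-hot")
    return problems


def check_folder(folder_path):
    """Print the problems found in every .npy file of a folder."""
    for name in sorted(os.listdir(folder_path)):
        if name.endswith(".npy"):
            problems = find_problems(np.load(os.path.join(folder_path, name)))
            if problems:
                print(f"{name}: {'; '.join(problems)}")
```

Calling `check_folder` on train_data_npy and valid_data_npy and getting no output would suggest that the files match the described layout. It would not rule out other causes, such as how the library interprets the feature columns.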