I'm trying to train a deep learning model in VS Code and I would like to use the GPU for that. I have CUDA 11.6, an NVIDIA GeForce GTX 1650, TensorFlow-gpu==2.5.0 and pip version 21.2.3 on Windows 10. The problem is that whenever I run this part of the code I get this error:

Mixed precision training with AMP or APEX (--fp16 or --bf16) and half precision evaluation (--fp16_full_eval or --bf16_full_eval) can only be used on CUDA devices.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=new_output_models_dir,
    #output_dir="dev/",
    group_by_length=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    #dataloader_num_workers=1,
    dataloader_num_workers=0,
    evaluation_strategy="steps",
    num_train_epochs=40,
    fp16=True,
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
    learning_rate=1e-4,
    warmup_steps=500,
    save_total_limit=2,
)
I've also tested whether TensorFlow can access a GPU and whether TensorFlow was built with CUDA GPU support, using tf.config.list_physical_devices('GPU') and tf.test.is_built_with_cuda(), and both of them return True (see the snippet below). How do I solve this issue, and why am I getting this error? Any ideas?
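For reference, this is roughly the check I ran:

import tensorflow as tf

# Both checks report that TensorFlow can see the GPU and was built with CUDA
print(tf.config.list_physical_devices('GPU'))  # e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
print(tf.test.is_built_with_cuda())            # True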
The above error suggests that the Trainer does not accept fp16=True / bf16=True when it is not running on a CUDA GPU. CUDA 11.6 might be the issue here, since it has stability problems with this setup.
Test with CUDA 11.2 and cuDNN 8.1, which match TensorFlow's tested build configuration (see the reference below). If that does not work, you can fall back to the fp16=False parameter.
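For example, a minimal sketch of the same TrainingArguments from the question with mixed precision disabled (new_output_models_dir is the output directory variable already defined in the question's code):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=new_output_models_dir,
    group_by_length=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    dataloader_num_workers=0,
    evaluation_strategy="steps",
    num_train_epochs=40,
    fp16=False,  # run in full precision until the GPU is usable
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
    learning_rate=1e-4,
    warmup_steps=500,
    save_total_limit=2,
)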
Ref - https://www.tensorflow.org/install/source#gpu