Run Code Llama from Hugging Face locally with GPU

I am trying to host Code Llama from Hugging Face locally and run it. It runs solely on the CPU and does not utilize the GPU available on the machine, despite the Nvidia drivers and CUDA toolkit being installed.
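
A quick sanity check with the standard torch.cuda API shows whether the installed PyTorch build can see a GPU at all:

import torch

# False means either no compatible GPU/driver is visible,
# or the installed PyTorch wheel is a CPU-only build
print(torch.cuda.is_available())
# None for CPU-only builds; otherwise the CUDA version the wheel was built against
print(torch.version.cuda)

Here is the script I am running: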

from transformers import AutoTokenizer
import transformers

model = "codellama/CodeLlama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=None,
    device_map="cuda:0",
)

prompt = "Write python code to reverse a string"

sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

The code above runs the LLM locally, but when we set the device to CUDA, it gives the following error:

File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\transformers\modeling_utils.py", line 3333, in from_pretrained
    ) = cls._load_pretrained_model(
  File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\transformers\modeling_utils.py", line 3723, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\transformers\modeling_utils.py", line 744, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\accelerate\utils\modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
  File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

1 Answer
AKX (Best Answer)

You need to install a CUDA-enabled build of PyTorch; see https://pytorch.org/get-started/locally.

Namely, you click on "Windows" and "CUDA 11.8", and you get the installation instruction pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118.

If you have already installed torch or the other packages, you may need to pip uninstall them first.
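
For example, on Windows with the CUDA 11.8 wheels (assuming your driver supports CUDA 11.8), the full sequence would look something like this, as a minimal sketch rather than a definitive recipe:

# remove any CPU-only builds first
pip3 uninstall -y torch torchvision torchaudio

# install the CUDA 11.8 builds from the PyTorch wheel index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# verify that the new build can see the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"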