Run Code Llama from Hugging Face locally with GPU

I am trying to host Code Llama from Hugging Face locally and run it. It runs solely on the CPU and does not utilize the GPU available on the machine, despite the Nvidia drivers and CUDA toolkit being installed.
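
A quick sanity check with the standard torch.cuda API shows whether the installed PyTorch build can see a GPU at all:

import torch

# False means either no compatible GPU/driver is visible,
# or the installed PyTorch wheel is a CPU-only build
print(torch.cuda.is_available())
# None for CPU-only builds; otherwise the CUDA version the wheel was built against
print(torch.version.cuda)

Here is the script I am running: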

from transformers import AutoTokenizer
import transformers

model = "codellama/CodeLlama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=None,
    device_map="cuda:0",
)

prompt = "Write python code to reverse a string"

sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

The code above runs the LLM locally, but when we set the device to CUDA, it gives the following error:

File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\transformers\modeling_utils.py", line 3333, in from_pretrained
    ) = cls._load_pretrained_model(
  File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\transformers\modeling_utils.py", line 3723, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\transformers\modeling_utils.py", line 744, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\accelerate\utils\modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
  File "C:\Users\winuser3\Desktop\GENAI-App\venv\lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

1 Answer
AKX (Best Answer)

You need to install a CUDA-enabled build of PyTorch; see https://pytorch.org/get-started/locally.

Namely, you click on "Windows" and "CUDA 11.8", and you get the installation instruction pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118.

If you have already installed torch or the other packages, you may need to pip uninstall them first.
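
For example, on Windows with the CUDA 11.8 wheels (assuming your driver supports CUDA 11.8), the full sequence would look something like this, as a minimal sketch rather than a definitive recipe:

# remove any CPU-only builds first
pip3 uninstall -y torch torchvision torchaudio

# install the CUDA 11.8 builds from the PyTorch wheel index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# verify that the new build can see the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"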