unable to run downloaded llama model


I'm trying to set up my local environment to use my Llama model, but I get an error saying "Failed to load model from file":

from llama_cpp import Llama
llm = Llama(model_path='/Users/rem/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-Ensemble-v5-GGUF',
            gqa=8)
output = llm(
      "Q: Name the planets in the solar system? A: ", # Prompt
      max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
      stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
      echo=True # Echo the prompt back in the output
) 
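
For context, Llama(model_path=...) in llama-cpp-python expects the path of a single .gguf model file, while the path above is the repo's cache directory. Below is a minimal sketch of downloading and loading one concrete quantization file instead; the filename is a hypothetical example of TheBloke's usual naming scheme, so check the repo's file list for the real names:

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one quantization file from the repo into the local cache.
# The filename here is hypothetical -- pick a real one from the repo.
model_file = hf_hub_download(
    repo_id="TheBloke/Llama-2-13B-Ensemble-v5-GGUF",
    filename="llama-2-13b-ensemble-v5.Q4_K_M.gguf",
)

# model_path must point at the .gguf file itself, not its directory.
llm = Llama(model_path=model_file)

(The gqa=8 argument also looks like the old GGML-era n_gqa option; GGUF files carry that metadata themselves, so it is likely unnecessary here.)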

The models are located at that same path; here is the output of huggingface-cli scan-cache:

(ms-nlp) REMs-MacBook-Pro:m3g prem$ huggingface-cli scan-cache
REPO ID                               REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED  LAST_MODIFIED  REFS LOCAL PATH                                                                        
------------------------------------- --------- ------------ -------- -------------- -------------- ---- --------------------------------------------------------------------------------- 
TheBloke/Llama-2-13B-Ensemble-v5-GGUF model              0.0        0 10 minutes ago 29 minutes ago main /Users/rem/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-Ensemble-v5-GGUF 
TheBloke/Llama-2-13B-GGUF             model              0.0        0 10 minutes ago 3 hours ago    main /Users/rem/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-GGUF             
TheBloke/Llama-2-7b-Chat-GGUF         model              0.0        0 3 hours ago    3 hours ago    main /Users/rem/.cache/huggingface/hub/models--TheBloke--Llama-2-7b-Chat-GGUF         

Done in 0.0s. Scanned 3 repo(s) for a total of 0.0.
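
Note that the scan reports NB FILES 0 and SIZE ON DISK 0.0 for all three repos, which suggests only repo metadata is cached and no GGUF weights were actually downloaded. A quick sketch (assuming the huggingface_hub client) for listing which .gguf files the repo actually provides:

from huggingface_hub import list_repo_files

# Print the .gguf files available in the repo so one concrete
# quantization can be chosen and downloaded explicitly.
for name in list_repo_files("TheBloke/Llama-2-13B-Ensemble-v5-GGUF"):
    if name.endswith(".gguf"):
        print(name)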
