I'm trying to set up my local environment to use a Llama model with llama-cpp-python, but I get an error saying "Failed to load model from file":
from llama_cpp import Llama

llm = Llama(
    model_path='/Users/rem/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-Ensemble-v5-GGUF',
    gqa=8,
)
output = llm(
    "Q: Name the planets in the solar system? A: ",  # Prompt
    max_tokens=32,   # Generate up to 32 tokens; set to None to generate up to the end of the context window
    stop=["Q:", "\n"],  # Stop generating just before the model would generate a new question
    echo=True,       # Echo the prompt back in the output
)
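I wonder whether model_path needs to point at a specific .gguf file inside the repo's snapshot directory rather than at the repo folder itself, i.e. something along these lines (the snapshot hash and quantization filename below are placeholders; I don't know which files the repo actually contains):

llm = Llama(
    model_path='/Users/rem/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-Ensemble-v5-GGUF/snapshots/<revision>/llama-2-13b-ensemble-v5.Q4_K_M.gguf',
)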
The model repos are located under that same path; here is the output of huggingface-cli scan-cache:
(ms-nlp) REMs-MacBook-Pro:m3g prem$ huggingface-cli scan-cache
REPO ID REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS LOCAL PATH
------------------------------------- --------- ------------ -------- -------------- -------------- ---- ---------------------------------------------------------------------------------
TheBloke/Llama-2-13B-Ensemble-v5-GGUF model 0.0 0 10 minutes ago 29 minutes ago main /Users/rem/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-Ensemble-v5-GGUF
TheBloke/Llama-2-13B-GGUF model 0.0 0 10 minutes ago 3 hours ago main /Users/rem/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-GGUF
TheBloke/Llama-2-7b-Chat-GGUF model 0.0 0 3 hours ago 3 hours ago main /Users/rem/.cache/huggingface/hub/models--TheBloke--Llama-2-7b-Chat-GGUF
Done in 0.0s. Scanned 3 repo(s) for a total of 0.0.
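One thing I notice is that scan-cache reports 0 files and 0.0 bytes on disk for every repo, so I suspect the .gguf weights were never actually downloaded into the cache. This is a rough sketch of what I think I need to do instead, assuming the repo contains a quantization named something like llama-2-13b-ensemble-v5.Q4_K_M.gguf (the exact filename is a guess on my part):

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one specific quantization file from the repo; the filename is an
# assumption, the repo's file listing has the actual .gguf names.
model_file = hf_hub_download(
    repo_id="TheBloke/Llama-2-13B-Ensemble-v5-GGUF",
    filename="llama-2-13b-ensemble-v5.Q4_K_M.gguf",
)

# Point llama-cpp-python at the concrete .gguf file that was downloaded.
llm = Llama(model_path=model_file)

Is that the right approach, or is there a way to have Llama resolve the model file from the cache directory directly?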