Converting a TinyStories Llama model to GGUF for llama.cpp

I would like to run a TinyLlama model trained by Karpathy using llama.cpp: https://huggingface.co/karpathy/tinyllamas/tree/main

In theory this should work since it's the Llama model architecture; however, I do not know how to convert it to the quantized GGUF format in order to run it with llama.cpp. Any pointers on how to do so?

I tried using the convert.py script, but it seems to be missing the vocabulary. I don't have a good grasp of how the conversion actually works and what constituent elements it needs, so I would appreciate an explanation of that too. Thank you!
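For reference, this is roughly what I ran (paths are from my local setup, so treat the exact invocation as a sketch; `stories15M.pt` is one of the checkpoints from the tinyllamas repo, and `convert.py` is from my llama.cpp checkout):

```shell
# Hypothetical invocation -- convert.py from a llama.cpp checkout,
# stories15M.pt downloaded from the tinyllamas Hugging Face repo.
python convert.py stories15M.pt --outfile stories15M.gguf
```

This fails with a vocabulary-related error, presumably because there is no tokenizer file alongside the checkpoint.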

There are 0 answers