I would like to use llama.cpp to run one of the TinyLlama models trained by Karpathy: https://huggingface.co/karpathy/tinyllamas/tree/main
In theory this should work, since it's the Llama model architecture; however, I don't know how to convert the checkpoint to the quantized GGUF format that llama.cpp expects. Any pointers on how to do so?
I tried using the convert.py script, but it seems the vocabulary is missing. I don't have a good grasp of how the conversion actually works or what constituent pieces (weights, tokenizer/vocab, metadata) it needs, so I would appreciate an explanation of that too. Thank you!
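For reference, this is roughly what I ran (paths are my local copies, and I'm writing the flags from memory, so they may not be exactly right):

```shell
# Grab the checkpoints (stories15M.pt etc. are plain PyTorch state dicts)
git clone https://huggingface.co/karpathy/tinyllamas

# Attempt the conversion with llama.cpp's convert.py
# (this is where it stops, apparently because there is no tokenizer/vocab
# file sitting next to the checkpoint)
python convert.py tinyllamas/stories15M.pt --outtype f16
```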