I downloaded a PyTorch model:
huggingface-cli download "mistralai/Mistral-7B-v0.1"
I can see the files locally, and the names of the .bin files make it clear this is a PyTorch model:
ls ~/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658
README.md generation_config.json pytorch_model-00002-of-00002.bin special_tokens_map.json tokenizer.model
config.json pytorch_model-00001-of-00002.bin pytorch_model.bin.index.json tokenizer.json tokenizer_config.json
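For reference, the pytorch_model.bin.index.json file is what ties the two shards together: it maps every tensor name in the checkpoint to the shard file that stores it. A minimal sketch of summarizing that mapping (the index contents below are abridged, hypothetical values, not the real Mistral index):

```python
import json
from collections import Counter

# Abridged, hypothetical stand-in for pytorch_model.bin.index.json;
# the real file lists every tensor in the checkpoint.
index_json = """
{
  "metadata": {"total_size": 14483464192},
  "weight_map": {
    "model.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
    "lm_head.weight": "pytorch_model-00002-of-00002.bin"
  }
}
"""
index = json.loads(index_json)

# Count how many tensors live in each shard file.
tensors_per_shard = Counter(index["weight_map"].values())
for shard, n in sorted(tensors_per_shard.items()):
    print(f"{shard}: {n} tensor(s)")
```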
I've installed TorchServe and its dependencies, and I'd like to serve this model with the torchserve CLI. I tried running:
torchserve --foreground --model-store ~/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658
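One thing worth noting: torchserve serves models packaged as .mar archives (built with torch-model-archiver), and it looks for those archives in the --model-store directory. A raw Hugging Face snapshot directory contains no .mar files, which is consistent with nothing being loaded. A small sketch illustrating this, using a hypothetical temporary directory mirroring the ls output above:

```python
import tempfile
from pathlib import Path

def find_mar_files(model_store: str) -> list:
    """Return the .mar archives torchserve could register from a model store."""
    return sorted(p.name for p in Path(model_store).glob("*.mar"))

# Hypothetical stand-in for the snapshot directory shown in the ls output.
with tempfile.TemporaryDirectory() as d:
    for name in ["config.json",
                 "pytorch_model-00001-of-00002.bin",
                 "pytorch_model-00002-of-00002.bin",
                 "pytorch_model.bin.index.json"]:
        Path(d, name).touch()
    mars = find_mar_files(d)

# No .mar archives in the store, so there is nothing for torchserve to load.
print(mars)
```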
This starts a server, but no models seem to be loaded. I can run a health check and fetch the basic API description JSON with:
curl http://127.0.0.1:8080/ping
curl http://127.0.0.1:8080/api-description
How do I load and serve the actual model? Can this be done with the torchserve CLI? Do I need to write some basic Python code? Or is this much more involved?