I'm experimenting on my puny PC (a laptop with an Intel i7, 32 GB RAM, and an Nvidia RTX A2000) with the Mistral 7B model and notice incredible performance differences when running the Hugging Face transformers pipeline, something like this:
from transformers import pipeline, Conversation
# The model from https://huggingface.co/mistralai/Mistral-7B-v0.1/tree/main is in the same directory
pipe = pipeline("conversational", model="./", tokenizer='./')
conversation = Conversation('Hi, how are you?')
pipe(conversation)
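For what it's worth, my understanding is that the pipeline loads the weights in float32 on the CPU unless told otherwise. A variant like the following is what I would try next (just a sketch, untested on my machine; device_map="auto" requires the accelerate package to be installed):

import torch
from transformers import pipeline, Conversation

# Sketch: load the weights in half precision and let accelerate
# decide GPU/CPU placement instead of defaulting to float32 on CPU.
pipe = pipeline(
    "conversational",
    model="./",
    tokenizer="./",
    torch_dtype=torch.float16,
    device_map="auto",
)
conversation = Conversation("Hi, how are you?")
print(pipe(conversation))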
Even after one hour of runtime, I don't get any results. However, when running GPT4All, the performance is quite acceptable: just a couple of seconds (5.5 s). Here is the code:
from gpt4all import GPT4All
model = GPT4All(model_name="mistral-7b-v0.1.Q6_K.gguf", model_path="./", allow_download=False)
with model.chat_session():
    response = model.generate(prompt="Hello, how are you?")
    print(response)
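For context, here is my back-of-the-envelope math on the memory footprints (assuming roughly 7.2 billion parameters for Mistral 7B and about 6.5 bits per weight for Q6_K; those numbers are my assumptions, not official figures):

# Rough size estimates; parameter count and bits/weight are assumptions.
params = 7.2e9
print(f"float32: {params * 4 / 1e9:.1f} GB")        # transformers default
print(f"float16: {params * 2 / 1e9:.1f} GB")
print(f"Q6_K:    {params * 6.5 / 8 / 1e9:.1f} GB")  # quantized GGUF

If that math is right, the float32 weights nearly fill my 32 GB of RAM, while the Q6_K file fits comfortably, which might already explain part of the gap.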
How come the performance is so incredibly different? Is it the GGUF format that improves performance so dramatically? Or is it GPT4All vs. Hugging Face's pipeline?