I'm trying to use the Databricks Dolly model from the HuggingFace repo to create embeddings. My 16 GB GPU runs out of memory even with the 3B version of the model, so I'm trying to load it in 8-bit:
embeddings = HuggingFaceEmbeddings(model_name="databricks/dolly-v2-3b", model_kwargs={'load_in_8bit':True})
It looks like the load_in_8bit kwarg is not permitted here, but I know it's possible to load a model this way when instantiating a pipeline. Is there a way to do the same for embeddings? I couldn't find anything helpful about this in the LangChain docs.