Can I use LoRA to just reduce the size and run inference?


So, LoRA basically makes fine-tuning a model really easy, right? But I just want to test a language model, in my case Flan-T5. Can I use LoRA to make it small so it fits on my GPU? I've seen tutorials that train the model with HF, but I just want to run it for inference. How can I do that? I was trying this with Hugging Face:

from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model_name_or_path = "google/flan-t5-xl"

model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path, device_map="auto")

model = get_peft_model(model, peft_config)

and then just save it, but I'm not sure if this is the right approach. Thanks!


1 Answer

Answered by NLP from scratch:

If you just want to do inference, not training / fine-tuning, LoRA is not what you need: LoRA adds small trainable adapter weights on top of the full base model, so it doesn't shrink the model at all. What you want is model quantization, for example via GPTQ; see the blog post from Hugging Face here: Making LLMs lighter with AutoGPTQ and transformers.
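As a rough illustration, this is a minimal sketch of the quantization path that blog post describes. Note the transformers GPTQ integration targets decoder-only (causal) models, so the model name below is the blog's example, not Flan-T5, and the setup assumes `pip install auto-gptq optimum` with a recent transformers:

# Minimal sketch of the AutoGPTQ + transformers integration from the
# blog post above; model name and output directory are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # the blog's example model, not Flan-T5
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize to 4 bits, calibrating on the "c4" dataset during quantization.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)

# Save the quantized weights so they can be reloaded directly for inference.
model.save_pretrained("opt-125m-gptq")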

More practically, you should look for an already-quantized version of the model you want to try out; e.g., for FLAN-T5 here is one: https://huggingface.co/limcheekin/flan-t5-xl-ct2
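That particular checkpoint is a CTranslate2 conversion, so it is loaded with the ctranslate2 package rather than transformers. A hedged sketch, assuming the repo contains a standard CT2 conversion and that the original google/flan-t5-xl tokenizer applies (requires `pip install ctranslate2 transformers huggingface_hub`):

# Sketch of running the pre-quantized CTranslate2 model linked above.
import ctranslate2
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

# Download the converted model files and load the original tokenizer.
model_path = snapshot_download("limcheekin/flan-t5-xl-ct2")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")

translator = ctranslate2.Translator(model_path, device="cuda")  # or "cpu"

prompt = "Translate English to German: How are you?"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

# translate_batch returns hypotheses as token sequences; decode back to text.
results = translator.translate_batch([tokens])
output_ids = tokenizer.convert_tokens_to_ids(results[0].hypotheses[0])
print(tokenizer.decode(output_ids, skip_special_tokens=True))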