BentoML's handling of LoRA loading and unloading


BentoML's handling of LoRA loading and unloading is a limitation in our specific use case. We generate images from text using Hugging Face's DiffusionPipeline and refine them using StableDiffusionXLImg2ImgPipeline. In the SDXL pipeline, we load and unload the LoRA weights manually so that the pipeline can be used multiple times. The image generation stage involves two text encoders, a U-Net, and a VAE, whereas the refiner stage uses only one text encoder alongside its U-Net and VAE. The issue arises when we try to use BentoML, because its diffusers.py unloads LoRA weights automatically. This automatic unloading leads to errors because of the restriction it places on text encoder usage. We are looking for a way to load and unload LoRA weights manually, with an option to override the automatic process implemented in BentoML. Hugging Face's diffusers library already allows this manual control, and we would like the same flexibility within BentoML.
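For reference, the manual control we rely on outside BentoML looks roughly like the sketch below. It only assumes a pipeline object exposing diffusers' load_lora_weights and unload_lora_weights methods. The helper name lora_applied and the variables in the commented usage (base, refiner, prompt, the LoRA path) are hypothetical, and the sketch does not show how to switch off BentoML's automatic unloading.

```python
from contextlib import contextmanager


@contextmanager
def lora_applied(pipe, lora_source, **load_kwargs):
    """Load LoRA weights into `pipe`, yield it, and always unload afterwards.

    `pipe` is assumed to be a diffusers pipeline (or any object) that provides
    `load_lora_weights` and `unload_lora_weights`.
    """
    pipe.load_lora_weights(lora_source, **load_kwargs)
    try:
        yield pipe
    finally:
        # Runs even if generation raises, so the pipeline is left clean.
        pipe.unload_lora_weights()


# Hypothetical usage with an SDXL base pipeline and a refiner pipeline:
#
# with lora_applied(base, "path/to/lora") as p:
#     latents = p(prompt=prompt, output_type="latent").images
# image = refiner(prompt=prompt, image=latents).images[0]
```

The point of the context manager is that the LoRA is applied only to the base stage and is removed before the refiner runs, which is the behaviour we would like to be able to control ourselves in BentoML.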


There are 0 answers