I am trying to deploy a Generative AI solution built with LangChain (with an LLM at its core) on SageMaker. The code is not just an inference script but an inference pipeline, and the challenge is that this pipeline itself uses an LLM. How can I achieve this? I also want to add streaming.
Deploy LLM using Sagemaker and Langchain
Asked by akshat garg
There are 2 answers
The usual architecture pattern is to separate the LLM from the client code (LangChain): the LLM is hosted on a SageMaker endpoint, and the client runs on EC2, in a container, or in a Lambda function.
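On the client side, calling the endpoint comes down to serializing a prompt, calling `invoke_endpoint` on boto3's `sagemaker-runtime` client, and parsing the JSON reply. LangChain's `SagemakerEndpoint` LLM wraps this same call and delegates the (de)serialization to a content handler's `transform_input`/`transform_output` methods; the sketch below writes those steps out without LangChain. The request/response schema (`{"inputs": ..., "parameters": ...}` in, `[{"generated_text": ...}]` out) is an assumption typical of Hugging Face TGI-style containers, and `FakeRuntimeClient` is a hypothetical stub so the code can be exercised without AWS.

```python
import io
import json
from typing import Any, Optional


def build_payload(prompt: str, max_new_tokens: int = 256, temperature: float = 0.7) -> bytes:
    # ASSUMED schema: TGI-style {"inputs": ..., "parameters": {...}}; check your model's docs.
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    }
    return json.dumps(body).encode("utf-8")


def parse_response(raw: bytes) -> str:
    # ASSUMED schema: [{"generated_text": "..."}]
    return json.loads(raw.decode("utf-8"))[0]["generated_text"]


def generate(prompt: str, endpoint_name: str, runtime_client: Optional[Any] = None) -> str:
    """Send one prompt to a SageMaker real-time endpoint and return the generated text."""
    if runtime_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS
        runtime_client = boto3.client("sagemaker-runtime")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=build_payload(prompt),
    )
    # response["Body"] is a botocore StreamingBody; read() returns the raw bytes.
    return parse_response(response["Body"].read())


class FakeRuntimeClient:
    """HYPOTHETICAL offline stub mimicking the invoke_endpoint call used above."""

    def __init__(self, canned_text: str):
        self.canned_text = canned_text
        self.last_request: dict = {}

    def invoke_endpoint(self, **kwargs: Any) -> dict:
        self.last_request = kwargs
        raw = json.dumps([{"generated_text": self.canned_text}]).encode("utf-8")
        return {"Body": io.BytesIO(raw)}
```

Against a deployed endpoint you would call `generate("Summarize ...", "my-llm-endpoint")` (the endpoint name is hypothetical); in unit tests, pass `runtime_client=FakeRuntimeClient(...)`. Because the client holds no model weights, it stays small enough to run in a Lambda function.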
The advantages are much faster deployment (you will update the app more often than the LLM) and the ability to scale out each component independently.
So a much easier path is to deploy one of the LLMs available today in SageMaker JumpStart (open-source or commercial) and deploy the application separately.
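For the streaming requirement in the question, here is a minimal sketch of the app side consuming tokens from such an endpoint as they are generated. It uses boto3's `invoke_endpoint_with_response_stream`, whose response `Body` is an event stream of `{"PayloadPart": {"Bytes": ...}}` events. It assumes the model is served by a container based on Hugging Face Text Generation Inference (TGI), which emits server-sent-event lines like `data:{"token": {"text": ...}}` when the request includes `"stream": true`; other containers use different formats, so treat the parsing as an assumption to check against your model.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def iter_lines(event_stream: Iterable[dict]) -> Iterator[bytes]:
    """Reassemble complete lines from PayloadPart chunks.

    Chunk boundaries are arbitrary (one JSON line can be split across two
    events), so bytes are buffered until a newline arrives.
    """
    buffer = b""
    for event in event_stream:
        part = event.get("PayloadPart")
        if part is None:
            continue  # this sketch ignores any other event types
        buffer += part["Bytes"]
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line
    if buffer:
        yield buffer  # final line without a trailing newline


def iter_tokens(event_stream: Iterable[dict]) -> Iterator[str]:
    """Yield token text from ASSUMED TGI-style SSE lines: data:{"token": {"text": ...}}."""
    for raw in iter_lines(event_stream):
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data SSE fields
        event = json.loads(line[len("data:"):])
        token = event.get("token", {})
        if not token.get("special", False):  # drop special tokens such as end-of-sequence
            yield token.get("text", "")


def stream_generate(
    prompt: str, endpoint_name: str, runtime_client: Optional[Any] = None
) -> Iterator[str]:
    """Yield generated text piece by piece from a SageMaker streaming endpoint."""
    if runtime_client is None:
        import boto3  # imported lazily so the parsing logic can be tested offline
        runtime_client = boto3.client("sagemaker-runtime")
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 256}, "stream": True}
    response = runtime_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    yield from iter_tokens(response["Body"])
```

Typical use is `for text in stream_generate(prompt, "my-llm-endpoint"): print(text, end="", flush=True)` (hypothetical endpoint name); in a web app you would forward each piece to the browser instead, for example over WebSockets or server-sent events. Splitting on newlines before decoding is safe for UTF-8 because a newline byte never appears inside a multi-byte character.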
If you have good reasons to need full control of the LLM, you can build on this Llama 2/SageMaker example (container, etc.).
And if you want total control, you can build everything on top of your own custom Docker container.
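If you go the custom-container route, SageMaker's bring-your-own-container contract is small: SageMaker starts the image with the argument `serve`, then sends health checks as `GET /ping` and inference requests as `POST /invocations`, both on port 8080. Below is a minimal stdlib sketch of that contract. `run_pipeline` is a hypothetical placeholder where your LangChain pipeline (prompt templates, LLM call, output parsing) would go, the JSON schema is illustrative, and the image's entrypoint would still need to map the `serve` argument to this script. A production image would normally use a proper server (for example gunicorn with Flask or FastAPI, or DJL Serving as in the other answer) rather than `http.server`.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_pipeline(prompt: str) -> str:
    # HYPOTHETICAL placeholder: call your LangChain chain (and its LLM) here.
    return f"echo: {prompt}"


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Health check: SageMaker expects HTTP 200 once the container is ready.
        if self.path == "/ping":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/invocations":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            request = json.loads(self.rfile.read(length))
            output = run_pipeline(request["inputs"])  # illustrative request schema
        except (ValueError, KeyError, TypeError) as exc:
            self._send_json(400, {"error": str(exc)})
            return
        self._send_json(200, {"generated_text": output})

    def _send_json(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this sketch


def serve(port: int = 8080) -> ThreadingHTTPServer:
    """Start serving on all interfaces in a background thread and return the server."""
    server = ThreadingHTTPServer(("0.0.0.0", port), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ping_local(port: int) -> int:
    """Local smoke test: mimic SageMaker's health check."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/ping", timeout=5) as resp:
        return resp.status


def invoke_local(port: int, prompt: str) -> dict:
    """Local smoke test: mimic an inference request reaching /invocations."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/invocations",
        data=json.dumps({"inputs": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    serve(8080)
    threading.Event().wait()  # keep the main thread alive while the server thread runs
```

The two `*_local` helpers let you check the contract on your machine (or with `docker run -p 8080:8080 <image> serve`) before pushing the image to ECR.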
LLMs are huge, and their weights can take hundreds of GB. So it is better to deploy the LLM separately (since we are working in AWS, a SageMaker endpoint makes sense): your app (using LangChain) should call this endpoint (through LangChain's SageMaker endpoint integration) and consume the predictions.
However, this cannot be a plain SageMaker endpoint. Some LLMs are so large that model optimization strategies have to be applied, and that requires close coordination between hardware and software. This is possible with SageMaker's Large Model Inference (LMI) containers, which run DJL Serving + model optimization frameworks + the LLM (complete list here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers). Don't deploy LLMs without optimization.
Before taking this path, though, check the JumpStart model list and Amazon Bedrock (it will save you a lot of time).
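A quick back-of-the-envelope calculation shows why a single GPU is often not enough: weight memory is roughly the parameter count times the bytes per parameter (2 for fp16/bf16, 1 for int8, about 0.5 for 4-bit quantization). The helper below estimates that and the minimum number of GPUs needed to shard the weights across devices, which is what tensor parallelism in the LMI containers does. The 80% usable-memory fraction is an assumed headroom for the KV cache and activations, not a measured figure.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(
    num_params: float,
    bytes_per_param: float,
    gpu_memory_gb: float,
    usable_fraction: float = 0.8,
) -> int:
    """Lower bound on GPUs needed to hold the sharded weights.

    usable_fraction is an ASSUMED share of each GPU's memory left for weights;
    the rest is reserved for the KV cache, activations, and runtime overhead,
    which in reality depend on batch size, sequence length, and the serving stack.
    """
    per_gpu_gb = gpu_memory_gb * usable_fraction
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / per_gpu_gb)


if __name__ == "__main__":
    for label, bytes_pp in [("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
        gb = weight_memory_gb(70e9, bytes_pp)
        gpus = min_gpus_for_weights(70e9, bytes_pp, gpu_memory_gb=24)
        print(f"70B {label}: {gb:.0f} GB of weights, >= {gpus} x 24 GB GPUs")
```

For a 70B-parameter model in fp16 that is 140 GB of weights, which on 24 GB GPUs comes out to 8 GPUs, i.e. a whole ml.g5.48xlarge (eight 24 GB A10G GPUs); 4-bit quantization brings it to about 35 GB and 2 GPUs. In practice the tensor-parallel degree is typically also constrained to values that evenly divide the model's attention heads, such as powers of two.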