How to deploy my FastAPI app with Llama 2 on App Engine


Hi, I am new to GCP and I have a project using FastAPI with Llama 2 chat. I can't run it usefully on my computer because it takes too long to respond, and when I tried to deploy it to App Engine I get a 502 error. This is my code (I am also new to Llama).

from fastapi import FastAPI, HTTPException
from llama_cpp import Llama
from pydantic import BaseModel

# Load the model once at import time (this can take a while for a 7B model)
print("loading model...")
llm = Llama(model_path=r"C:\Users\Harry\Project\models\llama-2-7b-chat.Q2_K.gguf")
print("model loaded!")
app = FastAPI()

class InputMessage(BaseModel):
    message: str

class OutputMessage(BaseModel):
    response: str

@app.get("/")
def read_root():
    return {"message": "Welcome to the chatbot API!"}

@app.post("/chat", response_model=OutputMessage)
def chat_post(input_message: InputMessage):
    try:
        if llm is None:
            raise HTTPException(status_code=500, detail="Chatbot model not loaded")
        user_message = input_message.message

        # Use your chatbot model to generate a response
        bot_response_dict = llm(user_message,
                                max_tokens=-1,
                                echo=False,
                                temperature=0.1,
                                top_p=0.9)

        # llama_cpp returns an OpenAI-style completion dict; the generated
        # text is under choices[0]["text"], not under a "response" key
        bot_response = bot_response_dict["choices"][0]["text"]

        # Print the response content to the console
        print(f"bot_response_dict: {bot_response_dict}")
        return OutputMessage(response=bot_response)

    except HTTPException:
        # Let the explicit 500 above propagate instead of being re-wrapped
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
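
For reference, here is a minimal sketch of an entry point, assuming the code above is saved as main.py (the filename is an assumption). App Engine tells the server which port to listen on via the PORT environment variable, and 8080 is the usual local default:

import os

import uvicorn

if __name__ == "__main__":
    # PORT is set by App Engine; fall back to 8080 for local runs
    uvicorn.run("main:app", host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))

Once it is running, the /chat endpoint can be exercised with a small client (requests is an extra dependency):

import requests

print(requests.post("http://localhost:8080/chat", json={"message": "Hello"}).json())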

I tried to run it locally, but it takes too long to respond and I don't know how to fix that. My questions are how to run my FastAPI app properly and how to deploy it on GCP.
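
For the deployment itself, a 502 from App Engine is consistent with the default instance being too small (or too slow to start up) for a 7B model, and the hard-coded Windows model path will not exist on the instance. Below is a minimal app.yaml sketch for the flexible environment; the resource numbers are assumptions for a 7B Q2_K model and will likely need tuning, and the model file still has to be bundled with the app or downloaded at startup:

# app.yaml (sketch; values are assumptions, not tested)
runtime: python
env: flex
entrypoint: uvicorn main:app --host 0.0.0.0 --port $PORT

runtime_config:
  operating_system: ubuntu22
  python_version: "3.11"

resources:
  cpu: 2
  memory_gb: 8
  disk_size_gb: 20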
