How to run Vercel AI SDK on AWS Lambda and API Gateway

I am trying to host my Next.js Vercel AI SDK app on AWS via CloudFront, Lambda and API Gateway.

I want to point the useChat() hook at the API of my Lambda function, which makes the connection and returns the StreamingTextResponse from OpenAI.

However, the StreamingTextResponse body always has an undefined stream.

What are things I can do to fix this?

Any help is appreciated, thanks.

page.tsx

"use client";

import { useChat } from "ai/react";
import { useState, useEffect } from "react";

export default function Chat() {
  
  const { messages, input, handleInputChange, handleSubmit, data } = useChat({api: '/myAWSAPI'});
 
...

Lambda Function


const OpenAI = require('openai')
const { OpenAIStream, StreamingTextResponse } = require('ai');
const prompts = require('./prompts')
const { roleplay_prompt } = prompts

// Create an OpenAI API client (that's edge friendly!)
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY || '',
});

exports.handler = async function(event, context, callback) {
  // Extract the `messages` from the body of the request
  const { messages } = event.body;

  const messageWithSystem = [
    {role: 'system', content: roleplay_prompt},
    ...messages // Add user and assistant messages after the system message
  ]

  console.log(messageWithSystem)

  // Ask OpenAI for a streaming chat completion given the prompt
  const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    stream: true,
    messages: messageWithSystem,
  });
  
  // Convert the response into a friendly text-stream
  const stream = OpenAIStream(response);
  // Respond with the stream
  const chatResponse = new StreamingTextResponse(stream);

  // body's stream is always undefined
  console.log(chatResponse)

  return chatResponse
}

1 Answer

Answered by mobob:

After digging through this on my own for a bit, I've discovered there is a fundamental difference in how Next.js route handlers and Lambda handlers work, at least as far as streaming goes (as of Jan 2024).

This is the simplest guide I've found that gets you from 0->1 with a streaming handler on Lambda: https://docs.aws.amazon.com/lambda/latest/dg/response-streaming-tutorial.html

And this is the April 2023 article introducing the capability: https://aws.amazon.com/blogs/compute/introducing-aws-lambda-response-streaming/

In short, there are a few pieces involved; the main one you're missing is awslambda.streamifyResponse, as streaming Lambdas require this wrapper to convert the handler and pass through the responseStream.
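
For reference, a minimal streaming handler in the spirit of that tutorial looks roughly like this (a sketch, assuming the Node.js 18+ managed runtime, where the awslambda global is injected by the runtime):

export const handler = awslambda.streamifyResponse(
  async (event, responseStream, _context) => {
    // responseStream is a Node Writable; write chunks as they become available.
    responseStream.setContentType('text/plain');
    responseStream.write('Hello ');
    responseStream.write('world');
    responseStream.end();
  }
);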

Here is my conversion of Vercel's chat route handler (similar to https://github.com/vercel/ai/blob/main/examples/next-openai/app/api/chat/route.ts) for Lambda, which seems to work:

import { OpenAIStream } from 'ai';
import OpenAI from 'openai';

import stream from 'stream';
import util from 'util';
const { Readable } = stream;
const pipeline = util.promisify(stream.pipeline);

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export const chatHandler: awslambda.StreamifyHandler = async (event, responseStream, _context) => {
  console.log(`chat processing event: ${JSON.stringify(event)}`);

  const { messages } = JSON.parse(event.body || '');

  console.log(`chat processing messages: ${JSON.stringify(messages)}`);

  // Ask OpenAI for a streaming chat completion given the prompt
  const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    stream: true,
    messages,
  });

  // Convert the response into a friendly text-stream
  const stream = OpenAIStream(response, {
    onStart: async () => {
      // This callback is called when the stream starts
      // You can use this to save the prompt to your database
      // await savePromptToDatabase(prompt);
      console.log(`Started stream, latest message: ${JSON.stringify(messages.length > 0 ? messages[messages.length - 1] : '<none>')}`);
    },
    onToken: async (token: string) => {
      console.log(`chat got token: ${token}`);
      // This callback is called for each token in the stream
      // You can use this to debug the stream or save the tokens to your database
      // console.log(token);
    },
    onCompletion: async (completion: string) => {
      // This callback is called when the stream completes
      // You can use this to save the final completion to your database
      // await saveCompletionToDatabase(completion);
      console.log(`Completed stream with completion: ${completion}`);
    },
    onFinal: async (final: string) => {
      console.log(`chat got final: ${final}`);
    },
  });

  // Respond with the stream
  // NOPE! Not in a lambda
  //return new StreamingTextResponse(stream);

  // this is how we chain things together in lambda
  // @ts-expect-error this seems to be ok, but i'd like to do this safely
  await pipeline(stream, responseStream);
};

// see https://github.com/astuyve/lambda-stream for better support of this
export const handler = awslambda.streamifyResponse(chatHandler);
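
As an aside, one way to avoid the @ts-expect-error on the pipeline call is to wrap the web ReadableStream that OpenAIStream returns in a Node Readable first (a sketch, assuming Node.js 18+, where stream.Readable.fromWeb exists but is still marked experimental):

import { Readable } from 'stream';
import { pipeline } from 'stream/promises';

// OpenAIStream returns a web ReadableStream; Readable.fromWeb wraps it in a
// Node Readable that pipeline() accepts without a type error. The cast may
// still be needed because the 'ai' package uses the DOM ReadableStream type.
await pipeline(
  Readable.fromWeb(stream as unknown as import('stream/web').ReadableStream),
  responseStream
);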

I mentioned a helper type lib above; I used the .d.ts below in my source, which does the job at least for those awslambda.* Node types. I'd love to understand how the Vercel ReadableStream can be treated as a Readable here, as when I asked Copilot it said it wouldn't work.

import { APIGatewayProxyEventV2, Context, Handler } from 'aws-lambda';
import { Writable } from 'stream';

declare global {
  namespace awslambda {
    export namespace HttpResponseStream {
      function from(writable: Writable, metadata: any): Writable;
    }

    export type ResponseStream = Writable & {
      setContentType(type: string): void;
    };

    export type StreamifyHandler = (event: APIGatewayProxyEventV2, responseStream: ResponseStream, context: Context) => Promise<any>;

    export function streamifyResponse(handler: StreamifyHandler): Handler<APIGatewayProxyEventV2>;
  }
}
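
For completeness: Lambda response streaming is currently exposed through Function URLs rather than API Gateway, so the function also needs a URL configured with the RESPONSE_STREAM invoke mode. A CDK sketch of that wiring (assuming a recent aws-cdk-lib that includes lambda.InvokeMode; chatFn and stack are whatever your stack already defines):

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// chatFn is the function holding the streaming handler above (hypothetical name).
const fnUrl = chatFn.addFunctionUrl({
  authType: lambda.FunctionUrlAuthType.NONE,      // or AWS_IAM, depending on your auth setup
  invokeMode: lambda.InvokeMode.RESPONSE_STREAM,  // required for streamifyResponse handlers
});

new cdk.CfnOutput(stack, 'ChatFunctionUrl', { value: fnUrl.url });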

I'm sure AWS will improve this at some point, but for now it feels pretty fringe. I'd love to hear if anyone is doing this in production; the lack of API Gateway support is pretty limiting.