I'm using Amazon IVS real-time streaming and would like server-side code (Lambda or EC2) to join a stage as a participant so that it can stream pre-recorded audio and process individual participants' streams server-side.
Is there a way to do this?
The IVS real-time streaming Web broadcast SDK is browser-only and doesn't work in a Node.js environment.
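For reference, this is roughly what joining a stage looks like with the Web broadcast SDK in the browser (worth double-checking against the docs, but it illustrates why the SDK can't run server-side: it's built around `getUserMedia` and `MediaStreamTrack`, which don't exist in Node.js):

```typescript
import { Stage, StageEvents, SubscribeType, LocalStageStream } from 'amazon-ivs-web-broadcast';

// `participantToken` would come from a backend call to the CreateParticipantToken API.
async function joinStage(participantToken: string) {
  // getUserMedia and MediaStreamTrack are browser-only APIs -- the reason
  // this SDK doesn't work in a Node.js environment.
  const media = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioTrack = media.getAudioTracks()[0];

  const stage = new Stage(participantToken, {
    stageStreamsToPublish: () => [new LocalStageStream(audioTrack)],
    shouldPublishParticipant: () => true,
    shouldSubscribeToParticipant: () => SubscribeType.AUDIO_ONLY,
  });

  stage.on(StageEvents.STAGE_PARTICIPANT_STREAMS_ADDED, (participant, streams) => {
    // Remote streams surface as MediaStreamTracks -- again, browser-only types.
    console.log(`participant ${participant.id} added ${streams.length} stream(s)`);
  });

  await stage.join();
}
```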
I've read about Server-Side Composition, but this appears to mix the audio and video from all stage participants and then send the mixed output to an IVS channel, whereas I want to process each participant's stream individually server-side. It also outputs video to an IVS channel, and my participant streams will be audio-only.
Perhaps there's a completely different solution that would be easier? I'm trying to create a real-time audio chatbot similar to ChatGPT. The steps I'm trying to reproduce are as follows (a rough server-side sketch follows the list):
- Stream audio from a browser to a server
- Transcribe the audio
- Send the transcription to an LLM
- Convert the response to audio
- Stream the audio back to the browser
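For concreteness, here's the kind of server-side loop I have in mind, using the `ws` package over a plain WebSocket. The `transcribe`/`complete`/`synthesize` helpers are placeholders for real services (e.g. Amazon Transcribe, Bedrock, Polly), not actual client code:

```typescript
import { WebSocketServer, WebSocket } from 'ws';

// Stand-ins for real services. A real streaming STT integration would hold a
// session open rather than transcribing each chunk independently.
async function transcribe(audio: Buffer): Promise<string> {
  return 'TODO: feed chunk to a streaming speech-to-text service';
}
async function complete(prompt: string): Promise<string> {
  return 'TODO: call an LLM';
}
async function synthesize(text: string): Promise<Buffer> {
  return Buffer.from('TODO: call a text-to-speech service');
}

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws: WebSocket) => {
  ws.on('message', async (chunk: Buffer) => {
    const text = await transcribe(chunk);   // step 2: speech-to-text
    const reply = await complete(text);     // step 3: LLM response
    const audio = await synthesize(reply);  // step 4: text-to-speech
    ws.send(audio);                         // step 5: stream audio back
  });
});
```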
To be honest, the difficult part seems to be the first step (real-time streaming from the browser to a server), which is why I started looking at WebRTC and then AWS IVS real-time streaming. I'd rather not have to create my own WebRTC and WebSocket servers and wire everything up manually.
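That said, the manual version of step 1 may be smaller than I feared: `MediaRecorder` over a WebSocket covers capture and transport, though it gives up WebRTC's latency and jitter handling. A minimal browser-side sketch, assuming the server above is listening on `ws://localhost:8080`:

```typescript
// Browser side of step 1: microphone -> WebSocket, plus playback of replies.
// Simpler than WebRTC, at the cost of its low latency and jitter buffering.
const ws = new WebSocket('ws://localhost:8080');
ws.binaryType = 'arraybuffer';
const audioCtx = new AudioContext(); // may start suspended until a user gesture

async function start() {
  const media = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(media, { mimeType: 'audio/webm;codecs=opus' });

  // Emit ~250 ms Opus chunks and forward each one as it becomes available.
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0 && ws.readyState === WebSocket.OPEN) {
      ws.send(event.data);
    }
  };
  recorder.start(250);
}

// Decode and play whatever audio the server sends back.
ws.onmessage = async (event) => {
  const buffer = await audioCtx.decodeAudioData(event.data as ArrayBuffer);
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  source.start();
};

start();
```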