Best way to stream or load audio files into an S3 bucket (contact centre recordings)


What is the most reliable way to have our client send audio files to our S3 bucket, where they will be processed (ML processes that produce speech-to-text insights)?

The files could be .wav, .mp3, or other audio formats, and some of them may be large.

We'd love to hear the best ideas (e.g. API Gateway / Lambda / S3?), especially from anyone who has done this before.

Some questions and answers to give context:

How do users interface with your system? We are looking for an API-based approach rather than a browser-based one. We can get a browser-based approach to work, but we're not sure it is the right technical, architectural, and scalable approach.

Do you require a bulk upload method? Yes. We need bulk upload functionality, and some individual files may also be large.

Will it be controlled by a human, or do you want it to upload automatically somehow? We definitely want it to be automatic.

Ultimately, we are building a SaaS solution that takes the audio files and metadata, performs analytics on them, and delivers the results of our analysis back to the app through an API. So we are looking for an approach that works within this context.

Answer by AbstactVersion

I have a similar scenario.

If you intend to use API Gateway/Lambda/S3, be aware that there are limits on the payload size that API Gateway and Lambda can accept: API Gateway accepts payloads of up to 10 MB, and Lambda up to 6 MB (for synchronous invocations).
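To make those limits concrete, here is a minimal Python sketch that decides whether a given recording could plausibly go through API Gateway and Lambda or should be sent straight to S3. It uses the 10 MB and 6 MB figures above (treating "MB" as MiB is an assumption) and assumes binary request bodies reach the Lambda function Base64-encoded, which inflates them by about a third. The function names are hypothetical, and headers and the rest of the event payload are ignored, so check the current AWS quotas before relying on the numbers.

```python
# Figures quoted in the answer; treating "MB" as MiB is an assumption.
MB = 1024 * 1024
API_GATEWAY_MAX_PAYLOAD = 10 * MB
LAMBDA_MAX_PAYLOAD = 6 * MB


def base64_size(raw_bytes: int) -> int:
    """Length of raw_bytes after Base64 encoding: 4 characters per 3-byte group, padded."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_through_api(raw_bytes: int, base64_encoded: bool = True) -> bool:
    """Rough check that a request body of raw_bytes stays under both limits.

    Ignores headers and the rest of the Lambda event JSON, so treat
    results close to the limit as "no".
    """
    body = base64_size(raw_bytes) if base64_encoded else raw_bytes
    return body <= min(API_GATEWAY_MAX_PAYLOAD, LAMBDA_MAX_PAYLOAD)


def choose_upload_path(raw_bytes: int) -> str:
    """Return "api" for small files and "presigned-s3" for everything else."""
    return "api" if fits_through_api(raw_bytes) else "presigned-s3"
```

For example, `choose_upload_path(5 * MB)` returns `"presigned-s3"`: a 5 MiB WAV file is under 6 MiB raw but roughly 6.7 MiB once Base64-encoded, which is why larger recordings generally need to bypass the API.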

There is a workaround for this, though: have clients upload files directly to the S3 bucket, and attach a Lambda trigger that fires on object creation to start processing.
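A minimal sketch of such a trigger, assuming a Python Lambda function subscribed to the bucket's `s3:ObjectCreated:*` event notifications. It reads the bucket and key from each record of the S3 event payload, URL-decodes the key (spaces arrive as `+`), and skips objects whose extension is not in a hypothetical `AUDIO_EXTENSIONS` allow-list. What happens to each job afterwards (queueing it, starting a transcription job, etc.) is left out; the `handler` simply returns the list.

```python
import posixpath
from urllib.parse import unquote_plus

# Assumption: the formats your speech-to-text pipeline accepts.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def is_audio(key: str) -> bool:
    """True if the object key ends in one of the allowed audio extensions."""
    return posixpath.splitext(key)[1].lower() in AUDIO_EXTENSIONS


def handler(event, context):
    """Lambda entry point invoked with a batch of S3 event notification records."""
    jobs = []
    for record in event.get("Records", []):
        # Only react to creation events (ObjectCreated:Put, :Post, :CompleteMultipartUpload, ...).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded in the event (a space becomes '+').
        key = unquote_plus(s3["object"]["key"])
        if not is_audio(key):
            continue
        jobs.append({
            "bucket": s3["bucket"]["name"],
            "key": key,
            "size": s3["object"].get("size"),
        })
    # Hypothetical next step: enqueue each job for the transcription/ML pipeline.
    return {"jobs": jobs}
```

Because S3 event notifications can occasionally be delivered more than once, whatever consumes these jobs should tolerate duplicates.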

I'll leave some articles that may point you in the right direction:

  1. Uploading a file using presigned URLs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html
  2. Lambda trigger on S3 object creation: https://medium.com/analytics-vidhya/trigger-aws-lambda-function-to-store-audio-from-api-in-s3-bucket-b2bc191f23ec
  3. A holistic view of the same issue: https://sookocheff.com/post/api/uploading-large-payloads-through-api-gateway/
  4. Related GitHub issue: https://github.com/serverless/examples/issues/106

So, in my view, the best way to handle uploads is to have your API return a presigned URL and then have the client upload the file directly to S3. Otherwise, you'll have to implement uploading the file in chunks so that each request stays under the limits above.
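To illustrate the presigned-URL flow, here is a Python sketch of what a presigned `PUT` URL contains, built with only the standard library using AWS Signature Version 4 query-string authentication. In a real service you would normally let an SDK do this, for example boto3's `s3_client.generate_presigned_url("put_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=3600)`; the hand-rolled `presign_s3_url` function, the virtual-hosted-style host name, and the one-hour default expiry are assumptions for illustration. Your API would authenticate the client, choose the object key, and return this URL; the client then uploads the file with a plain HTTP `PUT`.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_url(bucket, key, region, access_key, secret_key,
                   method="PUT", expires=3600, now=None):
    """Return a SigV4 presigned URL allowing `method` on s3://bucket/key (sketch)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"  # assumption: virtual-hosted style
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: sorted keys, keys and values URI-encoded ('/' -> %2F).
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    canonical_uri = "/" + quote(key, safe="/")
    canonical_request = "\n".join([
        method, canonical_uri, query,
        f"host:{host}\n",    # canonical headers block ends with a newline
        "host",              # signed headers
        "UNSIGNED-PAYLOAD",  # the body is unknown when the URL is created
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: secret -> date -> region -> service -> "aws4_request".
    signing_key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{query}&X-Amz-Signature={signature}"
```

The client side of a bulk upload can then be as simple as looping over recordings and running something like `curl -X PUT --upload-file call-01.wav "$URL"` with a fresh URL for each file, with no API Gateway or Lambda payload limit in the way. A single `PUT` is capped at 5 GB; beyond that, S3 requires a multipart (chunked) upload.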