How to Retrieve Metadata of AWS-Stored Audio Files in GitHub Actions?


I'm facing an issue with accessing metadata from audio files stored in AWS S3 when running my script as a GitHub Action. The code works perfectly when executed locally on my Mac but encounters problems in the GitHub Actions environment.

Here's the relevant part of my code:

from pydub import AudioSegment
from pydub.utils import mediainfo
from io import BytesIO
import boto3
from botocore.exceptions import ClientError  # needed by the except clause below

def _load_audio_from_s3(self, s3_bucket: str, s3_key: str):
    """Open an audio file from S3 and load all attributes of the Audio class
    This method loads audio as a pydub AudioSegment.
    It generates a temporary presigned URL for the S3 object (url_lifetime = 3600s)
    to access the metadata without downloading the file locally."""

    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=s3_bucket, Key=s3_key)
    audio_data = BytesIO(response["Body"].read())
    audio = AudioSegment.from_file(audio_data)
    try:
        url_lifetime = 3600
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": s3_bucket, "Key": s3_key},
            ExpiresIn=url_lifetime,
        )
    except ClientError as e:
        raise ValueError(f"Error generating URL for {s3_key}: {e}") from e
    metadata = mediainfo(url)
    compression = metadata.get("codec_name")

The process is as follows:

1. Load the audio using pydub's AudioSegment.
2. Generate a presigned URL so that mediainfo can read metadata from the audio file, such as its "codec_name".

The issue arises specifically with the mediainfo function in GitHub Actions. Connecting to AWS and retrieving the audio as a pydub.AudioSegment works, but reading the metadata fails. This does not appear to be an AWS access key issue: my credentials are set up in GitHub Secrets and work for audio retrieval.
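Because the audio bytes are already downloaded in the first step, one possible fallback is to read the metadata from a temporary file instead of over HTTP. The following is only a sketch under that assumption: probe_audio_bytes is a hypothetical helper name, and the probe parameter is an illustrative hook that defaults to pydub's mediainfo, which still needs ffprobe to be installed on the runner.

```python
import os
import tempfile
from typing import Callable, Optional


def probe_audio_bytes(data: bytes, suffix: str = ".wav",
                      probe: Optional[Callable[[str], dict]] = None) -> dict:
    """Write audio bytes to a temporary file and probe that file for metadata.

    Hypothetical helper: `probe` is any callable that takes a file path and
    returns a dict. It defaults to pydub.utils.mediainfo.
    """
    if probe is None:
        # Imported lazily so the helper can be exercised without pydub installed.
        from pydub.utils import mediainfo
        probe = mediainfo

    # delete=False so another process (ffprobe) can open the file by name
    # on every platform; the file is removed explicitly afterwards.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()
        return probe(tmp.name)
    finally:
        tmp.close()
        os.remove(tmp.name)
```

In the function above this could be called as probe_audio_bytes(audio_data.getvalue()), with the suffix chosen to match the extension of the S3 key. If the result is still an empty dictionary, the URL is probably not the only problem, and it may be worth checking that ffprobe is available in the GitHub Actions environment.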

When printed, the generated presigned URL appears as follows (with sensitive information redacted):

https://my-s3-bucket.s3.amazonaws.com/test/audio.wav?AWSAccessKeyId=***&Signature=***&Expires=1699793746
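The query parameters in this URL (AWSAccessKeyId, Signature, Expires) are those of the older Signature Version 2 query-string format; a Signature Version 4 URL carries X-Amz-* parameters instead, and temporary credentials add a security-token parameter. Whether this is related to the 400 response is only a guess, but comparing the URL generated locally with the one generated in the action may narrow things down. A minimal sketch using only the standard library (describe_presigned_url is a hypothetical helper name, not part of boto3):

```python
from urllib.parse import parse_qs, urlsplit


def describe_presigned_url(url: str) -> dict:
    """Report which signing scheme a presigned S3 URL appears to use."""
    parts = urlsplit(url)
    # Lower-case the parameter names so the checks are case-insensitive.
    params = {name.lower() for name in parse_qs(parts.query, keep_blank_values=True)}

    if "x-amz-algorithm" in params or "x-amz-signature" in params:
        scheme = "sigv4"
    elif {"awsaccesskeyid", "signature", "expires"} <= params:
        scheme = "sigv2"
    else:
        scheme = "unknown"

    return {
        "scheme": scheme,
        # Present when the URL was signed with temporary (session) credentials.
        "has_session_token": "x-amz-security-token" in params,
        "host": parts.netloc,
    }
```

Printing the result of this helper both locally and in the workflow shows whether the two environments produce URLs with the same signing scheme and host.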

However, pydub's mediainfo function fails to read the metadata and returns an empty dictionary (metadata={}). I also tested whether the URL is reachable from within the action and got "HTTP Status Code: 400", so the request itself is being rejected.
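The status code alone says little, but S3 error responses normally include an XML body with Code and Message elements that name the reason. A sketch of a check that captures that body, using only the standard library (fetch_status_and_error is a hypothetical helper name, and the 2048-byte limit is an arbitrary choice):

```python
import urllib.error
import urllib.request


def fetch_status_and_error(url: str, timeout: float = 10.0):
    """Return (status, error_body) for a GET request against a presigned URL."""
    # GET rather than HEAD: a URL presigned for get_object is signed for GET.
    # The Range header asks for the first byte only, to avoid a full download.
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, ""
    except urllib.error.HTTPError as err:
        # For S3 this body is usually XML containing <Code> and <Message>.
        body = err.read(2048).decode("utf-8", errors="replace")
        return err.code, body
```

Printing the returned body in the workflow log should show the error code behind the 400. The body can include the access key ID, so it may need redacting before it is shared.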

I'm looking for insights or solutions to address this discrepancy between local execution and GitHub Actions. Any assistance or suggestions would be greatly appreciated.
