Getting the sound level from an FLTP audio stream


I need to get the audio level or, even better, EQ data from an NDI audio stream in C++. Here's the struct of an audio packet:

// This describes an audio frame.
typedef struct NDIlib_audio_frame_v3_t {
    // The sample-rate of this buffer.
    int sample_rate;

    // The number of audio channels.
    int no_channels;

    // The number of audio samples per channel.
    int no_samples;

    // The timecode of this frame in 100-nanosecond intervals.
    int64_t timecode;

    // What FourCC describing the type of data for this frame.
    NDIlib_FourCC_audio_type_e FourCC;

    // The audio data.
    uint8_t* p_data;

    union {
        // If the FourCC is not a compressed type and the audio format is planar, then this will be the
        // stride in bytes for a single channel.
        int channel_stride_in_bytes;

        // If the FourCC is a compressed type, then this will be the size of the p_data buffer in bytes.
        int data_size_in_bytes;
    };

    // Per frame metadata for this frame. This is a NULL terminated UTF8 string that should be in XML format.
    // If you do not want any metadata then you may specify NULL here.
    const char* p_metadata;

    // This is only valid when receiving a frame and is specified as a 100-nanosecond time that was the exact
    // moment that the frame was submitted by the sending side and is generated by the SDK. If this value is
    // NDIlib_recv_timestamp_undefined then this value is not available and is NDIlib_recv_timestamp_undefined.
    int64_t timestamp;

#if NDILIB_CPP_DEFAULT_CONSTRUCTORS
    NDIlib_audio_frame_v3_t(
        int sample_rate_ = 48000, int no_channels_ = 2, int no_samples_ = 0,
        int64_t timecode_ = NDIlib_send_timecode_synthesize,
        NDIlib_FourCC_audio_type_e FourCC_ = NDIlib_FourCC_audio_type_FLTP,
        uint8_t* p_data_ = NULL, int channel_stride_in_bytes_ = 0,
        const char* p_metadata_ = NULL,
        int64_t timestamp_ = 0
    );
#endif // NDILIB_CPP_DEFAULT_CONSTRUCTORS
} NDIlib_audio_frame_v3_t;

The problem is that, unlike with video frames, I have absolutely no idea how binary audio is packed, and there's much less information about it online. The best information I've found so far is this project:

https://github.com/gavinnn101/fishing_assistant/blob/7f5fcd73de1e39336226b5969cd1c5ca84c8058b/fishing_main.py#L124

However, it uses PyAudio, which I'm not familiar with, and a 16-bit audio format, while mine seems to be 32-bit. I can't figure out the struct.unpack call either: "%dh" % (count) puts a number in front of h (for short), and I don't understand how that gets interpreted.

Is there any C++ library that can take a pointer to the data and its type, and has functions to extract the sound level, the sound level at a certain frequency (Hz), etc.?

Or just some good information on how I would extract this myself? :)

I've searched the web a lot and found very little. I placed a breakpoint where the audio frame is populated, but gave up once I realized there are too many variables I don't have a clue about, like sample rate, channel count, sample count, etc.

1 answer
Akandesh

I got it working using the following function:

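#include <cmath>    // std::sqrt
#include <cstdint>  // uint8_t

// This function calculates the RMS value over all channels of a planar float (FLTP) audio frame
float calculateRMS(const NDIlib_audio_frame_v2_t& frame)
{
   // Calculate the total number of samples in the frame (all channels)
   const int numSamples = frame.no_samples * frame.no_channels;
   if (frame.p_data == nullptr || numSamples <= 0)
       return 0.0f; // Nothing to measure; this also avoids dividing by zero

   // Calculate the sum of the squares of the samples, one channel plane at a time
   float sumSquares = 0.0f;
   for (int ch = 0; ch < frame.no_channels; ++ch)
   {
       // Planes are channel_stride_in_bytes apart, which need not equal no_samples * sizeof(float)
       const float* data = reinterpret_cast<const float*>(
           reinterpret_cast<const uint8_t*>(frame.p_data) + ch * frame.channel_stride_in_bytes);

       for (int i = 0; i < frame.no_samples; ++i)
           sumSquares += data[i] * data[i];
   }

   // Calculate the RMS value and return it
   return std::sqrt(sumSquares / numSamples);
}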

called as

    // Keep receiving audio frames and printing their RMS values
    NDIlib_audio_frame_v2_t audioFrame;
    while (true)
    {
        // Wait up to 1000 ms for the next audio frame; a timeout of 0 would make this loop spin
        if (NDIlib_recv_capture_v2(pNDI_recv, NULL, &audioFrame, NULL, 1000) != NDIlib_frame_type_audio)
            continue;

        // Print the RMS value of the audio frame
        std::cout << "RMS: " << calculateRMS(audioFrame) << std::endl;

        // Hand the frame's buffer back to the SDK once we're done with it
        NDIlib_recv_free_audio_v2(pNDI_recv, &audioFrame);
    }
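
A note on the layout, since the question uses the v3 struct, where p_data is a uint8_t*: for FLTP, the buffer holds one plane of 32-bit floats per channel, and the plane for channel ch starts ch * channel_stride_in_bytes bytes into p_data. Below is a minimal sketch of a per-channel level meter built on that. It takes the raw fields instead of the SDK struct so it compiles on its own. channelLevelsDb is a made-up name, not an SDK function, and the dB value is simply 20*log10 of the RMS relative to an amplitude of 1.0; how that maps to a broadcast reference level is not covered here.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper (not part of the NDI SDK): RMS level of each channel, in dB
// relative to an amplitude of 1.0, for a planar 32-bit float (FLTP) buffer.
std::vector<float> channelLevelsDb(const uint8_t* p_data, int no_channels,
                                   int no_samples, int channel_stride_in_bytes)
{
    // Start every channel at -infinity, i.e. silence.
    std::vector<float> levels(static_cast<std::size_t>(no_channels > 0 ? no_channels : 0),
                              -std::numeric_limits<float>::infinity());
    if (p_data == nullptr || no_samples <= 0)
        return levels;

    for (int ch = 0; ch < no_channels; ++ch)
    {
        // Each channel is one contiguous plane; planes are channel_stride_in_bytes apart.
        const uint8_t* plane =
            p_data + static_cast<std::size_t>(ch) * static_cast<std::size_t>(channel_stride_in_bytes);

        double sumSquares = 0.0;
        for (int i = 0; i < no_samples; ++i)
        {
            // memcpy reads the float without assuming anything about the byte buffer's alignment.
            float sample;
            std::memcpy(&sample, plane + static_cast<std::size_t>(i) * sizeof(float), sizeof(float));
            sumSquares += static_cast<double>(sample) * sample;
        }

        const double rms = std::sqrt(sumSquares / no_samples);
        if (rms > 0.0)
            levels[static_cast<std::size_t>(ch)] = static_cast<float>(20.0 * std::log10(rms));
    }
    return levels;
}
```

With a v3 frame this would be called as channelLevelsDb(frame.p_data, frame.no_channels, frame.no_samples, frame.channel_stride_in_bytes), assuming the FourCC is NDIlib_FourCC_audio_type_FLTP.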

Shoutout to ChatGPT for explaining things and feeding me possible solutions until I got one working :--)