How to convert byte array to audio file?

I have written a program that captures SIP packets from the network in real time, and I want to use the SDP information embedded in those packets to record the audio conversation between two VoIP softphones.

Once I have retrieved the binary data from the RTP packets, how should I go about converting it into a sound file?

C++ preferred.

There are 2 answers

2
tomrtc On BEST ANSWER

Hi Adrian and welcome,

You are right: you cannot simply concatenate the RTP payloads one after another into a file and then read that file as an audio file such as a ".wav".

The missing piece is code that reassembles and decodes the flow of RTP packets into voice samples and plays them out. For the sake of simplicity, consider the well-known G.711 codec (8-bit companded PCM, in mu-law and A-law variants), which practically every SIP phone supports. You need to implement a play-out buffer (logically an infinite buffer, but a ring buffer with wrap-around is fine).

Each packet carries a small audio payload, typically 20 ms long. Each chunk of audio data is preceded by an RTP header, which indicates the type of encoding (this relates to the SDP information, and you already have a good understanding of that part).
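
To make the header part concrete, here is a minimal sketch of pulling the relevant fields out of one RTP packet. It follows the fixed header layout from RFC 3550; the names `RtpHeader` and `parseRtp` are made up for this illustration, and the input is assumed to be the UDP payload of a single captured packet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fields of the RTP header (RFC 3550) that a recorder typically needs.
struct RtpHeader {
    uint8_t     payloadType;    // 0 = PCMU (G.711 mu-law), 8 = PCMA (G.711 A-law)
    uint16_t    sequence;       // increases by one per packet, wraps at 65535
    uint32_t    timestamp;      // in sample units (8000 per second for G.711)
    uint32_t    ssrc;           // identifies the sending stream
    std::size_t payloadOffset;  // index of the first audio byte
    std::size_t payloadLength;  // number of audio bytes
};

// Reads a 32-bit big-endian (network order) value.
static uint32_t readBE32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Parses one UDP payload; returns false if it does not look like valid RTP.
bool parseRtp(const uint8_t* data, std::size_t len, RtpHeader& out) {
    assert(data != nullptr || len == 0);
    if (len < 12 || (data[0] >> 6) != 2) return false;  // too short, or not version 2
    std::size_t offset = 12 + 4 * (data[0] & 0x0F);     // skip the CSRC entries
    if (data[0] & 0x10) {                               // header extension present
        if (len < offset + 4) return false;
        offset += 4 + 4 * ((std::size_t(data[offset + 2]) << 8) | data[offset + 3]);
    }
    if (offset > len) return false;
    std::size_t padding = 0;
    if (data[0] & 0x20) {                               // last byte holds the padding length
        padding = data[len - 1];
        if (padding == 0 || padding > len - offset) return false;
    }
    out.payloadType   = static_cast<uint8_t>(data[1] & 0x7F);
    out.sequence      = static_cast<uint16_t>((data[2] << 8) | data[3]);
    out.timestamp     = readBE32(data + 4);
    out.ssrc          = readBE32(data + 8);
    out.payloadOffset = offset;
    out.payloadLength = len - offset - padding;
    return true;
}
```

The payload type is only meaningful together with the SDP: 0 and 8 are the static assignments for G.711, while dynamic types (96 and up) have to be looked up in the `a=rtpmap` lines.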

For each packet:

  1. Decode the 8-bit values into 16-bit samples at the right rate, usually 8,000 samples per second for G.711.

  2. Compute the play-out point from the RTP header; this is the index into the play-out buffer array. Take jitter and reordering into account, based on the RTP timestamp.

  3. Write the samples to a .wav file or play them on an audio device.
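
The three steps above can be sketched roughly as follows for the recording case. This is an illustration under several assumptions, not a complete recorder: the stream is taken to be G.711 mu-law (payload type 0; A-law needs a different expansion), the audio is written offline rather than played live, so a growable vector stands in for the ring buffer and jitter does not matter once samples are placed by timestamp, and the names `decodeMuLaw`, `Timeline` and `buildWav` are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Step 1: expand one G.711 mu-law byte into a 16-bit linear PCM sample.
int16_t decodeMuLaw(uint8_t byte) {
    const uint8_t u = static_cast<uint8_t>(~byte);  // mu-law bytes are transmitted inverted
    int t = ((u & 0x0F) << 3) + 0x84;               // mantissa plus bias
    t <<= (u & 0x70) >> 4;                          // scale by the segment (exponent)
    return static_cast<int16_t>((u & 0x80) ? (0x84 - t) : (t - 0x84));
}

// Step 2: offline stand-in for the play-out buffer. Each payload is placed at
// the sample index given by its RTP timestamp, so reordered packets land in
// the right spot and lost packets leave silence.
class Timeline {
public:
    void add(uint32_t rtpTimestamp, const uint8_t* payload, std::size_t len) {
        assert(payload != nullptr || len == 0);
        const std::size_t kMaxSamples = 8000u * 60u * 60u;  // assumed cap: one hour at 8 kHz
        if (!started_) { base_ = rtpTimestamp; started_ = true; }
        const uint32_t index = rtpTimestamp - base_;        // unsigned math survives wrap-around
        if (index > kMaxSamples || len > kMaxSamples) return;  // older than the first packet, or bogus
        if (samples_.size() < index + len) samples_.resize(index + len, 0);  // gaps stay silent
        for (std::size_t i = 0; i < len; ++i) samples_[index + i] = decodeMuLaw(payload[i]);
    }
    const std::vector<int16_t>& samples() const { return samples_; }
private:
    bool started_ = false;
    uint32_t base_ = 0;
    std::vector<int16_t> samples_;
};

// Appends the low `bytes` bytes of `value` in little-endian order.
static void putLE(std::string& out, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Step 3: wrap the samples in a minimal 44-byte WAV header (16-bit mono PCM).
std::string buildWav(const std::vector<int16_t>& samples, uint32_t sampleRate = 8000) {
    const uint32_t dataBytes = static_cast<uint32_t>(samples.size() * 2);
    std::string out = "RIFF";
    putLE(out, 36 + dataBytes, 4);  // size of everything after this field
    out += "WAVEfmt ";
    putLE(out, 16, 4);              // size of the fmt chunk
    putLE(out, 1, 2);               // format 1 = linear PCM
    putLE(out, 1, 2);               // one channel
    putLE(out, sampleRate, 4);
    putLE(out, sampleRate * 2, 4);  // bytes per second
    putLE(out, 2, 2);               // bytes per sample frame
    putLE(out, 16, 2);              // bits per sample
    out += "data";
    putLE(out, dataBytes, 4);
    for (int16_t s : samples) putLE(out, static_cast<uint16_t>(s), 2);
    return out;
}
```

You would call `add` once per packet of one direction of the call (one SSRC), then write the string returned by `buildWav` to disk through a `std::ofstream` opened with `std::ios::binary`. Each direction gets its own `Timeline`; mixing both into one file is a separate step.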

From a practical point of view, you can do this in several ways:

  • Collect all the UDP/RTP packets in a capture file and let Wireshark do the hard work;
  • Use an existing tool, such as playSIP, a command-line SIP session recorder;
  • Grab a library or write your own code for the purpose, but that is not an easy task: you have to think about handling packet loss, for instance.
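
On the packet loss point, a small sketch of how gaps could be detected from the 16-bit RTP sequence number, which wraps from 65535 back to 0. The names `seqDelta` and `LossCounter` are invented for this example, and it deliberately ignores duplicates and stream restarts.

```cpp
#include <cstdint>

// Signed distance from `previous` to `current`, correct across the 65535 -> 0 wrap.
// 1 means "next packet in order", larger values mean packets were skipped,
// zero or negative means a duplicate or a late (reordered) packet.
int seqDelta(uint16_t previous, uint16_t current) {
    return static_cast<int16_t>(static_cast<uint16_t>(current - previous));
}

// Running tally for one stream. Assumption: a late packet fills one earlier
// gap, so (missing - late) is a rough estimate of packets that never arrived.
struct LossCounter {
    bool     started = false;
    uint16_t highest = 0;   // highest sequence number seen so far
    unsigned missing = 0;   // sequence numbers skipped when a gap appeared
    unsigned late    = 0;   // packets that arrived behind `highest`

    void onPacket(uint16_t seq) {
        if (!started) { started = true; highest = seq; return; }
        const int d = seqDelta(highest, seq);
        if (d > 0) {
            missing += static_cast<unsigned>(d - 1);
            highest = seq;
        } else if (d < 0) {
            ++late;
        }
    }
};
```

A recorder can usually just leave silence where packets are missing; concealment (repeating or fading the previous frame) matters more for live play-out.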
0
mail2subhajit On

If your requirement is only to record the audio (as a .wav file, with a-law or u-law as the audio codec used in the call), you can take the following approach without writing any code.

Use Wireshark to capture the network packets (in a pcap file).

In Wireshark, go to Telephony -> RTP -> Stream Analysis.

In the Stream Analysis window, click Save (in the drop-down menu, select the forward or reverse stream audio).

Save it in the .raw file format.

Open the .raw file in Audacity and convert it to a .wav file.

I hope it helps you.