I'm trying to read the data from a .wav file.
import wave
wr = wave.open("~/01 Road.wav", 'r')
# sample width is 2 bytes
# number of channels is 2
wave_data = wr.readframes(1)
print(wave_data)
This gives:
b'\x00\x00\x00\x00'
Which is the "first frame" of the song. These 4 bytes obviously correspond to the (2 channels * 2 byte sample width) bytes per frame, but what does each byte correspond to?
In particular, I'm trying to convert it to a mono amplitude signal.
If you want to understand what the 'frame' is you will have to read the standard of the wave file format. For instance: https://web.archive.org/web/20140221054954/http://home.roadrunner.com/~jgglatt/tech/wave.htm
From that document:
To convert to mono you could do something like this,
There is no clear-cut way to go from stereo to mono. You could just drop one channel. Above, I am averaging the channels. It all depends on your application.