I am using javax.sound.sampled and JLayer to play an MP3 file. I am trying to analyze the audio input stream to determine when the song starts and when it ends (based on the audio levels at the beginning and end of the MP3). A 4-minute song may only have 3 minutes and 55 seconds of actual music, with the rest being silence, which is why I want to determine this.
I thought I could determine this information by finding the first and last non-zero bytes in the stream.
Problem: When I adjust the buffer size, the position of the first non-zero byte changes. Why is this? Shouldn't it remain constant no matter the buffer size?
E.g. with a buffer size of 16, startFrame corresponds to the 17th byte. With a buffer size of 64, startFrame corresponds to the 65th byte.
Here is the code:
byte[] buffer;
int pos = 0;
short silenceThreshold = 1;
startFrame = 0;
endFrame = -1;
boolean startFrameSet = false;
buffer = new byte[16];
byte prevVal = 0;
for (int n = 0; n != -1; n = audioInputStream.read(buffer, 0, buffer.length)) {
    for (int i = 0; i < buffer.length; i++) {
        if (buffer[i] >= silenceThreshold || buffer[i] <= -silenceThreshold) {
            // Is not silent
            if (!startFrameSet) {
                startFrame = (pos * buffer.length) + i;
                startFrameSet = true;
            }
        } else {
            // Silence
            // If the previous value is > 0 or < 0, set endFrame
            if (prevVal >= silenceThreshold || prevVal <= silenceThreshold) {
                endFrame = (pos * buffer.length) + i;
            }
        }
        prevVal = buffer[i];
    }
    pos++;
}
// If last byte is not within silence threshold (song doesn't end in silence).
if (prevVal >= silenceThreshold || prevVal <= silenceThreshold) {
    // last frame is not silent
    endFrame = -1;
}
I figure I have misunderstood how the audio input stream, and audio in general, work.
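Your outer for loop does not read from the audio input stream on the first pass through the loop. The initializer int n = 0 runs once, then the loop body executes, and only after that does the update expression call audioInputStream.read(...). So on the first pass the buffer is just the zero-initialized array from new byte[16], and the position of the first non-zero byte is shifted by exactly one buffer length. That is why it lands on the 17th byte with a buffer of 16 and the 65th byte with a buffer of 64.
You should also not assume that the read fills the whole buffer. Use the value returned by the read as the number of valid bytes.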
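As a minimal sketch of both points, the loop below reads before it inspects the buffer and only looks at the n bytes that read actually returned. The class name SilenceScan and the method firstNonSilentByte are hypothetical names used for illustration, and the method takes a plain InputStream (which AudioInputStream extends) so it can be tried without an audio file. It keeps the per-byte threshold from the question and assumes bufferSize is greater than zero.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of javax.sound.sampled or JLayer.
class SilenceScan {
    /**
     * Returns the offset of the first byte whose magnitude is at least
     * threshold, or -1 if the stream contains no such byte.
     */
    static long firstNonSilentByte(InputStream in, int bufferSize, int threshold)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long offset = 0; // number of bytes consumed before the current chunk
        int n;
        // Read first, then inspect: the buffer is never examined before it is filled.
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            // Only the first n bytes are valid; read may return fewer than buffer.length.
            for (int i = 0; i < n; i++) {
                if (Math.abs(buffer[i]) >= threshold) {
                    return offset + i;
                }
            }
            // Advance by the bytes actually read, not by buffer.length.
            offset += n;
        }
        return -1;
    }
}
```

Because the offset is advanced by the number of bytes actually read, the result should no longer depend on the buffer size. Calling SilenceScan.firstNonSilentByte(audioInputStream, 16, 1) and the same call with a buffer size of 64 on a fresh stream should report the same position.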