I have around 20,000 .wav files (all voice lines) that I need to strip the silence from the start AND end of.
The "silence" isn't pure silence, so I'll need to set a threshold.
I'd also like to leave a little "silence" before the actual sound/voice starts, so each file would get trimmed but .X seconds of the original silence remains.
I've tried various commands and can't get it to set a threshold correctly. I've seen a lot of internet comments about doing this, so I must be using the command wrong.
I also can't figure out how to leave .X seconds of silence.
I assume sox can do this, or at least most of it?
Trimming silence at the start and end
One solution would be (based on this Digital Cardboard blog post) to call sox like this:
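A sketch of that call, wrapped in a small shell function so the parameters described below are easy to vary. The function name `trim_silence` and the file names are my own placeholders; the effect chain is `silence 1 X Y`, applied once to the audio and once more to the reversed audio.

```shell
# trim_silence IN OUT X Y
#   X = minimum duration (seconds) a sound must last to count as non-silence
#   Y = loudness threshold, e.g. 1%
# The name trim_silence is a placeholder, not part of sox.
trim_silence() {
  # Trim the start, reverse, trim the (former) end, then reverse back.
  sox "$1" "$2" silence 1 "$3" "$4" reverse silence 1 "$3" "$4" reverse
}
```

For example, `trim_silence in.wav out.wav 0.1 1%` would trim both ends with a 0.1 second minimum duration and a 1% threshold; those two values are only illustrative and will need tuning per recording.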
`X` is the minimum duration (in seconds) a sound must last in order to be interpreted as non-silence by sox. For example, there might be a loud clicking sound at the beginning of the audio that is 0.15 seconds long. If we set `X` to `0.2`, this loud but short click will be interpreted as silence and will be removed. If we set `X` to `0.1`, the click will be interpreted by sox as the start of the non-silent part, meaning everything before the click will be removed but not the click itself.

Also note that a trailing zero should be used if the duration is a whole number, so `1.0` should be used instead of `1` to avoid unexpected behavior.

`Y` defines a loudness threshold. Everything below it will be interpreted as silence, no matter how long or short it is. So a long rumbling sound at the beginning that is not very loud might fall below the threshold, be interpreted as silence, and thus be removed. Everything that is loud enough to be above the threshold will be interpreted as the start of non-silence if its duration is long enough (see `X`).

Note that digitalcardboard states that the smallest value to be used should be `0.1%` instead of `0`.

`1` simply specifies to remove silence only at the beginning. To trim silence at the end we use the same effect but reverse the audio first. Why this approach is correct for trimming the end should become apparent below, where I analyze what the solutions of the other answers do.

Leaving a certain amount of silence at the beginning
The simple answer is: sox does not support this.
But we can work around this by trimming the silence and then adding a fixed amount of silence at the beginning. This can be done with:
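A minimal sketch using sox's `pad` effect, which takes the amount of silence to add at the start followed by the amount to add at the end. The function name `pad_start` and the file names are placeholders of mine.

```shell
# pad_start IN OUT X
#   X = seconds of silence to prepend
# The name pad_start is a placeholder, not part of sox.
pad_start() {
  # First pad value applies to the beginning, second (0) to the end.
  sox "$1" "$2" pad "$3" 0
}
```

For example, `pad_start trimmed.wav padded.wav 0.3` would prepend 0.3 seconds of silence to an already trimmed file.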
`X` is the duration (in seconds) of the silence that we want to prepend. `0` in this position means that no padding should be added at the end.

Of course this is not the same as keeping some duration of the original silence (if present), because that would also allow result files that don't have any silence at the beginning if the input also doesn't have any silence at the beginning. Still, trimming + padding is the best I could come up with.
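Since the question involves around 20,000 files, trimming and padding can be chained in one sox call and run in a loop. The following is only a sketch: the function name `trim_and_pad_dir`, the directory arguments, and the default values (0.1 seconds, 1%, 0.2 seconds) are assumptions for illustration, not recommendations.

```shell
# trim_and_pad_dir SRC_DIR DST_DIR [X] [Y] [PAD]
#   X   = minimum non-silence duration in seconds (default 0.1, illustrative)
#   Y   = loudness threshold (default 1%, illustrative)
#   PAD = seconds of silence to prepend (default 0.2, illustrative)
# The name trim_and_pad_dir is a placeholder, not part of sox.
trim_and_pad_dir() {
  src=$1
  dst=$2
  min_dur=${3:-0.1}
  threshold=${4:-1%}
  pad=${5:-0.2}
  mkdir -p "$dst" || return 1
  for f in "$src"/*.wav; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    # Trim start, reverse, trim former end, reverse back, then prepend silence.
    sox "$f" "$dst/$(basename "$f")" \
      silence 1 "$min_dur" "$threshold" reverse \
      silence 1 "$min_dur" "$threshold" reverse \
      pad "$pad" 0
  done
}
```

Calling `trim_and_pad_dir ./raw ./trimmed` writes processed copies into a separate directory and leaves the originals untouched, which makes it easy to spot-check a few results before committing to particular threshold values.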
Other answers
So far, none of the other answers here solves the question. OP wanted to remove silence from the start and the end. For the interested, here is what the previous solutions do instead:
Oh, and all of those answers provide no solution for keeping some of the silence at the beginning as asked by the OP.