Algorithm to remove vocals from a sound track given as bytes, with sample code


I want to remove the vocals from MP3 sound tracks (i.e. remove the singer's voice from the song file). I turned the song file into a list of bytes, but I don't know how to remove its vocals by working on the bytes. Does anybody know an algorithm for removing vocals at the byte level? I would be happy if you could explain it with sample code in any language (I work with Dart). I read the following article, but my bytes have no left and right channels:

Algorithm to remove vocal from sound track


There is 1 answer

Answer by Wisblade:

Removing a voice isn't that simple. Usually it takes a combination of several tricks, such as band-stop filters, spectral analysis (i.e. you'll need an FFT, a fast Fourier transform, to switch to the frequency domain), and so on.
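To give an idea of what "switching to frequencies" means, here is a minimal Python sketch (Python is used only for illustration, and the function names `dft_magnitudes` and `dominant_frequency` are made up for this example). It assumes the samples are already decoded into a list of numbers, and it uses a naive discrete Fourier transform rather than a real FFT: the result is the same, but it is far too slow for a whole song, so in practice you would use an FFT library.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) DFT: magnitude of each frequency bin from 0 to n // 2."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(acc))
    return magnitudes

def dominant_frequency(samples, sample_rate):
    """Frequency (in Hz) of the strongest bin, ignoring the DC bin 0."""
    magnitudes = dft_magnitudes(samples)
    peak_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate / len(samples)

# Illustration only: a synthetic 440 Hz sine, 0.1 s long at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
print(dominant_frequency(tone, sample_rate))
```

With 800 samples at 8000 Hz, each bin is 10 Hz wide, so the resolution depends directly on how long a chunk you analyze.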

Simply "subtracting" the two channels (i.e. phase cancellation) only works if the song was properly mixed in a studio, with the vocals being the ONLY centered track. If anything else (like drums or bass) is ALSO centered, it gets cancelled along with the voice.
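As a rough sketch of that subtraction on bytes: the raw bytes of an MP3 file are compressed data, not samples, so the file first has to be decoded to PCM (for example to a WAV file). The code below assumes 16-bit little-endian stereo PCM, where samples are interleaved as left, right, left, right, and so on; that is where the "left" and "right" come from. The function name `remove_center` is made up for this example, and Python is used only for illustration.

```python
import struct

def remove_center(pcm_bytes):
    """Phase cancellation on interleaved 16-bit little-endian stereo PCM.

    Returns mono 16-bit PCM bytes containing (left - right) / 2 per frame,
    so anything identical in both channels cancels out.
    """
    n_frames = len(pcm_bytes) // 4            # 2 channels * 2 bytes each
    samples = struct.unpack("<%dh" % (n_frames * 2), pcm_bytes[:n_frames * 4])
    out = []
    for i in range(0, len(samples), 2):
        left, right = samples[i], samples[i + 1]
        # Halving keeps the difference inside the signed 16-bit range.
        out.append((left - right) // 2)
    return struct.pack("<%dh" % len(out), *out)

# Illustration only: three hand-made stereo frames.
# Frame 1 is perfectly centered, frame 2 is out of phase, frame 3 is left-only.
pcm = struct.pack("<6h", 1000, 1000, 1000, -1000, 500, 0)
print(struct.unpack("<3h", remove_center(pcm)))
```

The centered frame comes out as silence, which is exactly what also happens to a centered bass or kick drum.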

Also, no algorithm works "out of the box": you'll need to tune some parameters to make it work properly.

For example, to set up band-stop filters:

  • Common human voices (spoken): 500-1000 Hz
  • Male reference range: 64-523 Hz
  • Female reference range: 160-1200 Hz
  • Male bass: 82-392 Hz
  • Male baritone: 123-493 Hz
  • Male tenor: 164-698 Hz
  • Female bass: 82-392 Hz
  • Female mezzo-soprano: 123-493 Hz
  • Female soprano: 220-1100 Hz

So if your song's singers are a male bass and a female soprano, you'll need to cut all frequencies from 82 to 392 Hz (male) and from 220 to 1100 Hz (female), which in the end means everything between 82 and 1100 Hz. That won't leave many instruments! So you'll need to put markers on your timeline where each singer is singing, and cut the bands ONLY during those short periods, so you don't damage the instruments too much.
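Here is a minimal sketch of a band-stop applied only during marked intervals, assuming the samples are already decoded to a list of floats for one channel. It uses a single second-order (biquad) notch, which is a crude band-stop: attenuation is strongest at the center of the band and only partial near its edges, so a real tool would likely use a steeper filter. The function names and the interval format (start and end in seconds) are made up for this example, and Python is used only for illustration.

```python
import math

def notch_coefficients(low_hz, high_hz, sample_rate):
    """Normalized biquad notch coefficients (b0, b1, b2, a1, a2)."""
    center = math.sqrt(low_hz * high_hz)      # geometric center of the band
    q = center / (high_hz - low_hz)           # wider band -> lower Q
    w0 = 2.0 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    cos_w0 = math.cos(w0)
    return (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0,
            -2.0 * cos_w0 / a0, (1.0 - alpha) / a0)

def band_stop_in_intervals(samples, sample_rate, low_hz, high_hz, intervals):
    """Filter only inside the (start_s, end_s) intervals; copy the rest as-is."""
    b0, b1, b2, a1, a2 = notch_coefficients(low_hz, high_hz, sample_rate)
    out = [float(s) for s in samples]
    for start_s, end_s in intervals:
        start = max(0, int(start_s * sample_rate))
        end = min(len(samples), int(end_s * sample_rate))
        x1 = x2 = y1 = y2 = 0.0               # filter state restarts per interval
        for n in range(start, end):
            x0 = float(samples[n])
            y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            out[n] = y0
            x2, x1 = x1, x0
            y2, y1 = y1, y0
    return out

# Illustration only: a 300 Hz tone, filtered during the second half-second.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 300 * t / sample_rate) for t in range(sample_rate)]
filtered = band_stop_in_intervals(tone, sample_rate, 82, 1100, [(0.5, 1.0)])
```

Switching the filter on and off abruptly at the interval boundaries can produce audible clicks, so a short crossfade at each marker would probably be needed in practice.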

The "right" way is to try most of these tricks on the shortest possible durations (i.e. only while someone is singing). Start by tagging all these intervals, so that you can try each algorithm on each sound sequence and keep only the best result each time.

But if you're already lost with a "simple" phase cancellation, you may never manage to properly clean the vocals from your song. This is fairly advanced signal processing, and it will be even harder to apply if you don't know anything about signal processing.