I have audio recordings of telephone conversations. I ran them through Resemblyzer, which clusters the audio by speaker; the output is labelling, which is basically a list of tuples recording who spoke when: (speaker_label, start_time, end_time). I now need to segment the audio out speaker-wise based on the times in labelling. I've been working on this for a week.
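To make the structure concrete, the labelling my code produces looks roughly like this (the labels and times below are made up for illustration):

labelling = [
    ("0", 0.0, 12.5),    # speaker 0 speaks from 0.0 s to 12.5 s
    ("1", 12.5, 30.2),   # speaker 1 speaks from 12.5 s to 30.2 s
    ("0", 30.2, 41.7),   # speaker 0 again
]

Here is the diarization code I have so far: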
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
import pickle
import scipy.io.wavfile
from spectralcluster import SpectralClusterer
audio_file_path = 'C:/Users/...'
wav_fpath = Path(audio_file_path)
wav = preprocess_wav(wav_fpath)
encoder = VoiceEncoder("cpu")
_, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
print(cont_embeds.shape)
clusterer = SpectralClusterer(
    min_clusters=2,
    max_clusters=100,
    p_percentile=0.90,
    gaussian_blur_sigma=1)
labels = clusterer.predict(cont_embeds)
def create_labelling(labels, wav_splits):
    from resemblyzer.audio import sampling_rate
    # midpoint (in seconds) of each partial-embedding window
    times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
    labelling = []
    start_time = 0
    for i, time in enumerate(times):
        # close a segment whenever the predicted speaker changes
        if i > 0 and labels[i] != labels[i - 1]:
            temp = [str(labels[i - 1]), start_time, time]
            labelling.append(tuple(temp))
            start_time = time
        # flush the final segment at the end of the recording
        if i == len(times) - 1:
            temp = [str(labels[i]), start_time, time]
            labelling.append(tuple(temp))
    return labelling
labelling = create_labelling(labels, wav_splits)
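What I am stuck on is the next step: slicing wav according to the (speaker_label, start_time, end_time) tuples in labelling and writing one file per segment. Below is a minimal sketch of what I have in mind, assuming wav is the 16 kHz float32 array returned by preprocess_wav; segment_audio_by_speaker and the output file names are placeholders I made up, and note it slices the preprocessed (resampled, silence-trimmed) waveform rather than the original recording:

import numpy as np
from pathlib import Path
import scipy.io.wavfile
from resemblyzer.audio import sampling_rate  # 16000 Hz, the rate preprocess_wav resamples to

def segment_audio_by_speaker(wav, labelling, out_dir="segments"):
    # hypothetical helper: write one WAV file per (speaker, start, end) entry in labelling
    Path(out_dir).mkdir(exist_ok=True)
    for i, (speaker, start, end) in enumerate(labelling):
        start_sample = int(start * sampling_rate)
        end_sample = int(end * sampling_rate)
        segment = wav[start_sample:end_sample]
        out_path = Path(out_dir) / "speaker_{}_segment_{}.wav".format(speaker, i)
        # preprocess_wav returns float32 in [-1, 1], which scipy writes as a 32-bit float WAV
        scipy.io.wavfile.write(str(out_path), sampling_rate, segment.astype(np.float32))

segment_audio_by_speaker(wav, labelling)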
This code helps a lot: first add a time_stamps.txt file containing the time stamps to trim the audio on (the time_stamps.txt file should be comma separated), then add the audio file name and its format, and it does the job. I found it on GitHub: https://github.com/raotnameh/Trim_audio
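One route I am considering is to generate that comma-separated time_stamps.txt from the boundary times in labelling and let Trim_audio do the cutting. A rough sketch, assuming the file is simply a comma-separated list of cut points in seconds (my reading of that repo, so the exact format may need adjusting):

# hypothetical: dump the segment boundaries from labelling as comma-separated cut points
boundaries = [start for _, start, _ in labelling]
boundaries.append(labelling[-1][2])  # final end time

with open("time_stamps.txt", "w") as f:
    f.write(",".join(str(round(t, 2)) for t in boundaries))

The caveat again is that these times come from the preprocessed wav, so they only line up with the original recording if preprocess_wav did not trim much silence.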