Speaker Diarization using Resemblyzer


I am new to speaker diarization and have been exploring the Resemblyzer library. After looking at the diarization demo (demo02_diarization.py), I have a few questions.

Live audio stream instead of static audio files: The demo uses a static mp3 file, but in my use case I will be working with a real-time audio stream. Does Resemblyzer support streaming input for speaker diarization? If so, where can I find resources or sample code for reference?
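To make the streaming question concrete, this is roughly how I imagine feeding a live stream to an encoder: buffer incoming chunks and cut them into overlapping fixed-size windows. It is only a sketch under my own assumptions. `sliding_windows`, `window_size`, and `hop_size` are names I made up, nothing here comes from Resemblyzer, and the samples are plain Python lists so the snippet runs without any audio library.

```python
def sliding_windows(chunks, window_size, hop_size):
    """Yield overlapping windows of samples from an iterable of audio chunks.

    Hypothetical helper, not part of Resemblyzer. `chunks` stands in for
    whatever the live audio source delivers (e.g. lists of float samples).
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    buffer = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit every full window currently available, then advance by one hop.
        while len(buffer) >= window_size:
            yield buffer[:window_size]
            del buffer[:hop_size]


# Toy usage: integers stand in for audio samples.
stream = [[1, 2, 3], [4, 5], [6]]
for window in sliding_windows(stream, window_size=4, hop_size=2):
    print(window)
```

As far as I can tell, each window could then be passed to `VoiceEncoder.embed_utterance` to get one embedding per window, but I do not know whether that is the intended way to handle streaming input.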

Number of speakers unknown at the start of the audio stream: The demo code fixes the total number of speakers in advance. In my use case I will be streaming audio from a live meeting, so the number of speakers is not known up front (we know how many people were invited, but not all of them will necessarily join). How can I get Resemblyzer to detect not only when a known speaker is talking, but also when someone who has not spoken before starts speaking? Does Resemblyzer support this? Where can I find a reference for it?
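For illustration, this is roughly the behavior I am after, as a minimal sketch and not something I found in Resemblyzer. It assumes each incoming segment has already been turned into a speaker embedding (for example by `VoiceEncoder.embed_utterance`), compares it with a running centroid per known speaker using cosine similarity, and opens a new speaker when nothing is similar enough. `OnlineSpeakerTracker` and the `threshold` value of 0.75 are my own hypothetical choices and would need tuning on real data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineSpeakerTracker:
    """Hypothetical online clustering of speaker embeddings (not a Resemblyzer API)."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold  # assumed value, needs tuning
        self.centroids = []         # mean embedding per known speaker
        self.counts = []            # number of segments seen per speaker

    def assign(self, embedding):
        """Return (speaker_id, is_new_speaker) for one segment embedding."""
        best_id, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best_id, best_sim = i, sim

        # No known speaker is close enough: register a new one.
        if best_id is None or best_sim < self.threshold:
            self.centroids.append(list(embedding))
            self.counts.append(1)
            return len(self.centroids) - 1, True

        # Otherwise fold the embedding into the matched speaker's running mean.
        n = self.counts[best_id]
        self.centroids[best_id] = [
            (c * n + e) / (n + 1)
            for c, e in zip(self.centroids[best_id], embedding)
        ]
        self.counts[best_id] = n + 1
        return best_id, False


# Toy usage with 3-dimensional vectors standing in for real embeddings.
tracker = OnlineSpeakerTracker(threshold=0.75)
for emb in ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]):
    print(tracker.assign(emb))
```

What I cannot tell is whether a simple threshold like this is reliable enough on Resemblyzer embeddings, or whether the library already offers something better suited to an unknown number of speakers.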

Pre-trained English model for diarization: I would like to use an existing model and am fine with a pre-trained diarization model, as long as it can detect a new speaker in real time. Where can I find pre-trained diarization models that I can use out of the box to evaluate how well they perform?

Thanks!
