I'm building something like a "brainstorming" tool: a group of people can shout terms into a microphone. The input is transcribed to text (Google Speech-to-Text) and displayed in a word cloud, which groups identical words (or terms). But I can't identify the individual terms correctly. Google only splits the input when there is a long silence between utterances. If two people shout shortly after each other, the different ideas are treated as one single idea. That's not what I want. Any ideas? E.g. one person says "dark blue" and another says "dark red"; Google gives me the single output "dark blue dark red".

1 Answer

Nikolay Shmyrev On

Google Cloud Speech-to-Text has an experimental speaker diarization feature: it tags each recognized word with a speaker label, so you can split a merged transcript by who said what. It does not work very reliably yet, though. Speaker separation (diarization) is also supported by other toolkits and APIs.
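To illustrate: when diarization is enabled, each word in the response carries a `speaker_tag`, and consecutive runs of the same tag can be rejoined into per-speaker phrases. The sketch below uses a minimal `WordInfo` stand-in (hypothetical, not the real client class) so it runs without credentials; with the actual API you would enable `SpeakerDiarizationConfig` on the `RecognitionConfig` and read the tagged words from the last result's first alternative.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical minimal stand-in for the word objects the API returns;
# with diarization enabled, each recognized word has a speaker_tag.
@dataclass
class WordInfo:
    word: str
    speaker_tag: int

def split_by_speaker(words):
    """Group consecutive words with the same speaker_tag into phrases."""
    return [" ".join(w.word for w in group)
            for _, group in groupby(words, key=lambda w: w.speaker_tag)]

# The merged transcript "dark blue dark red" comes back as tagged words,
# which lets us recover the two separate shouts:
words = [WordInfo("dark", 1), WordInfo("blue", 1),
         WordInfo("dark", 2), WordInfo("red", 2)]
print(split_by_speaker(words))  # ['dark blue', 'dark red']
```

This only solves the grouping step; how accurate the tags are depends entirely on the diarization quality, which (as noted above) is still unreliable for overlapping speech.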