How can I get live transcription on OS X (without audio files)?


I'm working on an app for people stuck in superfluous meetings who need to know when someone asks them a question.

My plan is:

  1. Stream the audio of the meeting (what normally comes out of my speakers) into a speech-to-text program
  2. Stream that into something that watches for my name and/or rising intonation for questions
  3. Have the program "ding" when someone asks me a question. Then I can quickly read the text and answer.
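Steps (2) and (3) can be prototyped separately from step (1), as long as the transcribed text arrives as a sequence of segments. The Python sketch below is only an illustration under stated assumptions. It cannot hear rising intonation, so it substitutes a text heuristic: a trailing question mark (if the recognizer adds punctuation at all) or a leading question word. The names `watch_transcript`, `looks_like_question` and `ding` are hypothetical, and the "ding" is just the terminal bell.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

# Words that often open a spoken English question (heuristic, far from exhaustive).
QUESTION_STARTERS = {
    "who", "what", "when", "where", "why", "how", "which",
    "can", "could", "would", "will", "should", "do", "does", "did",
    "is", "are", "was", "were", "have", "has",
}


@dataclass
class Alert:
    text: str    # the transcript segment that triggered the alert
    reason: str  # "name + question" or "name mentioned"


def looks_like_question(segment: str) -> bool:
    """Text-only stand-in for rising intonation: trailing '?' or a leading question word."""
    words = re.findall(r"[a-z']+", segment.lower())
    return segment.rstrip().endswith("?") or (bool(words) and words[0] in QUESTION_STARTERS)


def watch_transcript(segments: Iterable[str], name: str) -> Iterator[Alert]:
    """Yield an Alert for every segment that mentions `name` as a whole word."""
    name_pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
    for segment in segments:
        if name_pattern.search(segment):
            reason = "name + question" if looks_like_question(segment) else "name mentioned"
            yield Alert(segment, reason)


def ding(alert: Alert) -> None:
    """Ring the terminal bell and show the text so the user can read it quickly."""
    print(f"\a[{alert.reason}] {alert.text}")


if __name__ == "__main__":
    meeting = [
        "Let's review the roadmap.",
        "Alex, can you give us an update?",
        "Thanks, Alex.",
    ]
    for alert in watch_transcript(meeting, "Alex"):
        ding(alert)
```

Detecting genuine rising intonation would require analysing the pitch of the audio itself, not the transcript; the text heuristic is only a cheap first pass.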

The hard part is step (1). All the speech-to-text programs I found accept audio files as input, and cannot just stream from whatever channel goes to the speakers/headphones. Assistive programs I found, on the other hand, take over keyboard input. Ideally, users will be able to do productive work by typing in other apps during the meeting, so that kind of solution won't work.

So I'm looking for something I can use on OS X that will either handle step (1) or, even better, do most of the steps above for me.

I've done research into solutions and can't find anything for step (1). I'm including the other steps because there may be a more creative solution for the overall program (such as some other assistive technology not for dictation) that I don't know about.


There are 2 answers

Nikolay Shmyrev (best answer)

You can use one of many speech recognition APIs, for example Google's streaming API, though it is not completely free.
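Streaming recognition APIs accept audio as a sequence of short chunks rather than as one finished file, so whatever captures the speaker output only has to produce raw PCM bytes continuously. Below is a minimal sketch, assuming 16 kHz, 16-bit, mono PCM and 100 ms chunks (all assumptions; check what your chosen API expects). It slices any live byte stream, such as a pipe, into chunks of that size. The helpers `chunk_size_bytes` and `pcm_chunks` are hypothetical. The call that actually sends each chunk to Google or another service is deliberately left out because it depends on the client library.

```python
import io
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed capture rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit signed PCM
CHANNELS = 1             # assumed mono
CHUNK_MS = 100           # assumed duration of audio per chunk


def chunk_size_bytes(sample_rate: int = SAMPLE_RATE_HZ, chunk_ms: int = CHUNK_MS,
                     bytes_per_sample: int = BYTES_PER_SAMPLE, channels: int = CHANNELS) -> int:
    """Bytes of raw PCM needed to cover chunk_ms milliseconds of audio."""
    return sample_rate * chunk_ms // 1000 * bytes_per_sample * channels


def pcm_chunks(stream: io.BufferedIOBase, size: int) -> Iterator[bytes]:
    """Read a live byte stream until EOF, yielding chunks of exactly `size` bytes.

    Short reads (common on pipes) are accumulated; a final partial chunk is yielded too.
    """
    buffer = b""
    while True:
        data = stream.read(size - len(buffer))
        if not data:  # EOF: the capture process closed the stream
            break
        buffer += data
        if len(buffer) == size:
            yield buffer
            buffer = b""
    if buffer:
        yield buffer
```

For example, `pcm_chunks(sys.stdin.buffer, chunk_size_bytes())` would chunk raw audio piped in from a capture tool. At the assumed format each 100 ms chunk is 16,000 × 0.1 × 2 = 3,200 bytes.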

If you can tolerate lower accuracy, you can use open-source software such as CMUSphinx.

Another problem is getting the audio stream out of the VoIP software; you would have to hack that yourself. Alternatively, you can re-record what is played on the speakers, but that is not always a good idea.

Rampartisan

1) I have used Loopback for inter-app audio routing. It is essentially a virtual mixer that pipes audio from one app into another. It shows up as an audio input device and also supports monitoring, so you can listen to the audio while also streaming it to another app.
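Because Loopback presents itself as an ordinary audio input device, a recognizer can read from it as if it were a microphone. The sketch below assumes the third-party Python package `sounddevice`, which neither answer mentions. Its `sounddevice.query_devices()` returns a list of device dictionaries with keys such as `"name"` and `"max_input_channels"`. The hypothetical helper `find_input_device` picks the first recordable device whose name contains a hint. "Loopback" is an assumed device name, since the virtual device can be renamed.

```python
from typing import Mapping, Optional, Sequence


def find_input_device(devices: Sequence[Mapping], name_hint: str = "Loopback") -> Optional[int]:
    """Return the index of the first device that can record and whose name contains name_hint.

    `devices` is expected to look like the list returned by sounddevice.query_devices():
    one mapping per device with at least "name" and "max_input_channels".
    """
    hint = name_hint.lower()
    for index, device in enumerate(devices):
        # Output-only devices (e.g. built-in speakers) report zero input channels.
        if device["max_input_channels"] > 0 and hint in str(device["name"]).lower():
            return index
    return None


if __name__ == "__main__":
    # Hypothetical device list standing in for sounddevice.query_devices().
    example_devices = [
        {"name": "MacBook Pro Microphone", "max_input_channels": 1},
        {"name": "MacBook Pro Speakers", "max_input_channels": 0},
        {"name": "Loopback Audio", "max_input_channels": 2},
    ]
    print(find_input_device(example_devices))  # prints 2
```

The returned index could then be passed as the `device` argument of `sounddevice.InputStream(device=index, samplerate=16000, channels=1, dtype="int16", callback=...)`, whose callback receives the captured frames. Feeding those frames to a recognizer is not shown here.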

2 and 3) This is not really my area of expertise, but I would probably start my research with Google's APIs (as Nikolay said).