I'm trying to use PocketSphinx to find words inside a wav file. It's a real challenge, since the documentation is really poor (and in places nonexistent).
import speech_recognition as sr

# Frames per second (assumed to be the PocketSphinx default frame rate)
fps = 100

r = sr.Recognizer()

# Load the whole wav file into memory
with sr.AudioFile("audiotestcorto.wav") as source:
    audio = r.record(source)

# show_all=True returns the PocketSphinx decoder instead of the plain transcript
decoder = r.recognize_sphinx(audio, language="en-US", show_all=True)

# Print start time, end time and word for each recognized segment
for s in decoder.seg():
    print('| %6.2f s | %6.2f s | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
This is my code, and it doesn't throw any errors (already an achievement, after some dark days :) ).
The problem is that the timestamps of the words are wrong.
Why are the timestamps wrong? The error is around 2-3 seconds, and it doesn't get better even if I multiply or divide by the framerate.
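To be explicit about the conversion I mean, here is a minimal sketch. It assumes the decoder reports positions in frames at 100 frames per second, which I believe is the PocketSphinx default but have not confirmed for my setup. `frames_to_seconds` is a hypothetical helper name of my own, not part of either library, and the segment values are made up purely for illustration.

```python
# Assumption: 100 frames per second (believed to be the PocketSphinx default).
FRAME_RATE = 100


def frames_to_seconds(frame, frame_rate=FRAME_RATE):
    """Convert a decoder frame index to a time in seconds."""
    return frame / frame_rate


# Made-up (word, start_frame, end_frame) tuples, only to show the arithmetic.
example_segments = [("hello", 30, 75), ("world", 80, 140)]

for word, start, end in example_segments:
    print('| %6.2f s | %6.2f s | %8s |'
          % (frames_to_seconds(start), frames_to_seconds(end), word))
```

Even with this kind of conversion applied to `s.start_frame` and `s.end_frame`, the times I get are still off by 2-3 seconds compared with what I hear in the file.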