convert sound to list of phonemes in python

Question

22.4k views Asked by Roman At 08 June 2015 at 09:02

How do I convert any sound signal to a list of phonemes?

That is, I am looking for the actual methodology and/or code to go from a digital signal to the list of phonemes that the sound recording is made up of.
For example:

lPhonemes = audio_to_phonemes(aSignal)

where for example

from scipy.io.wavfile import read
iSampleRate, aSignal = read(sRecordingDir)

# aSignal is now a numpy array for the recorded word 'hear'
lPhonemes = ['HH', 'IY1', 'R']

I need the function audio_to_phonemes

Not all sounds are words in a language, so I cannot just use something built on the Google API, for example.

Edit
I don't want audio to words, I want audio to phonemes. Most libraries don't seem to output that. Any library you recommend needs to be able to output the ordered list of phonemes that the sound is made up of, and it needs to be usable from Python.

I would also love to know how the process of going from sound to phonemes works. If not for implementation purposes, then for interest's sake.

Original Q&A

There are 4 answers

Marcus Müller On 08 June 2015 at 09:06

I need to create the function audio_to_phonemes

You're basically saying:

I need to re-implement 40 years of speech recognition research

You shouldn't be implementing this yourself (unless you're about to become a professor in the field of speech recognition and have a revolutionary new approach); you should use one of the many existing frameworks. Have a look at CMU Sphinx / pocketsphinx!

ruoho ruotsi On 09 October 2020 at 01:54

Have a look at Allosaurus, a universal (~2000 languages) phone recognizer that gives you IPA phonemes. I downloaded the latest model and tried it on a sample wave file in Python 3:

$ python -m allosaurus.bin.download_model -m latest
$ python -m allosaurus.run -i sample.wav
æ l u s ɔ ɹ s
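The same recognizer can also be called from Python instead of the command line. This is a minimal sketch, assuming the `read_recognizer` entry point shown in the Allosaurus README and a model already downloaded as above. `audio_to_phonemes` and `split_phones` are names made up here to match the question, not part of Allosaurus:

```python
def split_phones(phone_string):
    """Turn Allosaurus's space-separated output into a list of phones."""
    return phone_string.split()


def audio_to_phonemes(wav_path):
    """Recognize the phones in a WAV file (requires `pip install allosaurus`)."""
    # Imported lazily so split_phones works without the package installed.
    from allosaurus.app import read_recognizer

    model = read_recognizer()  # loads the default downloaded model
    return split_phones(model.recognize(wav_path))
```

Calling `audio_to_phonemes('sample.wav')` should then return the phones from the command-line run above as a Python list.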

Sivakumar On 08 January 2024 at 07:30

You could use this pretrained transformer. It will return a list of phonemes for the audio file.

Nikolay Shmyrev (Accepted Answer) On 08 June 2015 at 17:15

Accurate phoneme recognition is not easy to achieve because phonemes themselves are pretty loosely defined. Even on good audio, the best systems today have a phoneme error rate of about 18% (you can check the LSTM-RNN results on TIMIT published by Alex Graves).
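For context, phoneme error rate is usually computed as the edit distance (substitutions, insertions and deletions) between the recognized and the reference phoneme sequences, divided by the length of the reference. Here is a minimal sketch. The function names and the example hypothesis are illustrative only and are not taken from the TIMIT evaluation:

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two phoneme sequences."""
    # previous[j] = distance between reference[:i-1] and hypothesis[:j]
    previous = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        current = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_phone != hyp_phone)
            deletion = previous[j] + 1      # reference phone missing from hypothesis
            insertion = current[j - 1] + 1  # extra phone in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def phoneme_error_rate(reference, hypothesis):
    """Edit distance normalized by the number of reference phonemes."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = ['HH', 'IY1', 'R']        # 'hear', as in the question
hypothesis = ['HH', 'IH1', 'R', 'D']  # made-up recognizer output
print(phoneme_error_rate(reference, hypothesis))
```

Because insertions count as errors, the rate can exceed 100% when the recognizer outputs many spurious phones.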

In CMUSphinx, phoneme recognition in Python is done like this:

from os import path

from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

MODELDIR = "../../../model"
DATADIR = "../../../test/data"

# Create a decoder with the en-us acoustic model and a phonetic language model
config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-allphone', path.join(MODELDIR, 'en-us/en-us-phone.lm.dmp'))
config.set_float('-lw', 2.0)
config.set_float('-beam', 1e-10)
config.set_float('-pbeam', 1e-10)

# Decode streaming data in 1024-byte chunks
decoder = Decoder(config)

decoder.start_utt()
with open(path.join(DATADIR, 'goforward.raw'), 'rb') as stream:
    while True:
        buf = stream.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

# Each segment of the decoding result is one recognized phone
print('Best phonemes:', [seg.word for seg in decoder.seg()])

You need to check out the latest pocketsphinx from GitHub in order to run this example. The result should look like this:

  Best phonemes: ['SIL', 'G', 'OW', 'F', 'AO', 'R', 'W', 'ER', 'D', 'T', 'AE', 'N', 'NG', 'IY', 'IH', 'ZH', 'ER', 'Z', 'S', 'V', 'SIL']
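The example above reads headerless PCM, while the question starts from a WAV file. The sketch below uses only the standard-library `wave` module to pull the raw frames out of a WAV file, after checking that they are 16 kHz, 16-bit, mono, which is the format the stock en-us model expects. It also drops the `SIL` (silence) markers from a decoded list. `wav_to_raw_pcm` and `drop_silence` are hypothetical helper names, not part of pocketsphinx:

```python
import wave


def wav_to_raw_pcm(wav_path, expected_rate=16000):
    """Return the raw PCM bytes of a WAV file, checking the format first."""
    with wave.open(wav_path, 'rb') as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError("expected %d Hz audio" % expected_rate)
        return wav.readframes(wav.getnframes())


def drop_silence(phones, silence='SIL'):
    """Remove silence markers from a decoded phone list."""
    return [phone for phone in phones if phone != silence]
```

The returned bytes can be fed to `decoder.process_raw` in chunks, in place of the reads from `goforward.raw`. Audio in another format would need to be resampled or converted first.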
