Is there a way to get the monophone probability using HTK?


Ideally, what I am looking for is a way to get a vector of probabilities, one per phone, that a particular segment of an audio file is that phone. Something like:

input:

  • wavfile
  • start position (e.g. @1.4 sec)
  • duration (e.g. 500 ms)

output:

  • SIL 2.324*10^-3
  • AA 1.514*10^-4
  • AE 1.482*10^-2
  • ...
  • ZH 5.03*10^-5

Answer by Dmytro Prylipko (best answer)

You can obtain the scores by running HVite in forced-alignment mode. I am afraid you have to run it once for every phoneme you have:

HVite -A -D -T 1 -l '*' -o NTW -C HTK.cfg -a \
    -H macros \
    -H hmmdefs \
    -i acoustic_score_AA.mlf \
    -y lab \
    -I AA.mlf \
    -S index.scp \
    words phones
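
Since the command has to be repeated for every phoneme, the repetition could be scripted. Below is a minimal Python sketch that rebuilds the command above with the per-phone file names filled in. The helper names hvite_command and run_all are made up for illustration, and the sketch assumes that HVite is on the PATH and that macros, hmmdefs, words, phones and the per-phone .mlf files sit in the working directory.

```python
import subprocess


def hvite_command(phone, config="HTK.cfg", script="index.scp"):
    """Return the HVite argument list from the answer for one phone.

    Only the -i (output MLF) and -I (input MLF) names change per phone;
    every other argument is copied from the command shown above.
    """
    return [
        "HVite", "-A", "-D", "-T", "1", "-l", "*", "-o", "NTW",
        "-C", config, "-a",
        "-H", "macros",
        "-H", "hmmdefs",
        "-i", f"acoustic_score_{phone}.mlf",
        "-y", "lab",
        "-I", f"{phone}.mlf",
        "-S", script,
        "words", "phones",
    ]


def run_all(phones):
    """Run HVite once per phone; raises CalledProcessError if a run fails."""
    for phone in phones:
        subprocess.run(hvite_command(phone), check=True)
```

Calling run_all(["AA", "AE", "ZH"]) would then leave one acoustic_score_<phone>.mlf file per phone. Because the arguments are passed as a list, the '*' reaches HVite literally without any shell quoting.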

The output file acoustic_score_AA.mlf will contain the result.
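
The scores HVite writes are log-likelihoods, not probabilities, so getting the vector asked for in the question needs one more step. A rough sketch of that step follows; it assumes equal prior probability for every phone and assumes the per-phone scores have already been read out of the output files into a dictionary (the parsing is not shown, and normalize_log_scores is a made-up name). Note also that the N in -o NTW asks for per-frame normalised scores, so those values would have to be scaled back by the number of frames, or N dropped, before they can be treated as whole-segment log-likelihoods.

```python
import math


def normalize_log_scores(log_scores):
    """Turn per-phone log-likelihoods into values that sum to 1.

    This is a softmax over the log scores, which equals the posterior
    over phones only under the assumption of equal priors.
    """
    # Subtract the maximum first so exp() cannot overflow or underflow to all zeros.
    peak = max(log_scores.values())
    weights = {phone: math.exp(score - peak) for phone, score in log_scores.items()}
    total = sum(weights.values())
    return {phone: weight / total for phone, weight in weights.items()}
```

The result is only as meaningful as the equal-prior assumption; with phone priors available, their logarithms could be added to the scores before normalising.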

The contents of the words vocabulary file should look like:

AA AA
AE AE
....
ZH ZH

and the phones file has to contain the list of phonemes (the HMM model names), as far as I remember.

The trick here is the content of the input .mlf file. For instance, AA.mlf should look like:

#!MLF!#
"*/S0001.lab"
AA
.
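
One such file is needed per phoneme, so generating them may be easier than writing them by hand. The following is a small Python sketch under the assumption that every segment gets a single-label entry of the same shape as above; forced_mlf_text and write_forced_mlfs are hypothetical helper names, and the utterance IDs are whatever base names the segments in index.scp use.

```python
from pathlib import Path


def forced_mlf_text(phone, utterance_ids):
    """Build MLF text that labels every listed utterance with one phone."""
    lines = ["#!MLF!#"]
    for utt in utterance_ids:
        # Same pattern as in the answer: "*/<utterance>.lab", the label, then ".".
        lines += [f'"*/{utt}.lab"', phone, "."]
    return "\n".join(lines) + "\n"


def write_forced_mlfs(phones, utterance_ids, out_dir="."):
    """Write <phone>.mlf for each phone and return a dict of phone -> path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for phone in phones:
        path = out_dir / f"{phone}.mlf"
        path.write_text(forced_mlf_text(phone, utterance_ids))
        paths[phone] = path
    return paths
```

For example, write_forced_mlfs(["AA", "AE"], ["S0001"]) writes AA.mlf and AE.mlf, and AA.mlf then has exactly the four lines shown above.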

With such an MLF, HVite is forced to apply the AA model to the whole utterance, so the audio file has to be cut into the segments of interest in advance.
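
The cutting itself can be done with any audio tool. As one possibility, here is a sketch using only Python's standard wave module; it assumes an uncompressed PCM WAV input, and cut_segment is a made-up name. The start position and duration correspond to the inputs listed in the question.

```python
import wave


def cut_segment(src_path, dst_path, start_sec, duration_sec):
    """Copy [start_sec, start_sec + duration_sec) of a PCM WAV file to dst_path.

    Returns the number of audio frames actually written, which is smaller
    than requested if the segment runs past the end of the file.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        start_frame = int(round(start_sec * rate))
        n_frames = int(round(duration_sec * rate))
        # Clamp the start so setpos() does not raise past the end of the file.
        src.setpos(min(start_frame, src.getnframes()))
        frames = src.readframes(n_frames)
    with wave.open(dst_path, "wb") as dst:
        # The frame count in the header is corrected automatically on close.
        dst.setparams(params)
        dst.writeframes(frames)
    return len(frames) // (params.sampwidth * params.nchannels)
```

For the example in the question, cut_segment("input.wav", "S0001.wav", 1.4, 0.5) would produce the 500 ms segment starting at 1.4 s (the file names here are placeholders). Whether HVite can read such WAV segments directly or needs feature files extracted first depends on the settings in HTK.cfg.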