0 .17 .29 d ih
1 .29 .73 k l ay n d
1 .73 .84 g ih
This is a sample of the .txt file that I am working on.
I have tried using np.loadtxt() to extract the last column,
syl_array = []
try:
    fid = open(syl_file, 'r')
    syl_array = np.loadtxt(fid, usecols=(0, 1, 2, 3), dtype={'names': ('a', 'b', 'c', 'd'), 'formats': ('i4', 'f4', 'f4', 'U10')})
    fid.close()
except:
    print('File does not exist')
    return
labels = syl_array['a']
spurtStartTimes = syl_array['b']
spurtEndTimes = syl_array['c']
syllables = syl_array['d']
This code gives the following output,
--['d' 'k' 'g']--
But the output I want is,
--['d ih', 'k l ay n d', 'g ih']--
I want each group of syllables from the same row to be one element in the array. How do I achieve this?
If you have control over how the file itself is generated, what you are missing is a meaningful delimiter. The problem here is that there is no way for any standard parser to know that the space between 0 and .17 means you want those values to be in different columns, whereas the space between d and ih does NOT mean this.
If you replace the spaces that separate columns with a delimiter other than a space (e.g. a comma or a tab), you can get numpy to do what you want.
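As an illustration of the delimiter approach, here is a minimal sketch. It assumes the file has been regenerated with commas between columns; the comma-separated sample below is a hypothetical rewrite of the data from the question, not the original file, and the 'U32' width is an arbitrary choice.

```python
import io
import numpy as np

# Hypothetical comma-delimited version of the sample data from the question.
csv_text = "0,.17,.29,d ih\n1,.29,.73,k l ay n d\n1,.73,.84,g ih\n"

syl_array = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=',',  # only commas separate columns, so inner spaces are kept
    dtype={'names': ('a', 'b', 'c', 'd'),
           # 'U32' is an assumed width; longer entries would be truncated
           'formats': ('i4', 'f4', 'f4', 'U32')},
)

syllables = syl_array['d']
print(syllables)
```

With a real file you would pass the file name instead of the io.StringIO object.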
However, if you truly have no control over how syl_file is generated, then you will need to write your own custom parser. Depending on how big the file is, you could write something as simple as:
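One possible custom parser is sketched below. It is not necessarily the approach the answer had in mind. It assumes every non-empty line has exactly three leading fields (label, start time, end time) followed by the syllable text, and the function name parse_syllable_lines is made up for this example. The idea is to split each line at most three times, so everything after the third field stays together as one string.

```python
import io
import numpy as np

def parse_syllable_lines(lines):
    """Parse lines of the form '<label> <start> <end> <syllables...>'."""
    labels, starts, ends, syllables = [], [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on whitespace at most 3 times; the remainder is the last column.
        # A line with fewer than four fields raises ValueError here.
        label, start, end, syl = line.split(None, 3)
        labels.append(int(label))
        starts.append(float(start))
        ends.append(float(end))
        syllables.append(syl)
    return (np.array(labels), np.array(starts),
            np.array(ends), np.array(syllables))

# Sample data copied from the question; a real file object works the same way.
sample = io.StringIO("0 .17 .29 d ih\n1 .29 .73 k l ay n d\n1 .73 .84 g ih\n")
labels, spurtStartTimes, spurtEndTimes, syllables = parse_syllable_lines(sample)
print(syllables)
```

To read from disk, open the file with a with statement and pass the file object to the function, which iterates over it line by line.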