I have a TTML file that contains video captions. I want to go through all the time/caption pairs and write them to a JSON file. I tried https://www.npmjs.com/package/ttml?activeTab=readme, but it did not work. Any ideas? Thank you.
How to go through a TTML file and get all the time/caption pairs into a JSON file
438 views. Asked by Lydia halls. There are 2 answers.

Try looking at https://github.com/sandflow/imscJS for code that extracts the Intermediate Synchronic Documents (ISDs) - e.g. the file isd.js may be relevant.
Note that the TTML data model does not map exactly onto pairs of times and individual captions, so you may get duplicates.
Each ISD is a snapshot between two moments on the timeline in which the presented content does not change.
This distinction matters because in TTML the same "caption" can stay on screen while other captions appear and disappear, for example:
...
<div begin="10s" end="20s">
<p>This text appears at 10s and disappears by 20s</p>
<p end="5s">This text appears at 10s and disappears by 15s</p>
<p begin="5s">This text appears at 15s and disappears by 20s</p>
</div>
...
So the result in ISDs is:
0->10s [nothing]
10s->15s
This text appears at 10s and disappears by 20s
This text appears at 10s and disappears by 15s
15s->20s
This text appears at 10s and disappears by 20s
This text appears at 15s and disappears by 20s
20s-> [nothing]
As you can see, the first line appears in two ISDs. How you deal with this in your application is up to you, of course.
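To make the splitting above concrete, here is a minimal Python sketch that uses only the standard library. It is not how imscJS works internally; it assumes times are written only as offsets in seconds (such as "5s"), that every container uses the default parallel timing (child times are relative to the parent's begin), and that there are no `dur` attributes, regions, or styling. The names `seconds`, `collect`, and `to_isds` are made up for this illustration.

```python
import json
import math
import xml.etree.ElementTree as ET

# The fragment from the example above, without the surrounding document.
SAMPLE = """<div begin="10s" end="20s">
<p>This text appears at 10s and disappears by 20s</p>
<p end="5s">This text appears at 10s and disappears by 15s</p>
<p begin="5s">This text appears at 15s and disappears by 20s</p>
</div>"""


def seconds(value):
    """Parse an offset time such as "5s"; other TTML time syntaxes are rejected."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported time expression: {value!r}")
    return float(value[:-1])


def collect(element, parent_begin, parent_end, out):
    """Resolve begin/end against the parent's begin and gather <p> intervals."""
    begin = parent_begin + seconds(element.get("begin", "0s"))
    end = parent_end
    if element.get("end") is not None:
        end = min(parent_end, parent_begin + seconds(element.get("end")))
    # Drop any XML namespace prefix such as {http://www.w3.org/ns/ttml}.
    if element.tag.rsplit("}", 1)[-1] == "p":
        out.append((begin, end, "".join(element.itertext()).strip()))
    for child in element:
        collect(child, begin, end, out)


def to_isds(xml_text):
    """Return one entry per time span in which the visible captions do not change."""
    intervals = []
    collect(ET.fromstring(xml_text), 0.0, math.inf, intervals)
    # Every begin or end is a moment where the displayed content may change.
    times = sorted({t for b, e, _ in intervals for t in (b, e)})
    isds = []
    for start, stop in zip(times, times[1:]):
        active = [text for b, e, text in intervals if b <= start and stop <= e]
        if active:  # spans with nothing on screen are skipped
            isds.append({
                "begin": start,
                "end": None if math.isinf(stop) else stop,  # None: never ends
                "captions": active,
            })
    return isds


print(json.dumps(to_isds(SAMPLE), indent=2))
```

For the sample this yields two entries, 10 to 15 and 15 to 20, with the first line repeated in both. If you would rather have one entry per caption, you can write out the `intervals` list instead of the ISDs.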
For folks who prefer Python, ttconv can split TTML/IMSC documents into a series of Intermediate Synchronic Documents (ISDs), each one corresponding to a period of time during which the content of the TTML/IMSC document is static.
ttconv also supports conversion from TTML/IMSC to SRT, which is a simple text-based format. All styling information is lost, however.
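If you go the SRT route, getting from SRT to JSON is a small parsing job. The sketch below is standard-library Python and assumes a well-formed SRT file (blocks separated by blank lines, each with a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`). The sample text is made up for illustration and is not actual ttconv output; `srt_to_cues` is a hypothetical helper name.

```python
import json
import re

# Made-up SRT content, used only to exercise the parser.
SRT_SAMPLE = """1
00:00:10,000 --> 00:00:15,000
First caption

2
00:00:15,000 --> 00:00:20,000
Second caption
spanning two lines
"""

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def srt_to_cues(srt_text):
    """Turn SRT text into a list of {"begin", "end", "text"} dicts (times in seconds)."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # The timing line normally follows the cue number; search for it
        # so that a missing number does not break parsing.
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                break
        else:
            continue  # no timing line in this block, skip it
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
        cues.append({
            "begin": h1 * 3600 + m1 * 60 + s1 + ms1 / 1000,
            "end": h2 * 3600 + m2 * 60 + s2 + ms2 / 1000,
            "text": "\n".join(lines[index + 1:]),
        })
    return cues


print(json.dumps(srt_to_cues(SRT_SAMPLE), indent=2))
```

Each cue becomes one time/caption pair, which is closer to what the question asks for, at the cost of the styling and layout information that SRT cannot carry.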