As discussed in a previous question, I have built a prototype (using MVC Web API, NAudio and NAudio.Lame) that is streaming live low quality audio after converting it to mp3. The source stream is PCM: 8K, 16-bit, mono and I'm making use of html5's audio tag.
On both Chrome and IE11 there is a 15-34 second delay (high-latency) before audio is heard from the browser which, I'm told, is unacceptable for our end users. Ideally the latency would be no more than 5 seconds. The delay occurs even when using the preload="none" attribute within my audio tag.
Looking more closely at the issue, it appears as though both browsers will not start playing audio until they have received ~32K of audio data. With that in mind, I can affect the delay by changing Lame's MP3 'bitrate' setting. However, if I reduce the delay (by sending more data to the browser for the same length of audio), I will introduce audio drop-outs later.
Examples:
- If I use Lame's V0 encoding the delay is nearly 34 seconds which requires almost 0.5 MB of source audio.
- If I use Lame's ABR_32 encoding, I can reduce the delay to 10-15 seconds but I will experience pauses and drop-outs throughout the listening session.
Questions:
- Any ideas how I can minimize the start-up delay (latency)?
- Should I continue investigating various Lame 'presets' in hopes of picking the "right" one?
- Could it be that MP3 is not the best format for live streaming?
- Would switching to Ogg/Vorbis (or Ogg/OPUS) help?
- Do we need to abandon HTML5's audio tag and use Flash or a java applet?
Thanks.
You can not reduce the delay, since you have no control on the browser code and buffering size. HTML5 specification does not enforce any constraint, so I don't see any reason why it would improve.
You can however implement a solution with webaudio API (it's quite simple), where you handle streaming yourself.
If you can split your MP3's chunk in fixed size (so that each MP3 chunks size is known beforehand, or at least, at receive time), then you can have a live streaming in 20 lines of code. The chunk size will be your latency.
The key is to use AudioContext::decodeAudioData.