Simple compressed audio encoder for experimenting


Mostly for my own experience and curiosity, I am trying to investigate formats of files and data to get a good enough understanding of them to output files that can be recognized by corresponding programs.
For example, by finding specifications of their structures online, I have been able to write fairly simple programs to produce files with uncompressed contents, in the format of WAV for audio, BMP for images, and Y4M for moving picture. I have also been able to gather enough information from the Internet to write a Python program that can compress and encode an RGB image to a JPEG.
I do not expect to be able to implement an encoder for especially efficient or compact formats, or to write a better implementation than what already exists. But what options, if any, are there for audio codecs that store data in fewer bytes than uncompressed WAV, lossy or lossless, and whose documentation is complete enough that I can write a simple encoder to experiment with the process?
From my search so far, the documentation for most codecs is incomplete at best. For example, the FLAC documentation states that there are four encoding modes for each subframe: constant, verbatim, fixed, and LPC. LPC is stated to use up to a 32nd-order FIR filter for prediction, but I cannot glean from the provided information how its coefficients are determined, nor exactly how the residuals are coded with Rice coding. For this or any other format, is there some kind of resource with examples of individual steps or aspects of the encoding process that I can look at?

Answered by ktmf

Have you seen the papers linked from the FLAC format specification webpage under "Prediction"? The relevant passage reads:

Fixed linear predictor. FLAC uses a class of computationally-efficient fixed linear predictors (for a good description, see audiopak and shorten). FLAC adds a fourth-order predictor to the zero-to-third-order predictors used by Shorten. Since the predictors are fixed, the predictor order is the only parameter that needs to be stored in the compressed stream. The error signal is then passed to the residual coder.
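
As a rough illustration of the fixed predictors described in that passage, here is a minimal Python sketch. The coefficient table is the usual polynomial (finite-difference) set for orders 0 to 4; the function names `fixed_residual` and `best_fixed_order` are made up for this example, and picking the order by the smallest sum of absolute residuals is only one plausible heuristic, not necessarily what any particular encoder does.

```python
# Coefficients applied to x[n-1], x[n-2], ... for each fixed predictor order.
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}


def fixed_residual(samples, order):
    """Return the prediction error for samples[order:] (integers in, integers out)."""
    coeffs = FIXED_COEFFS[order]
    residual = []
    for n in range(order, len(samples)):
        prediction = sum(c * samples[n - 1 - k] for k, c in enumerate(coeffs))
        residual.append(samples[n] - prediction)
    return residual


def best_fixed_order(samples):
    """Pick the order with the smallest total absolute error (assumed heuristic).

    Only samples from index 4 onward are compared, so every order is judged
    on the same positions. Ties go to the lowest order.
    """
    def cost(order):
        return sum(abs(e) for e in fixed_residual(samples, order)[4 - order:])

    return min(range(5), key=cost)


if __name__ == "__main__":
    block = [0, 1, 4, 9, 16, 25, 36]
    for order in range(5):
        print(order, fixed_residual(block, order))
    print("chosen order:", best_fixed_order(block))
```

The first `order` samples of a block have no full history, so a real encoder would store them verbatim as warm-up samples and code only the residual that follows.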

The paper linked under Shorten is quite thorough and describes almost exactly how FLAC uses its so-called fixed predictors. It also explains how Rice codes work.
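
To make the Rice coding step a bit more concrete, here is a small Python sketch of coding one signed residual with Rice parameter `k`. It assumes the layout I understand FLAC to use: the signed value is folded to a non-negative integer, the high part is written in unary as zeros terminated by a one, and the low `k` bits follow in plain binary. Bits are kept as a text string purely for readability, and `zigzag`, `rice_encode` and `rice_decode` are names invented for this example.

```python
def zigzag(value):
    """Fold a signed integer to a non-negative one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * value if value >= 0 else -2 * value - 1


def rice_encode(value, k):
    """Encode one signed residual as a string of '0'/'1' characters."""
    folded = zigzag(value)
    quotient = folded >> k
    bits = "0" * quotient + "1"  # unary part: zeros, then a stop bit
    if k:
        # low k bits of the folded value, zero-padded to width k
        bits += format(folded & ((1 << k) - 1), "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Decode a single code word produced by rice_encode."""
    quotient = bits.index("1")  # number of leading zeros
    remainder = int(bits[quotient + 1:quotient + 1 + k], 2) if k else 0
    folded = (quotient << k) | remainder
    return folded >> 1 if folded % 2 == 0 else -((folded + 1) >> 1)


if __name__ == "__main__":
    for residual in [0, 1, -1, 3, -3, 10]:
        word = rice_encode(residual, 2)
        print(residual, word, rice_decode(word, 2))
```

Small residuals get short code words, which is why a good predictor matters: the closer the residual is to zero, the fewer bits it takes. An encoder would typically try several values of `k` per block (or partition) and keep the one giving the fewest total bits.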

As to how the LPC coefficients can be chosen efficiently: this can be done in quite a few ways. libFLAC does this by using the Yule-Walker equations. These equations take the autocorrelation of the signal as input, and after solving the system return LPC coefficients that can predict the signal quite well.
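
As a sketch of that idea, the following Python computes the autocorrelation of a block and solves the Yule-Walker equations with the Levinson-Durbin recursion, which is a standard way to solve this kind of system. It is only an illustration under simplifying assumptions: there is no windowing of the block and no quantization of the coefficients to integers, both of which a real FLAC encoder needs. The function names are invented for this example.

```python
def autocorrelation(samples, max_lag):
    """Return [r[0], r[1], ..., r[max_lag]] for the block."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i - lag] for i in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the given autocorrelation values.

    Returns (coeffs, error), where coeffs[k - 1] multiplies x[n - k] in the
    prediction x_hat[n] = sum(coeffs[k - 1] * x[n - k]) and error is the
    remaining prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    error = float(r[0])
    for i in range(1, order + 1):
        if error == 0.0:
            break  # signal is already perfectly predicted (or silent)
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        reflection = acc / error
        updated = a[:]
        updated[i] = reflection
        for j in range(1, i):
            updated[j] = a[j] - reflection * a[i - j]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:], error


if __name__ == "__main__":
    block = [0, 3, 5, 6, 5, 3, 0, -3, -5, -6, -5, -3]
    r = autocorrelation(block, 2)
    coeffs, err = levinson_durbin(r, 2)
    print(coeffs, err)
```

The recursion produces the solution for every order up to `order` along the way, so an encoder can compare the error at each order and trade prediction quality against the cost of storing more coefficients.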

There are other ways to find suitable LPC coefficients. There are a few mentioned here:

Another way of identifying model parameters is to iteratively calculate state estimates using Kalman filters and obtaining maximum likelihood estimates within expectation–maximization algorithms.

It is possible to use regression too.
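
For the regression route, one possible reading is an ordinary least-squares fit of each sample against the samples before it. The Python sketch below builds the normal equations and solves them with a small Gaussian elimination so that it needs no external libraries. The names `solve_linear_system` and `lpc_by_regression` are made up for this example, and it makes no attempt to handle a singular system (for instance a constant or purely exponential block), which would raise `ZeroDivisionError`.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, size + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(rows[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (rows[r][size] - tail) / rows[r][r]
    return solution


def lpc_by_regression(samples, order):
    """Least-squares fit of x[n] against x[n-1], ..., x[n-order]."""
    normal = [[0.0] * order for _ in range(order)]
    target = [0.0] * order
    for n in range(order, len(samples)):
        past = [samples[n - 1 - k] for k in range(order)]
        for i in range(order):
            target[i] += past[i] * samples[n]
            for j in range(order):
                normal[i][j] += past[i] * past[j]
    return solve_linear_system(normal, target)


if __name__ == "__main__":
    # Each sample is the sum of the previous two, so the fit should be close to [1, 1].
    print(lpc_by_regression([1, 1, 2, 3, 5, 8, 13], 2))
```

Unlike the autocorrelation approach, this fit minimizes the error only over the samples that have a full history inside the block, so the two methods generally give slightly different coefficients.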