Reading tag data for Ogg/Flac files

4.1k views Asked by At

I'm working on a C library that reads tag information from music files. I've already got ID3v2 taken care of, but I can't figure out how Ogg files are structured.

I opened a .ogg file in a hexeditor and I could find the tag data because that was all human readable. But everything from the beginning of the file to the tag data looked like garbage. How is this data encoded?

I don't need any help in the actual code, I just need help visualizing what a Ogg header looks like and what encoding it uses so I that I can read it. I'd like to use a non-hacky approach to reading Ogg files.

I've been looking at the Flac format, which has been helpful.

The Flac file I'm looking at has about 350 bytes between the "fLac" identifier and the human readable Comments section, and none of it is human readable in my hex editor, so I'm sure there has to be something important in there.

I'm using Linux, and I have no intention of porting to Windows or OS X. So if I need to use a glibc only function to convert the encoding, I'm fine with that.

2

There are 2 answers

3
jamessan On BEST ANSWER

As is described in the link you provided, the following metadata blocks can occur between the "fLaC" marker and the VORBIS_COMMENT metadata block.

  • STREAMINFO: This block has information about the whole stream, like sample rate, number of channels, total number of samples, etc. It must be present as the first metadata block in the stream. Other metadata blocks may follow, and ones that the decoder doesn't understand, it will skip.
  • APPLICATION: This block is for use by third-party applications. The only mandatory field is a 32-bit identifier. This ID is granted upon request to an application by the FLAC maintainers. The remainder is of the block is defined by the registered application. Visit the registration page if you would like to register an ID for your application with FLAC.
  • PADDING: This block allows for an arbitrary amount of padding. The contents of a PADDING block have no meaning. This block is useful when it is known that metadata will be edited after encoding; the user can instruct the encoder to reserve a PADDING block of sufficient size so that when metadata is added, it will simply overwrite the padding (which is relatively quick) instead of having to insert it into the right place in the existing file (which would normally require rewriting the entire file).
  • SEEKTABLE: This is an optional block for storing seek points. It is possible to seek to any given sample in a FLAC stream without a seek table, but the delay can be unpredictable since the bitrate may vary widely within a stream. By adding seek points to a stream, this delay can be significantly reduced. Each seek point takes 18 bytes, so 1% resolution within a stream adds less than 2k. There can be only one SEEKTABLE in a stream, but the table can have any number of seek points. There is also a special 'placeholder' seekpoint which will be ignored by decoders but which can be used to reserve space for future seek point insertion.

Just after the above description, there's also the specification of the format of each of those blocks. The link also says

All numbers used in a FLAC bitstream are integers; there are no floating-point representations. All numbers are big-endian coded. All numbers are unsigned unless otherwise specified.

So, what are you missing? You say

I'd like a non-hacky approach to reading Ogg files.

Why re-write a library to do that when they already exist?

0
Adisak On

The Ogg file format is documented here. There is a very nice graphical visualization as you requested with a detailed written description.

You may also want to look at libogg which is a open source BSD-licensed library for reading and writing Ogg files.