I have data stored in either a collection of files or in a single compound file. The compound file is formed by concatenating all the separate files, and then preceding everything with a header that gives the offsets and sizes of the constituent parts. I'd like to have a file-like object that presents a view of the compound file, where the view represents just one of the member files. (That way, I can have functions for reading the data that accept either a real file object or a "view" object, and they needn't worry about how any particular dataset is stored.) What library will do this for me?
The `mmap` class looked promising since it's constructed from a file, a length, and an offset, which is exactly what I have, but the offset needs to be aligned with the underlying file system's allocation granularity, and the files I'm reading don't meet that requirement. The name of the `MultiFile` class fits the bill, but it's tailored for attachments in e-mail messages, and my files don't have that structure.
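For what it's worth, the alignment restriction can be sidestepped by mapping from the nearest aligned offset below the member and slicing off the extra leading bytes. This is only a sketch: `map_member` is a made-up helper name, Python 3 is assumed, and the result is a `memoryview` rather than a true file-like object, so it doesn't give `read`/`seek`/`tell` by itself.

```python
import mmap


def map_member(f, offset, size):
    """Map bytes [offset, offset + size) of the open file f (hypothetical helper).

    Returns (mapping, view). The mapping starts at an aligned offset, and
    the view skips the leading bytes that precede the requested offset.
    """
    # Round the offset down to a multiple of the allocation granularity.
    aligned = offset - (offset % mmap.ALLOCATIONGRANULARITY)
    delta = offset - aligned
    mapping = mmap.mmap(f.fileno(), delta + size,
                        access=mmap.ACCESS_READ, offset=aligned)
    # Slice a zero-copy view covering only the requested member.
    view = memoryview(mapping)[delta:delta + size]
    return mapping, view
```

The view has to be released (`view.release()`) before the mapping is closed, otherwise closing raises `BufferError`.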
The file operations I'm most interested in are `read`, `seek`, and `tell`. The files I'm reading are binary, so the text-oriented functions like `readline` and `next` aren't so crucial. I might eventually also need `write`, but I'm willing to forgo that feature for now since I'm not sure how appending should behave.
I know you were searching for a library, but as soon as I read this question I thought I'd write my own. So here it is:
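A minimal sketch of such a wrapper follows. It assumes Python 3 and a seekable binary base file; the names `FileView`, `base`, `offset`, and `size` are illustrative assumptions, not part of any library. It supports `read`, `seek`, and `tell`, and leaves `write` out.

```python
import io
import os


class FileView(io.RawIOBase):
    """Read-only view of `size` bytes starting at `offset` in `base`.

    Hypothetical helper: `base` is any seekable binary file object.
    """

    def __init__(self, base, offset, size):
        super().__init__()
        self._base = base
        self._offset = offset
        self._size = size
        self._pos = 0  # position relative to the start of the view

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, pos, whence=os.SEEK_SET):
        if whence == os.SEEK_SET:
            new_pos = pos
        elif whence == os.SEEK_CUR:
            new_pos = self._pos + pos
        elif whence == os.SEEK_END:
            new_pos = self._size + pos
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer):
        # Clamp the request so a read never crosses the end of the member.
        remaining = max(0, self._size - self._pos)
        count = min(len(buffer), remaining)
        if count == 0:
            return 0
        # Reposition the base on every read, since other views may share it.
        self._base.seek(self._offset + self._pos)
        data = self._base.read(count)
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)
```

Because the class derives from `io.RawIOBase`, `read()` and `readall()` come for free from `readinto`, and wrapping a view in `io.BufferedReader` should add `readline` and iteration if they're ever needed. Several views can share one base file in single-threaded code, since each read seeks the base first.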
And I wrote another script to generate the "test.txt" file:
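A generator along these lines could look like the sketch below. The header layout is an assumption made purely for illustration (a 4-byte little-endian member count, then one 8-byte offset and one 8-byte size per member), since the question doesn't say what the real header looks like; `build_compound` and `read_index` are hypothetical names.

```python
import struct

# Assumed header layout, not taken from the question:
COUNT = struct.Struct("<I")   # number of members
ENTRY = struct.Struct("<QQ")  # (offset, size) of one member


def build_compound(parts):
    """Concatenate the byte strings in `parts` behind an index header."""
    header_size = COUNT.size + ENTRY.size * len(parts)
    header = [COUNT.pack(len(parts))]
    offset = header_size  # first member starts right after the header
    for part in parts:
        header.append(ENTRY.pack(offset, len(part)))
        offset += len(part)
    return b"".join(header) + b"".join(parts)


def read_index(f):
    """Return the list of (offset, size) pairs stored in the header of f."""
    f.seek(0)
    (count,) = COUNT.unpack(f.read(COUNT.size))
    return [ENTRY.unpack(f.read(ENTRY.size)) for _ in range(count)]
```

Writing the result of `build_compound` to a file opened in `"wb"` mode gives a compound file to test against, and each `(offset, size)` pair from `read_index` is what a view object needs to locate its member.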
It worked for me. The files I tested on are not binary files like yours, and they're not as big as yours, but this might be useful, I hope. If not, then thank you, that was a good challenge :D
Also, I was wondering: if these are actually multiple files, why not use some kind of archive file format, and use its library to read them?
Hope it helps.