When reading files off of a hard drive, mmap is generally regarded as a good way to quickly get data into memory. When working with optical drives, accesses take more time and you have a higher latency to worry about. What approach/abstraction do you use to hide/eliminate as much latency and/or overall load time of the optical drive as possible?
What approach works best for quickly reading files off of optical drives?
697 views. Asked by spate. There are 5 answers.
Slow drives are going to be slow. Sorry. However, optical drive hardware will normally be optimized to do sequential reads, so if you can make your code work that way you might see some improvement. I doubt you'll see much difference between mmap(), fread(), et al. for sequential access. You might also be able to tune your read buffer size to be a multiple of the drive's block size, if your OS isn't already doing that for you. Optical drives can have large block sizes compared to hard drives, and if your buffers aren't large enough you're paying a price.
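A minimal sketch of the buffer-size idea in Python, for anyone who wants something to benchmark. It assumes the 2048-byte user-data sector size used by CD-ROM (Mode 1) and DVD; the 1 MiB chunk size and the function name are just illustrative starting points, not measured optimums.

```python
SECTOR_SIZE = 2048              # user-data bytes per sector on CD-ROM (Mode 1) and DVD
CHUNK_SIZE = 512 * SECTOR_SIZE  # 1 MiB; an assumed starting point, tune by benchmarking


def read_sequential(path, chunk_size=CHUNK_SIZE):
    """Yield the file front to back in large, sector-aligned chunks."""
    if chunk_size <= 0 or chunk_size % SECTOR_SIZE != 0:
        raise ValueError("chunk_size should be a positive multiple of the sector size")
    # buffering=0 gives a raw file object, so each read() below is passed to
    # the OS with the size we asked for instead of going through Python's
    # own (much smaller) buffer.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk
```

Whether this beats the defaults depends on how much read-ahead your OS already does, so time it against a plain read on the real drive.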
First, keep in mind that modern optical drives are quite fast at reading sequential data, but seeking is still a lot slower than on HDs. So if you must seek a lot within a big file (e.g. jump randomly around within a 500+ MB file), it might actually be faster to first copy the whole 500 MB to HD (into a temporary file), which will be done in fast, sequential reads, perform the operation on the temp file (much faster, since access times on HD are much lower), and delete the file again when you are done with it.
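The copy-to-HD approach can be sketched roughly like this in Python. The helper name, the 4 MiB copy buffer and the optional cache_dir parameter are assumptions made for illustration; it also assumes the default temp directory lives on the hard disk.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager

COPY_BUFFER = 4 * 1024 * 1024  # assumed 4 MiB copy buffer; tune by benchmarking


@contextmanager
def staged_on_hard_disk(disc_path, cache_dir=None):
    """Copy disc_path to a temp file, yield it opened for reading, then delete it."""
    fd, temp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        # One sequential pass over the disc; all later seeks hit the temp copy.
        with os.fdopen(fd, "wb") as dst, open(disc_path, "rb") as src:
            shutil.copyfileobj(src, dst, COPY_BUFFER)
        with open(temp_path, "rb") as staged:
            yield staged
    finally:
        os.remove(temp_path)  # clean up even if the caller raised
```

Inside a `with staged_on_hard_disk(some_path) as f:` block you can then call f.seek() and f.read() as often as you like without touching the optical drive again.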
The same applies to a few big files vs. many small files. Working with a couple of big files is much faster than with many small files, since every time you switch from one small file to another the huge seek time will give you headaches again. This is the reason why many games that ship on optical media pack game data in huge archive files (e.g. all textures of one level are in one huge file instead of having one small file per texture), so try keeping data well structured in big files you can read as sequentially as possible.
HD caching itself is a good technique. There is a game I remember, though I forgot the title, that always kept the 3D data of your environment on HD. While you were moving through the world, it was constantly copying data from DVD to HD. Thus the surrounding 3D landscape was always available on HD for fast access. The whole DVD was not copied, however; only about 200-300 MB were temporarily cached on HD to save HD space. The only annoying thing about that was that you often had DVD access "noise" while playing the game, but most of the time the whole process was happening only during CPU idle times, so it did not really affect gameplay. Only if you ran very fast, constantly in the same direction, could it happen that the DVD drive fell behind and all of a sudden the game stopped with a loading indicator for a couple of seconds. However, I played this game for days and saw this loading indicator maybe three times within a single week. If you were moving slowly or not constantly in the same direction, there never was a loading indicator.
I'm not sure that there is a lot you can do by the time you are reading it. You could look at the CreateFile API: you can pass hints (FILE_FLAG_SEQUENTIAL_SCAN or FILE_FLAG_RANDOM_ACCESS) that tell Windows whether you are opening the file for sequential or random access. That is supposed to allow Windows to optimize the caching strategy used for the file.
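For what it's worth, here is a hedged sketch of passing that kind of hint from Python rather than calling CreateFile directly. It relies on os.O_SEQUENTIAL, which only exists on Windows builds of Python and, as far as I know, is how the C runtime requests the sequential-scan hint; on Unix it falls back to os.posix_fadvise where that is available. The function name is made up for the example.

```python
import os


def open_with_sequential_hint(path):
    """Open path for binary reading and hint that it will be read front to back."""
    # O_BINARY and O_SEQUENTIAL are Windows-only constants, so fall back to
    # 0 (no extra flag) on other platforms.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0) | getattr(os, "O_SEQUENTIAL", 0)
    fd = os.open(path, flags)
    # posix_fadvise is the rough Unix counterpart. It is purely advisory and
    # not available everywhere, so a failure here is simply ignored.
    if hasattr(os, "posix_fadvise"):
        try:
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        except OSError:
            pass
    return os.fdopen(fd, "rb")
```

These are hints only; the OS is free to ignore them, so measure before and after on the actual drive.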
You can tune the "chunks" that you bite off when reading your file to make them larger or smaller. You might get a slight improvement if you read in chunks that are multiples of the allocation unit size on the disk.
The hardware and media can make a difference. Say you have a DVD drive that reads at 16x. It will require media that is rated at 16x or higher, and some drives don't work well with some media brands. So even if the media meets the ratings, you might not be reading at the maximum speed. (Usually a good hardware review of an optical drive will include details like this.)
The layout of the files on the optical disc could be important. Was it burned all at once? Was it just mounted as a disk (like a packet-mode R/W)? I don't have experience with this, but given the longer seek times on an optical drive, fragmented files might have a greater impact than they do with a modern hard drive.
There's no real abstraction you can employ. Optical drives have very specific characteristics that must be optimized for to get the best performance.
Some tips:
The biggest killer on optical drives is seek time. Where possible make sure all the files you are reading are sequential on disc and as closely packed as possible. If you must seek then seek in one direction and as infrequently as possible.
Asynchronous reading can also massively improve performance. If you need to load and process files A, B and C, then before processing A you should start reading file B, and while processing B you should be reading file C, and so on.
Generally the more data you can read in one go the better, e.g. avoid lots of little read() calls. You will only get the theoretical throughput of a disc while reading large amounts of data. Some OSs/drivers will minimize the penalty of reading lots of little files by caching sectors, some will not.
Doing lots of exists(filename) checking can also be detrimental on some filesystems / OSs where only parts of the TOC are cached.
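The asynchronous-reading tip above (start reading B before processing A) could look something like this in Python. It is only a sketch: the function names are invented for the example, and it assumes each file fits comfortably in memory.

```python
from concurrent.futures import ThreadPoolExecutor


def read_whole_file(path):
    """Read one file in a single large request."""
    with open(path, "rb") as f:
        return f.read()


def process_with_read_ahead(paths, process):
    """Call process(path, data) for each path in order while the next file loads."""
    results = []
    if not paths:
        return results
    # One worker only: a second concurrent read would make the drive seek
    # back and forth between two files.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_whole_file, paths[0])
        for i, path in enumerate(paths):
            data = pending.result()  # wait for the current file to arrive
            if i + 1 < len(paths):
                # kick off the next read before we start processing
                pending = pool.submit(read_whole_file, paths[i + 1])
            results.append(process(path, data))  # runs while the next file loads
    return results
```

Keeping the read-ahead depth at one file means at most two files are held in memory at a time, and the drive only ever services one request.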
In our applications we usually pack files into one or more "lumped" files and have them ordered sequentially based on their access order. Some files (and directories) are compressed and read in their entirety before being decompressed in memory. This can be a win if you have a directory that contains a multitude of small files (e.g. XML or scripts).
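To make the "lumped" file idea concrete, here is a rough Python sketch. The on-disk layout (a 4-byte header length, a JSON index, then zlib-compressed entries back to back) is invented for this example and is not the format described above; the function names are hypothetical too.

```python
import json
import struct
import zlib


def write_lump(lump_path, files_in_access_order):
    """Pack the given files into one lump, in the order they will be read."""
    index, blobs, offset = [], [], 0
    for path in files_in_access_order:
        with open(path, "rb") as f:
            blob = zlib.compress(f.read())
        # offset is relative to the end of the header, for callers that
        # want to jump straight to a single entry
        index.append({"name": path, "offset": offset, "size": len(blob)})
        blobs.append(blob)
        offset += len(blob)
    header = json.dumps(index).encode("utf-8")
    with open(lump_path, "wb") as out:
        out.write(struct.pack("<I", len(header)))  # 4-byte little-endian header length
        out.write(header)
        for blob in blobs:
            out.write(blob)


def read_lump(lump_path):
    """Yield (name, data) for each entry, reading the lump strictly front to back."""
    with open(lump_path, "rb") as f:
        (header_len,) = struct.unpack("<I", f.read(4))
        index = json.loads(f.read(header_len).decode("utf-8"))
        for entry in index:
            # entries are stored back to back, so no seek is needed here
            yield entry["name"], zlib.decompress(f.read(entry["size"]))
```

Because the entries are written in access order, loading everything is one forward pass over a single file, with no per-file open or directory lookup.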
Basically lots of benchmarking and tweaking :)