Reading parts of large files from drive

15.1k views Asked by At

I'm working with large files in C# (can be up to 20%-40% of available memory) and I will only need small parts of the files to be loaded into memory at a time (like 1-2% of the file). I was thinking that using a FileStream would be the best option, but idk. I will need to give a starting point (in bytes) and a length (in bytes) and copy that region into a byte[]. Access to the file might need to be shared between threads and will be at random spots in the file (non-linear access). I also need it to be fast.

The project already has unsafe methods, so feel free to suggest things from the more dangerous side of C#

3

There are 3 answers

5
James King On BEST ANSWER

A FileStream will allow you to seek to the portion of the file you want, no problem. It's the recommended way to do it in C#, and it's fast.

Sharing between threads: You will need to create a lock to prevent other threads from changing the FileStream position while you're trying to read from it. The simplest way to do this:

//  This really needs to be a member-level variable;
private static readonly object fsLock = new object();

//  Instantiate this in a static constructor or initialize() method
private static FileStream fs = new FileStream("myFile.txt", FileMode.Open);


public string ReadFile(int fileOffset) {

    byte[] buffer = new byte[bufferSize];

    int arrayOffset = 0;

    lock (fsLock) {
        fs.Seek(fileOffset, SeekOrigin.Begin);

        int numBytesRead = fs.Read(bytes, arrayOffset , bufferSize);

        //  Typically used if you're in a loop, reading blocks at a time
        arrayOffset += numBytesRead;
    }

    // Do what you want to the byte array and return it

}

Add try..catch statements and other code as necessary. Everywhere you access this FileStream, put a lock on the member-level variable fsLock... this will keep other methods from reading/manipulating the file pointer while you're trying to read.

Speed-wise, I think you'll find you're limited by disk access speeds, not code.

You'll have to think through all the issues about multi-threaded file access... who intializes/opens the file, who closes it, etc. There's a lot of ground to cover.

1
Jonathan Wood On

I know nothing about the structure of these files, but reading a portion of a file with FileStream or similar sounds like the best and fastest way to do it.

You will not need to copy the byte[] since FileStream can read directly into a byte array.

It sounds like you might know more about the structure of the file, which could bring up additional techniques as well. But if you need to read only a portion of the file, then this would probably be the way to do it.

2
Mikael Svenson On

If you are using .Net 4 look into using memory mapped files in the System.IO.MemoryMappedFiles namespace.

They are perfect for reading small chunks out of large files. There are samples in the MSDN documentation.

You can also do this in earlier versions of .Net, but then you need to wrap the Win32 API (or use http://winterdom.com/dev/net),