How does a memory-mapped file work for files larger than memory?


I'm trying to work with a data file that is larger than my memory.

My understanding so far is that a memory-mapped file maps every byte in the file to an address in virtual memory. The data is only read into real memory when you actually need it (for example, when accessing a specific entry), and it is read in chunks called pages.

But if I'm eventually going to process everything in that data file, doesn't that mean everything has to be read into real memory at some point? Does the OS automatically decide which parts of the data already in memory to free to make room for new data?

For this specific project I'm working with Python (numpy.memmap) on Linux, if that makes any difference.

There is 1 answer

Kevin

It depends.

Memory-mapped files work in almost exactly the same way as traditional paging works, except that instead of moving data between memory and the pagefile, the operating system moves data between memory and some arbitrary file that you specify.
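To make the "backed by a file you specify" point concrete, here is a minimal sketch using Python's standard `mmap` module. The temporary file and the `demo_path` name are just for illustration. It writes through the mapping and then reads the same bytes back with ordinary file I/O, which suggests the mapping and the file are one and the same data.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file: four pages of zeros.
demo_path = os.path.join(tempfile.mkdtemp(), "mapped.bin")
with open(demo_path, "wb") as f:
    f.write(b"\x00" * mmap.PAGESIZE * 4)

with open(demo_path, "r+b") as f:
    # Length 0 maps the whole file; no data is copied at this point.
    with mmap.mmap(f.fileno(), 0) as mm:
        # Writing here dirties only the second page of the mapping.
        mm[mmap.PAGESIZE:mmap.PAGESIZE + 5] = b"hello"
        # Ask the OS to write dirty pages back to the underlying file.
        mm.flush()

# Ordinary file I/O sees what was written through the mapping.
with open(demo_path, "rb") as f:
    f.seek(mmap.PAGESIZE)
    readback = f.read(5)

print(readback)
```

The explicit `flush()` is only there so the read-back is guaranteed to see the change; in normal operation the OS writes dirty pages back on its own schedule.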

So if you run out of physical memory (that is, the actual RAM chips that you have on your motherboard), that's fine. The operating system will just page out whichever parts of the file it thinks you're not going to use. If it guesses wrong, you'll have poor performance, but you won't crash or anything.
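For the `numpy.memmap` case from the question, one way to process a whole file is sketched below. It assumes a flat binary file of `float64` values. The `chunked_sum` helper, the chunk size, and the small demo file are all made up for illustration. Only the pages behind the current slice need to be resident at any moment, so the OS is free to drop older ones when memory gets tight.

```python
import os
import tempfile

import numpy as np


def chunked_sum(path, dtype=np.float64, chunk_items=1_000_000):
    """Sum every element of a flat binary file, one slice at a time."""
    # Mapping the file reserves address space; it does not read the data yet.
    data = np.memmap(path, dtype=dtype, mode="r")
    total = 0.0
    for start in range(0, data.shape[0], chunk_items):
        chunk = data[start:start + chunk_items]  # a view into the mapping
        # Touching the elements is what faults the pages in from disk.
        total += float(chunk.sum())
    return total


# Small demonstration file; a real one could be far larger than RAM.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
np.arange(10_000, dtype=np.float64).tofile(demo_path)

result = chunked_sum(demo_path, chunk_items=1_024)
print(result)
```

Chunking is not strictly required, since the OS pages data in and out either way. It mainly keeps any temporary arrays NumPy creates at a bounded size and makes the access pattern sequential, which tends to suit the kernel's read-ahead.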

But if you run out of virtual memory, or address space, that's not fine. In this case, your program runs out of memory addresses and will no longer be able to allocate memory. You will also be unable to grow the memory-mapped region of the file. For a 32-bit program, the limit is somewhat smaller than 4 GB (the precise limit varies by operating system and programming environment, and depends on the overhead of those systems). For a 64-bit program, the limit is normally huge, though exactly how huge will depend on your architecture and operating system.
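A quick way to see which case applies to you is to check the pointer size of the Python interpreter itself, since a 32-bit Python on a 64-bit OS still has a 32-bit address space. This is only a coarse check, not a guarantee that a given mapping will succeed.

```python
import struct
import sys

# Size of a C pointer in this interpreter, in bits (typically 32 or 64).
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is the largest container index, 2**31 - 1 on 32-bit builds.
is_64bit = sys.maxsize > 2**32

print(f"{pointer_bits}-bit Python process")
```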