Can I prevent a file page eviction without using mmap?


My understanding is that I can keep a file in memory by doing an mmap on the file and then calling mlock on the mapped memory.

Is there a way to keep file data in the page cache without doing an mmap? Specifically, I want to ensure that when I'm appending data to a file, the page I'm writing to never gets evicted.

I realize this is rare, but there are cases where I believe it could occur. For example, the application writes data, waits longer than dirty_writeback_centisecs (after which the page becomes clean and can be evicted) and then writes more data.


There is 1 answer

Answered by Fat-Zer

I believe your understanding of what mlock does is slightly off. Its intended uses are:

  1. Guaranteeing that reads from the memory never stall because the data has not been loaded from disk yet or has been swapped out (useful for performance and crucial in real-time applications).
  2. Guaranteeing that the pages are never swapped out (crucial for private data such as passwords or private keys in clear text).

So it ensures that the pages are loaded into RAM and prevents them from being swapped out. There is no guarantee that it prevents write-back of dirty pages mapped from a file (and in fact it doesn't; see the experiment below).

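To hint to the kernel that you are going to read from an fd soon, there is posix_fadvise(), so

posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);

will probably load the requested part of the file into the page cache. (POSIX_FADV_RANDOM, by contrast, only tells the kernel not to bother with readahead.) This is only a hint: it does not pin the pages, so they can still be evicted later.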


I can't say for sure, but I suspect there is currently no way to forbid write-back of dirty pages for a specific file. There might be some way to hint at it, but I don't see one either.


An experiment with mmap/mlock

alexander@goblin ~/tmp $ cat mmap.c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

int main(int argc, char *argv[]) {
    char *addr;
    int fd;
    struct stat sb;
    size_t length;

    if (argc != 2) {
        fprintf(stderr, "%s file\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    fd = open(argv[1], O_RDWR);
    if (fd == -1) {
        handle_error("open");
    }
    if (fstat(fd, &sb) == -1) {          /* To obtain file size */
        handle_error("fstat");
    }

    length = sb.st_size;

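    /* Map the whole file shared, so stores go to the page cache. */
    addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        handle_error("mmap");
    }

    /* Lock the mapping into RAM. */
    if (mlock(addr, length) < 0) {
        handle_error("mlock");
    }

    /* Dirty the first page, then never call msync(). */
    strcpy(addr, "hello world!");

    /* Keep the mapping locked while the disk is inspected from the shell. */
    sleep(100);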

    munmap(addr, length);
    close(fd);

    exit(EXIT_SUCCESS);
}
alexander@goblin ~/tmp $ grep . /proc/sys/vm/dirty_{expire,writeback}_centisecs
/proc/sys/vm/dirty_expire_centisecs:1000
/proc/sys/vm/dirty_writeback_centisecs:500
alexander@goblin ~/tmp $ dd if=/dev/zero of=foo bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 8.1296e-05 s, 50.4 MB/s
alexander@goblin ~/tmp $ fallocate -l 4096 foo
alexander@goblin ~/tmp $ sync foo
alexander@goblin ~/tmp $ sudo hdparm --fibmap foo

foo:
 filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
           0  279061632  279061639          8
alexander@goblin ~/tmp $ sudo dd if=/dev/mapper/vg_main-gentoo_home count=8 skip=279061632 iflag=nocache 2>/dev/null | hexdump -C
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000
alexander@goblin ~/tmp $ gcc mmap.c
alexander@goblin ~/tmp $ ./a.out foo &
[1] 26450
alexander@goblin ~/tmp $ sudo hdparm --fibmap foo

foo:
 filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
           0  279061632  279061639          8
alexander@goblin ~/tmp $ sudo dd if=/dev/mapper/vg_main-gentoo_home count=8 skip=279061632 iflag=nocache 2>/dev/null | hexdump -C
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000
alexander@goblin ~/tmp $ sleep 10
alexander@goblin ~/tmp $ sudo hdparm --fibmap foo

foo:
 filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
           0  279061632  279061639          8
alexander@goblin ~/tmp $ sudo dd if=/dev/mapper/vg_main-gentoo_home count=8 skip=279061632 iflag=nocache 2>/dev/null | hexdump -C
00000000  68 65 6c 6c 6f 20 77 6f  72 6c 64 21 00 00 00 00  |hello world!....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000
alexander@goblin ~/tmp $ fg
./a.out foo
^C