Why does fsync() take much more time on Linux kernel 3.1.* than on kernel 3.0?


I have a test program. It takes about 37 seconds on Linux kernel 3.1.*, but only about 1 second on kernel 3.0.18 (I just replaced the kernel on the same machine). Please give me a clue on how to improve it on kernel 3.1. Thanks!

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int my_fsync(int fd)
{
    /* return fdatasync(fd); */
    return fsync(fd);
}

int main(int argc, char **argv)
{
    int rc = 0;
    int count;
    int i;
    char oldpath[1024];
    char newpath[1024];
    char *writebuffer = calloc(1024, 1);

    if (writebuffer == NULL) {
        fprintf(stderr, "calloc failed!\n");
        exit(1);
    }

    snprintf(oldpath, sizeof(oldpath), "./%s", "foo");
    snprintf(newpath, sizeof(newpath), "./%s", "foo.new");

    for (count = 0; count < 1000; ++count) {
        /* Create a fresh temporary file on every iteration. */
        int fd = open(newpath, O_CREAT | O_TRUNC | O_WRONLY, S_IRWXU);
        if (fd == -1) {
            fprintf(stderr, "open error! path: %s\n", newpath);
            exit(1);
        }

        /* Write 10 KiB in 1 KiB chunks. */
        for (i = 0; i < 10; i++) {
            rc = write(fd, writebuffer, 1024);
            if (rc != 1024) {
                fprintf(stderr, "underwrite!\n");
                exit(1);
            }
        }

        if (my_fsync(fd)) {
            perror("fsync failed");
            exit(1);
        }

        if (close(fd)) {
            perror("close failed");
            exit(1);
        }

        /* Atomically replace the old file with the new one. */
        if (rename(newpath, oldpath)) {
            perror("rename failed");
            exit(1);
        }
    }

    free(writebuffer);
    return 0;
}
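
For reference, the trace below can be reproduced by building the program and running it under strace; the source filename testfsync.c is an assumption:

gcc -o testfsync testfsync.c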


# strace -c ./testfsync
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.58    0.068004          68      1000           fsync
  0.84    0.000577           0     10001           write
  0.40    0.000275           0      1000           rename
  0.19    0.000129           0      1003           open
  0.00    0.000000           0         1           read
  0.00    0.000000           0      1003           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         2           setitimer
  0.00    0.000000           0        68           sigreturn
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           mprotect
  0.00    0.000000           0         2           writev
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         6           mmap2
  0.00    0.000000           0         2           fstat64
  0.00    0.000000           0         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.068985                 14099         1 total

There are 2 answers

qxia answered:

Found the reason: file system barriers are enabled by default in ext3 as of Linux kernel 3.1 (http://kernelnewbies.org/Linux_3.1). After disabling barriers, it becomes much faster.
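
For reference, a minimal sketch of how barriers can be disabled on an ext3 mount; the device and mount point below (/dev/sda1, /mnt/test) are placeholders, and note that turning barriers off trades crash safety for speed:

# remount an already-mounted ext3 filesystem with barriers off
mount -o remount,barrier=0 /mnt/test

# or persistently, as an /etc/fstab entry
/dev/sda1  /mnt/test  ext3  defaults,barrier=0  0  2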

David Schwartz answered:

Kernel 3.1.* is actually doing the sync; 3.0.18 is faking it. Your code does 1,000 synchronized writes. Since you truncate the file each time, each write also enlarges the file, so the size metadata must be committed along with the data, and you actually have 2,000 write operations. Typical hard drive write latency is about 20 milliseconds per I/O, so 2,000 * 20 = 40,000 milliseconds, or 40 seconds. That seems about right, assuming you're writing to a typical hard drive.

Basically, by syncing after each write, you give the kernel no ability to efficiently cache or overlap the writes and force worst-case behavior on every operation. Also, the hard drive winds up having to seek back and forth between where the data is written and where the metadata is written once for each write.
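
If the update pattern allows it, one way to act on that observation (a minimal sketch, not a drop-in replacement for the question's atomic-rename scheme) is to create the file once at its final size and then overwrite it in place, pairing each overwrite with fdatasync() as the commented-out line in the question hints. Because the file's length never changes between syncs, the size-updating metadata write can be skipped:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[10 * 1024] = {0};   /* same 10 KiB payload as the test program */
    int count;

    /* Create the file once at its final size. */
    int fd = open("./foo", O_CREAT | O_WRONLY, 0600);
    if (fd == -1 || write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) || fsync(fd)) {
        perror("setup failed");
        exit(1);
    }

    for (count = 0; count < 1000; ++count) {
        /* Overwrite in place: file size and block allocation never change. */
        if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
            perror("pwrite failed");
            exit(1);
        }
        /* fdatasync() flushes the data but may skip metadata (e.g. mtime)
         * that fsync() would also have to commit. */
        if (fdatasync(fd)) {
            perror("fdatasync failed");
            exit(1);
        }
    }

    close(fd);
    return 0;
}

The trade-off is that an in-place overwrite is not crash-atomic the way write-to-temp-plus-rename is, so this only fits workloads that can tolerate a torn file after a crash.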