Can the data=journal mode of EXT4 avoid user data loss?

  • journal mode

data=journal mode provides full data and metadata journaling. All new data is written to the journal first, and then to its final location.

In the event of a crash, the journal can be replayed, bringing both data and metadata into a consistent state. This mode is the slowest, except when data needs to be read from and written to disk at the same time, where it outperforms all other modes. Enabling this mode will disable delayed allocation and O_DIRECT support.

I have a few questions:

  1. With data=journal configured, when the user calls write(), does write() return only after the data has been successfully written to the journal, or does it return success as soon as the data has entered the page cache? If it is the latter, the journal is committed asynchronously, so the purpose of the ext4 journal is only to ensure the consistency of the file system itself, and there is no guarantee that user data will not be lost. Is that right?

  2. If ext4 commits the journal asynchronously, when is a journal commit triggered?

  3. Is there any other file system that allows the journal to be synced before write() returns successfully?

Based on my local experiments, I infer that the journal is committed asynchronously. I used a separate SSD partition as the journal device (journal_dev). When I wrote files with fio, I/O on journal_dev was intermittent rather than continuous.


There is 1 answer

Anon On
  1. write() returns success to the user once the data has entered the page cache (assuming you aren't passing any extra flags to open()).
  2. At least periodically (see commit= in https://www.kernel.org/doc/Documentation/filesystems/ext4.txt ) and probably before any pending sync/fsync calls are allowed to complete.
  3. No (otherwise it would defeat the point of buffering).

If you pass O_SYNC to open(), or call fsync() after your write(), the call only returns once the kernel believes the data has reached stable media, so that is how you learn when your write became durable, as far as the kernel can know.
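As a rough illustration of those two options, here is a minimal Python sketch (Linux assumed, since os.O_SYNC is not available everywhere). The function names and the helper _write_all are made up for this example; the underlying calls are the standard os.open(), os.write() and os.fsync() wrappers around the system calls discussed above. It shows the calling pattern only and does not prove anything about what the disk hardware does with its own caches.

```python
import os


def _write_all(fd: int, payload: bytes) -> None:
    """Hypothetical helper: os.write() may write fewer bytes than asked, so loop."""
    view = memoryview(payload)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_then_fsync(path: str, payload: bytes) -> None:
    """Buffered write, then an explicit fsync() before reporting success."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # At this point write() only guarantees the data is in the page cache.
        _write_all(fd, payload)
        # fsync() blocks until the kernel reports the file's data as flushed.
        os.fsync(fd)
    finally:
        os.close(fd)


def write_with_o_sync(path: str, payload: bytes) -> None:
    """Open with O_SYNC so each write() itself waits for stable storage."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        _write_all(fd, payload)
    finally:
        os.close(fd)
```

The fsync() variant lets you batch several buffered writes and pay the flush cost once, while O_SYNC makes every write() wait, which is usually slower for many small writes.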