Where does the OS get the needed disk address from when a page fault happens?

When a page table entry (PTE) is not marked as valid, it means the needed data is not in memory but on disk. A page fault then occurs, and the OS is responsible for loading that page of data from disk into memory.

My question is, how does the OS know the exact disk address?

There are 3 answers

user3344003 On

The answer is system dependent. A PTE not marked as valid may mean that the address does not exist at all in the process address space. A system may have another bit to indicate that the address is valid but that no logical-to-physical mapping exists yet.

The operating system needs to maintain a table of where it put the data.

The data can exist in a number of places.

  1. It might be uninitialized data that has no mapping anywhere. Respond to the page fault by clearing a physical page and mapping it to the process address space.

  2. It might be in the page file.

  3. Some systems have a separate swap file.

  4. It might be in the executable or shared library file.
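
As a rough illustration of "a table of where it put the data", here is a minimal Python sketch. It is a toy model, not how any particular OS stores this information: the `Backing` categories mirror the list above, while `PageRecord`, `locate_page`, the file names, and the 4096-byte page size are all assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

PAGE_SIZE = 4096  # assumed page size for the example


class Backing(Enum):
    ZERO_FILL = auto()   # uninitialized data: nothing to read from disk
    PAGE_FILE = auto()   # page was written out to the page file
    SWAP_FILE = auto()   # page was written out to a separate swap file
    IMAGE_FILE = auto()  # page comes from an executable or shared library


@dataclass
class PageRecord:
    backing: Backing
    file_name: Optional[str] = None  # hypothetical name of the backing file
    file_offset: int = 0             # byte offset of the page in that file


def locate_page(table, vpn):
    """Decide what a fault on virtual page number `vpn` requires."""
    record = table.get(vpn)
    if record is None:
        # The address is not part of the process address space at all.
        raise LookupError(f"page {vpn:#x} is not mapped")
    if record.backing is Backing.ZERO_FILL:
        # No disk address is needed: clear a physical page and map it.
        return ("zero", None, None)
    # Otherwise the OS-maintained record says where on disk to read from.
    return ("read", record.file_name, record.file_offset)


# Hypothetical per-process table, keyed by virtual page number.
table = {
    0x10: PageRecord(Backing.ZERO_FILL),
    0x11: PageRecord(Backing.PAGE_FILE, "pagefile", 8 * PAGE_SIZE),
    0x12: PageRecord(Backing.IMAGE_FILE, "app.exe", 3 * PAGE_SIZE),
}
```

The point of the sketch is that the disk address does not come from the hardware page table: the OS looks it up in a record it wrote itself when it set up or paged out the page.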

Nadav Har'El On

The answer given in 2014 is correct. All the processor knows is that the page is missing, or sometimes that the access violated the page's permissions (e.g., a write to a read-only page). At that point the processor generates a "page fault" exception, which the kernel receives and must handle.

In some cases, this page fault needs to be passed all the way to the application, in Linux as a SIGSEGV ("segmentation violation") signal, e.g., when the program dereferences a null pointer. But, as you said, the kernel usually can and should handle the page fault itself. The kernel keeps information about what each virtual-memory page is supposed to contain in its own tables, not inside the page table, which is a structure with a specific format dictated by the processor. The following are some of the things the kernel may learn about the faulting page by consulting its own tables. This is not an exhaustive list.

  1. This might be a page mmap()ed from disk. This case includes an application's explicit use of mmap(), but it also happens when you run an executable or use shared libraries, since those are mapped from disk too. The page fault can therefore also happen when the processor fetches instructions, not just when it reads or writes data. The kernel keeps a list of these mappings, so when it gets the page fault it can figure out where on disk it needs to read to get the missing page. It reads the data from disk into a new page in memory, sets the page table entry (PTE) to point to this new page, and resumes the application thread, where the faulting instruction is retried and now succeeds.

  2. This may have been a page swapped out to disk. Again, the kernel keeps a table of which pages were swapped out, and where in the swap partition (or swap file, or whatever) this page now lives.

  3. This might have been a write attempt to a "copy on write" page. The kernel needs to make a copy of the source page, change the address in the page table to point to the new copy, and then allow the write. For example, when you allocate a large area of memory, its pages can all point to an existing "zero-filled" page, and a real page is allocated only when you first write to one of them. Another example: after fork(), the new process's pages are all copy-on-write pages pointing to the original process's pages, and each is actually copied only when it is first written (by either process).
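
To make cases 1 and 2 concrete, here is a small Python sketch of how a disk location could be derived from kernel-side bookkeeping. It is a simplified model under stated assumptions, not the Linux implementation: `Mapping`, `find_mapping`, `disk_location`, the `swap_slots` dictionary, the example paths, and the 4096-byte page size are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass
class Mapping:
    start: int        # first virtual address of the mapping (page aligned)
    end: int          # one past the last virtual address of the mapping
    path: str         # file that backs the mapping
    file_offset: int  # offset in the file that corresponds to `start`


def find_mapping(mappings, addr):
    """Return the mapping that contains `addr`, or None."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


def disk_location(mappings, swap_slots, addr):
    """Return (where, byte_offset) for the page containing `addr`."""
    page_addr = addr & ~(PAGE_SIZE - 1)  # round down to the page boundary
    vpn = page_addr // PAGE_SIZE
    if vpn in swap_slots:
        # Case 2: the page was swapped out; the table records its slot.
        return ("swap", swap_slots[vpn] * PAGE_SIZE)
    m = find_mapping(mappings, addr)
    if m is None:
        return None  # no mapping: the fault would be reported as SIGSEGV
    # Case 1: the page's offset within the mapping is also its offset
    # within the file, shifted by where the mapping starts in the file.
    return (m.path, m.file_offset + (page_addr - m.start))


# Hypothetical process state.
mappings = [
    Mapping(0x400000, 0x404000, "/bin/app", 0),
    Mapping(0x7F0000000000, 0x7F0000010000, "/lib/libc.so", 0x2000),
]
swap_slots = {0x500: 7}  # virtual page number -> slot in the swap area
```

With this state, a fault at `0x401234` resolves to offset `0x1000` of `/bin/app`, and a fault at `0x500abc` resolves to slot 7 of the swap area. The arithmetic is the whole answer to the original question in the file-backed case: the faulting address plus the recorded mapping is enough to compute the file offset.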

However, since you are looking for credible sources, you may want to read an explanation of how the Linux kernel, specifically, does this, for example at: https://vistech.net/~champ/online-docs/books/linuxkernel2/060.htm.

moinmaroofi On

It is the same as virtual memory addressing.
The addresses that appear in programs are virtual addresses, also called program addresses. For every memory access, whether to fetch an instruction or data, the CPU must translate the virtual address to a real physical address. A virtual memory address can be considered to be composed of two parts: a page number and an offset into the page. The page number determines which page contains the information, and the offset specifies which byte within the page. The size of the offset field, in bits, is the log base 2 of the page size in bytes; for example, a 4096-byte page needs a 12-bit offset.
If the virtual address is valid, the system checks to see if a page frame is free. If no frames are free, the page replacement algorithm is run to remove a page.
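
The page-number/offset split described in this answer can be shown in a few lines of Python. This is only a sketch: the 4096-byte page size and the name `split_address` are assumptions for the example, and real page sizes vary by architecture.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; must be a power of two

# Number of offset bits is log2(PAGE_SIZE); for a power of two this is
# bit_length() - 1.
OFFSET_BITS = PAGE_SIZE.bit_length() - 1


def split_address(vaddr):
    """Split a virtual address into (page number, offset within page)."""
    page_number = vaddr >> OFFSET_BITS  # high bits select the page
    offset = vaddr & (PAGE_SIZE - 1)    # low bits select the byte
    return page_number, offset


page_number, offset = split_address(0x12345)
print(hex(page_number), hex(offset))  # prints "0x12 0x345"
```

The page number is what the OS uses to find the page's record and backing location; the offset is unchanged by translation and is only applied once the page is in memory.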