How does remap_pfn_range remap kernel memory to user space?


The remap_pfn_range function (used in a driver's mmap handler) can map kernel memory into user space. How is this done? Can anyone explain the precise steps? Kernel mode is a privileged mode (PM), while user mode is non-privileged (NPM). In PM the CPU can access all memory, while in NPM some memory is restricted and cannot be accessed by the CPU. When remap_pfn_range is called, how does a range of memory that was restricted to PM become accessible from user space?

Looking at the remap_pfn_range code, there is a pgprot_t struct, which is related to protection mapping. What is protection mapping? Is it the answer to the question above?

2

There are 2 answers

1
gby On

It's simple, really: kernel memory (usually) just has a page table entry with an architecture-specific bit that says: "this page table entry is only valid while the CPU is in kernel mode".

What remap_pfn_range does is create another page table entry, at a different virtual address, that points to the same physical memory page but does not have that bit set.

Usually, it's a bad idea btw :-)

4
Peter Teoh On

The core of the mechanism is the MMU's page tables:

[Image: http://windowsitpro.com/content/content/3686/figure_01.gif]

or this:

[Image]

Both pictures above show characteristics of the x86 hardware MMU and have nothing to do with the Linux kernel.

The figures below describe how the VMAs are linked to the process's task_struct:

[Image: http://image9.360doc.com/DownloadImg/2010/05/0320/3083800_2.gif]

[Image (source: slideplayer.com)]

And looking into the function itself here:

http://lxr.free-electrons.com/source/mm/memory.c#L1756

The data in physical memory can be accessed by the kernel through the kernel's PTE, as shown below:

[Image: page protection flags in the Linux kernel (source: tldp.org)]

But after remap_pfn_range() is called, a new PTE (with different page protection flags) is derived for the existing kernel memory, to be used for accessing it from userspace. The process's VMA is updated to use this PTE to access the same memory, which avoids wasting memory on a copy. The kernel and userspace PTEs have different attributes, which control access to the physical memory, and the VMA also specifies attributes at the process level:

vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;