MMAP memory between kernel- and userspace
Allocating memory in the kernel and letting userspace map it sounds like an easy task, and it is. There are just a few things about page mapping that are good to know.
The MMU (Memory Management Unit) translates virtual addresses to physical addresses using page tables, whose entries each map one page. A page is the smallest unit the MMU deals with.
The size of a page is given by the PAGE_SIZE macro in asm/page.h and is typically 4 KiB on most architectures.
There are a few more useful macros in asm/page.h:
PAGE_SHIFT: The number of bits to shift 1 to the left to get PAGE_SIZE.
PAGE_SIZE: Size of a page, defined as (1 << PAGE_SHIFT).
PAGE_ALIGN(len): Rounds len up to the next multiple of PAGE_SIZE.
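These macros are plain integer arithmetic. The sketch below copies them into a userspace program, assuming 4 KiB pages (PAGE_SHIFT of 12); kernel code should of course include asm/page.h instead. show_align() is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace copies of the kernel macros, assuming 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE - 1))
/* Round len up to the next multiple of PAGE_SIZE (works because it is a power of two). */
#define PAGE_ALIGN(len) (((len) + PAGE_SIZE - 1) & PAGE_MASK)

/* Hypothetical helper: print how a length is rounded up to whole pages. */
static void show_align(unsigned long len)
{
	unsigned long aligned = PAGE_ALIGN(len);

	assert(aligned % PAGE_SIZE == 0); /* always a whole number of pages */
	printf("%lu bytes -> %lu bytes (%lu page(s))\n",
	       len, aligned, aligned >> PAGE_SHIFT);
}
```

For example, show_align(100) reports that 100 bytes become 4096 bytes, one page, while 4097 bytes would need two pages.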
How does mmap(2) work?
Every page table entry has a bit that tells whether the page may only be accessed in supervisor mode (kernel mode). All memory allocated in kernel space has this bit set.
What the mmap system call does is simply create a new page table entry, with a different virtual address, that points to the same physical memory page. The difference is that the supervisor bit is not set.
This lets userspace access the memory as if it were part of the application, because now it is!
The kernel is not involved in those accesses at all, so it is really fast.
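Seen from userspace, the whole procedure is an open() followed by mmap(); after that, reads and writes are ordinary memory accesses. A minimal sketch, assuming a device node whose driver implements mmap (the path "/dev/scan" is hypothetical, as is the helper name map_device()):

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a device node and map len bytes of it
 * shared and read/write. Returns NULL on failure.
 */
static void *map_device(const char *path, size_t len)
{
	int fd;
	void *p;

	assert(len > 0); /* mmap() rejects zero-length mappings */
	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;
	/* The driver's .mmap handler runs inside this call. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* The mapping stays valid after the descriptor is closed. */
	close(fd);
	return p == MAP_FAILED ? NULL : p;
}
```

A call such as map_device("/dev/scan", 4096) would hand back a pointer into the kernel buffer; every access through it goes straight to memory without a system call.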
Magic? Kind of.
The magic is called remap_pfn_range().
What remap_pfn_range() essentially does is update the process's page tables with these new entries.
As said before, the smallest unit that the MMU handles is PAGE_SIZE, and mmap(2) only works with full pages. Even if you only want to share 100 bytes, a whole page frame will be remapped and must therefore be allocated in the kernel.
The allocated memory must also be page aligned.
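The page granularity is easy to observe from userspace with an anonymous mapping: ask for 100 bytes and the whole page is usable. A small sketch; usable_bytes_for() is a hypothetical name, and _DEFAULT_SOURCE is defined only so that MAP_ANONYMOUS is visible with glibc.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `request` bytes of anonymous memory and
 * return how many bytes are really usable (the request rounded up to
 * whole pages), or 0 if mmap() fails.
 */
static size_t usable_bytes_for(size_t request)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t mapped = (request + page - 1) / page * page; /* round up */
	unsigned char *p = mmap(NULL, request, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 0;
	assert((uintptr_t)p % page == 0); /* mappings start page aligned */
	p[mapped - 1] = 0xAA; /* last byte of the rounded-up page: no fault */
	munmap(p, request); /* munmap() also rounds up to whole pages */
	return mapped;
}
```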
One way to allocate pages is with __get_free_pages().
The gfp_mask is commonly set to GFP_KERNEL in process context and GFP_ATOMIC in interrupt context. The order is the base-2 logarithm of the number of pages, so 2^order pages are allocated; get_order() converts a size in bytes to the matching order.
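Because the order is a power-of-two exponent rather than a page count, passing size >> PAGE_SHIFT as the order is a common mistake: a 16-page buffer would request 2^16 pages instead of 16. The userspace sketch below computes the smallest order that covers a size, which is what the kernel's get_order() returns; size_to_order() is a hypothetical name, and 4 KiB pages are assumed.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/*
 * Hypothetical helper mirroring get_order(): the smallest order such
 * that (PAGE_SIZE << order) >= size.
 */
static unsigned int size_to_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

With 4 KiB pages, a 12 KiB buffer gives order 2, so the allocator hands back 16 KiB and the last page is unused padding.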
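u8 *vbuf = (u8 *)__get_free_pages(GFP_KERNEL, get_order(size));
Memory allocated this way is freed with free_pages(), passing the same order: free_pages((unsigned long)vbuf, get_order(size)). (__free_pages() is its counterpart for alloc_pages() and takes a struct page * instead of an address.)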
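Pages from __get_free_pages() are physically contiguous, so a single remap_pfn_range() call can map the whole buffer. A minimal kernel-side sketch under that assumption; DEMO_BUF_SIZE, demo_buf, demo_order, demo_alloc(), demo_release() and demo_mmap() are hypothetical names, and module registration is left out.

```c
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/io.h>
#include <linux/mm.h>

/* Hypothetical buffer of three pages; get_order() rounds it up to order 2. */
#define DEMO_BUF_SIZE (3 * PAGE_SIZE)

static unsigned long demo_buf;   /* kernel virtual address of the buffer */
static unsigned int demo_order;

static int demo_alloc(void)
{
	demo_order = get_order(DEMO_BUF_SIZE);
	/* __GFP_ZERO avoids leaking old kernel data to userspace */
	demo_buf = __get_free_pages(GFP_KERNEL | __GFP_ZERO, demo_order);
	return demo_buf ? 0 : -ENOMEM;
}

static void demo_release(void)
{
	free_pages(demo_buf, demo_order);
}

static int demo_mmap(struct file *file, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;
	unsigned long pfn = virt_to_phys((void *)demo_buf) >> PAGE_SHIFT;

	/* no offsets, and never map more than was allocated */
	if (vma->vm_pgoff != 0 || size > (PAGE_SIZE << demo_order))
		return -EINVAL;

	/* physically contiguous: one call maps the whole range */
	return remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot);
}
```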
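A more common (and preferred) way to allocate virtually contiguous memory is vmalloc().
vmalloc() always allocates whole pages, no matter what. This is exactly what we want! Keep in mind, though, that the pages are only virtually contiguous, not physically contiguous, so they have to be remapped one page at a time.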
Read about vmalloc() in kmalloc(9):
allocates size bytes, and returns a pointer to the allocated memory. size becomes page aligned by vmalloc(), so the smallest allocated amount is 4kB. The allocated pages are mapped to the virtual memory space behind the 1:1 mapped physical memory in the kernel space. Behind every vmalloc’ed area there is at least one unmapped page. So writing behind the end of a vmalloc’ed area will not result in a system crash, but in a segmentation violation in the kernel space. Because memory fragmentation isn’t a big problem for vmalloc(), vmalloc() should be used for huge amounts of memory.
Allocated memory is freed with vfree().
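As an alternative to remapping vmalloc() memory page by page, the kernel also provides vmalloc_user() together with remap_vmalloc_range(), which walks the area for you. This is a swapped-in technique, not the one used in this post's code; a minimal sketch, where demo_vbuf, demo_vsize, demo_valloc(), demo_vmmap() and demo_vrelease() are hypothetical names.

```c
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Hypothetical 16-page buffer. */
static void *demo_vbuf;
static unsigned long demo_vsize = 16 * PAGE_SIZE;

static int demo_valloc(void)
{
	/* vmalloc_user() zeroes the memory and marks it mappable to userspace */
	demo_vbuf = vmalloc_user(demo_vsize);
	return demo_vbuf ? 0 : -ENOMEM;
}

static int demo_vmmap(struct file *file, struct vm_area_struct *vma)
{
	/* maps the area page by page, starting vm_pgoff pages in;
	 * returns an error if the VMA does not fit inside the buffer */
	return remap_vmalloc_range(vma, demo_vbuf, vma->vm_pgoff);
}

static void demo_vrelease(void)
{
	vfree(demo_vbuf);
}
```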
If you need only one page, alloc_page() will give you that.
If this is the case, instead of using remap_pfn_range(), vm_insert_page() will do the work for you.
Notice that vm_insert_page() apparently only works with order-0 (single-page) allocations. So if you want to map N pages, you will have to allocate them one by one and call vm_insert_page() N times.
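A sketch of that loop, assuming a hypothetical fixed array of DEMO_NPAGES pages allocated with alloc_page(); demo_pages, demo_alloc_pages() and demo_pages_mmap() are illustrative names, not part of any existing driver.

```c
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/mm.h>

#define DEMO_NPAGES 4 /* hypothetical buffer size in pages */

static struct page *demo_pages[DEMO_NPAGES];

static int demo_alloc_pages(void)
{
	int i;

	for (i = 0; i < DEMO_NPAGES; i++) {
		demo_pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!demo_pages[i])
			goto fail;
	}
	return 0;
fail:
	while (--i >= 0)           /* undo the pages allocated so far */
		__free_page(demo_pages[i]);
	return -ENOMEM;
}

static int demo_pages_mmap(struct file *file, struct vm_area_struct *vma)
{
	unsigned long npages = vma_pages(vma);
	unsigned long i;
	int ret;

	if (vma->vm_pgoff + npages > DEMO_NPAGES)
		return -EINVAL;

	/* one order-0 page per vm_insert_page() call */
	for (i = 0; i < npages; i++) {
		ret = vm_insert_page(vma, vma->vm_start + i * PAGE_SIZE,
				     demo_pages[vma->vm_pgoff + i]);
		if (ret)
			return ret;
	}
	return 0;
}
```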
Now some code
priv->a_size = ATTRIBUTE_N * ATTRIBUTE_SIZE;
/* page align */
priv->a_size = PAGE_ALIGN(priv->a_size);
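static int scan_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct mmap_priv *priv = file->private_data;
	unsigned long start = vma->vm_start;
	unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
	unsigned long size = vma->vm_end - vma->vm_start;
	unsigned long pos, pfn;
	/* refuse mappings that would reach past the end of the buffer */
	if (size > priv->a_size || offset > priv->a_size - size)
		return -EINVAL;
	/* vmalloc() memory is only virtually contiguous: remap it page by page */
	for (pos = 0; pos < size; pos += PAGE_SIZE) {
		pfn = vmalloc_to_pfn((char *)priv->a_area + offset + pos);
		if (remap_pfn_range(vma, start + pos, pfn, PAGE_SIZE, vma->vm_page_prot))
			return -EAGAIN;
	}
	/* VM_RESERVED was removed in Linux 3.7; remap_pfn_range() already marks
	 * the VMA VM_IO | VM_DONTEXPAND | VM_DONTDUMP, so no flags are set here */
	return 0;
}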