Occasionally, a device driver will need to map an address range into a user
process's space.  This mapping can be done to give the process direct
access to a device's I/O memory area, or to the driver's DMA buffers.  2.6
features a number of changes to the virtual memory subsystem, but, for most
drivers, supporting mmap() will be relatively painless.

Using remap_page_range()

There are two techniques in use for implementing mmap(); often the
simpler of the two is using remap_page_range().  This function
creates a set of page table entries covering a given physical address
range.  The prototype of remap_page_range() changed slightly in
2.5.3; the relevant virtual memory area (VMA) pointer must be passed as the
first parameter:
 
    int remap_page_range(struct vm_area_struct *vma, unsigned long from,
                         unsigned long to, unsigned long size,
                         pgprot_t prot);

remap_page_range() is now explicitly documented as requiring that
the memory management semaphore (usually current->mm->mmap_sem) be
held when the function is called.  Drivers will almost invariably call
remap_page_range() from their mmap() method, where that semaphore is
already held, so driver writers do not normally need to worry about
acquiring mmap_sem themselves.  If you call remap_page_range() from
somewhere other than your mmap() method, however, be sure to acquire
the semaphore first.
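
As a rough illustration of the call described above, the following sketch
shows how a driver's mmap() method might hand its whole VMA to
remap_page_range().  It is a user-space mock-up so that it compiles outside
the kernel: the struct vm_area_struct fields, the pgprot_t typedef, and the
body of remap_page_range() are simplified stand-ins, and my_mmap(),
last_remap, and MY_DEVICE_PHYS_BASE are hypothetical names.  A real mmap()
method also takes a struct file pointer and gets its definitions from
<linux/mm.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SHIFT 12   /* assumption: 4KB pages */

/* Simplified stand-ins for kernel types (assumptions for this sketch). */
typedef unsigned long pgprot_t;

struct vm_area_struct {
    unsigned long vm_start;   /* first user virtual address of the mapping */
    unsigned long vm_end;     /* first address past the end of the mapping */
    unsigned long vm_pgoff;   /* offset requested by user space, in pages */
    pgprot_t vm_page_prot;    /* protection bits for the mapping */
};

/* Hypothetical physical base address of the device's I/O memory. */
#define MY_DEVICE_PHYS_BASE 0xd0000000UL

/* Records the most recent call so the sketch can be checked in user space. */
static struct {
    unsigned long from, to, size;
} last_remap;

/*
 * Mock with the post-2.5.3 prototype.  The real function creates page
 * table entries; this one only records its arguments and rejects an
 * empty range.
 */
static int remap_page_range(struct vm_area_struct *vma, unsigned long from,
                            unsigned long to, unsigned long size,
                            pgprot_t prot)
{
    assert(vma != NULL);
    (void)prot;
    if (size == 0)
        return -1;
    last_remap.from = from;
    last_remap.to = to;
    last_remap.size = size;
    return 0;
}

/* Sketch of an mmap() method: map the device memory over the whole VMA. */
static int my_mmap(struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;
    unsigned long phys = MY_DEVICE_PHYS_BASE + (vma->vm_pgoff << PAGE_SHIFT);

    /* In the kernel, mmap_sem is already held when mmap() is called. */
    if (remap_page_range(vma, vma->vm_start, phys, size, vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}
```

A production driver would also want to check that the requested offset and
size stay within the device's memory region before remapping anything.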
 
Note that, if you are remapping into I/O space, you may want to use:

    int io_remap_page_range(struct vm_area_struct *vma, unsigned long from,
                            unsigned long to, unsigned long size,
                            pgprot_t prot);

On all architectures other than SPARC, io_remap_page_range() is
just another name for remap_page_range().  On SPARC systems,
however, io_remap_page_range() uses the system's I/O mapping
hardware to provide access to I/O memory.
 
remap_page_range() retains its longstanding limitation: it cannot
be used to remap most system RAM.  Thus, it works well for I/O memory
areas, but not for internal buffers.  For that case, it is necessary to
define a nopage() method.  (Yes, if you are curious, the "mark
pages reserved" hack still works as a way of getting around this
limitation, but its use is strongly discouraged.)
 
Using vm_operations

The other way of implementing mmap() is to override the default VMA
operations to set up a driver-specific nopage() method.  That
method will be called to deal with page faults in the mapped area; it is
expected to return a struct page pointer to satisfy the fault.  The
nopage() approach is flexible, but it cannot be used to remap I/O
regions; only memory represented in the system memory map can be mapped in
this way.

The nopage() method made it through the entire 2.5 development
series without changes, only to be modified in the 2.6.1 release.
The prototype for that function used to be:

    struct page *(*nopage)(struct vm_area_struct *area,
                           unsigned long address,
                           int unused);

As of 2.6.1, the unused argument is no longer unused, and the
prototype has changed to:

    struct page *(*nopage)(struct vm_area_struct *area,
                           unsigned long address,
                           int *type);

The type argument is now used to return the type of the page
fault.  VM_FAULT_MINOR indicates a minor fault, one where the
page was already in memory and all that was needed was a page table
fixup.  A return of VM_FAULT_MAJOR indicates, instead, that the page
had to be fetched from disk.  Driver code using nopage() to
implement a device mapping would probably return VM_FAULT_MINOR.
In-tree code checks whether type is NULL before assigning
the fault type; other users would be well advised to do the same.
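
The post-2.6.1 calling convention can be sketched roughly as follows.  This
is a user-space mock-up rather than real driver code: struct page,
struct vm_area_struct, get_page(), and the value given to VM_FAULT_MINOR
are simplified stand-ins, and my_nopage(), my_pages, and MY_NPAGES are
hypothetical names.  The sketch assumes a driver buffer whose pages are
already known, and it takes a reference on the page it returns, as a real
nopage() method is expected to do.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12        /* assumption: 4KB pages */
#define MY_NPAGES  4         /* hypothetical buffer size, in pages */

/* Stand-ins for kernel definitions; use the kernel's own in a driver. */
#define VM_FAULT_MINOR 1
#define NOPAGE_SIGBUS  NULL

struct page {
    int count;               /* mock reference count */
};

struct vm_area_struct {
    unsigned long vm_start;  /* first user virtual address of the mapping */
    unsigned long vm_end;    /* first address past the end of the mapping */
    unsigned long vm_pgoff;  /* offset into the buffer, in pages */
};

/* Hypothetical pages backing the driver's buffer. */
static struct page my_pages[MY_NPAGES];

/* Mock of get_page(): take a reference on the page. */
static void get_page(struct page *page)
{
    page->count++;
}

/* Sketch of a nopage() method using the 2.6.1 prototype. */
static struct page *my_nopage(struct vm_area_struct *area,
                              unsigned long address, int *type)
{
    unsigned long index;
    struct page *page;

    assert(address >= area->vm_start && address < area->vm_end);

    /* Which page of the buffer does the faulting address fall in? */
    index = ((address - area->vm_start) >> PAGE_SHIFT) + area->vm_pgoff;
    if (index >= MY_NPAGES)
        return NOPAGE_SIGBUS;      /* fault beyond the end of the buffer */

    page = &my_pages[index];
    get_page(page);

    /* The caller may pass NULL, so check before reporting the fault type. */
    if (type)
        *type = VM_FAULT_MINOR;
    return page;
}
```

The page was already in memory, so the fault is reported as minor; the NULL
check mirrors what the in-tree code does.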
 
There are a couple of other things worth mentioning.  One is that the
vm_operations_struct is rather smaller than it was in 2.4.0; the
protect(), swapout(), sync(), unmap(), and wppage() methods have all
gone away (they were actually deleted in 2.4.2).  Device
drivers made little use of these methods, and should not be affected by
their removal.
 
There is also one new vm_operations_struct method:

    int (*populate)(struct vm_area_struct *area, unsigned long address,
                    unsigned long len, pgprot_t prot, unsigned long pgoff,
                    int nonblock);

The populate() method was added in 2.5.46; its purpose is to
"prefault" pages within a VMA.  A device driver could certainly implement
this method by simply invoking its nopage() method for each page
within the given range, then using:

    int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
                     unsigned long addr, struct page *page,
                     pgprot_t prot);

to create the page table entries.  In practice, however, there is no real
advantage to doing things in this way.  No driver in the mainline (2.5.67)
kernel tree implements the populate() method.
 
Finally, one use of nopage() is to allow a user process to map a
kernel buffer which was created with vmalloc().  In the past, a
driver had to walk through the page tables to find a struct page
corresponding to a vmalloc() address.  As of 2.5.5 (and 2.4.19),
however, all that is needed is a call to:

    struct page *vmalloc_to_page(void *address);

This call is not a variant of vmalloc(); it allocates no memory.
It simply returns a pointer to the struct page associated with an
address obtained from vmalloc().
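
To illustrate the lookup described above, here is a small sketch of a
helper that a nopage() method might use to turn a faulting address into a
page of a vmalloc() buffer.  It is a user-space mock-up: the static array
my_buffer stands in for memory obtained from vmalloc(), the bodies of
vmalloc_to_page() and get_page() are simplified stand-ins, and
my_buffer_fault_page() and the other my_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT   12                   /* assumption: 4KB pages */
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define MY_BUF_PAGES 2                    /* hypothetical buffer size */

struct page {
    int count;               /* mock reference count */
};

struct vm_area_struct {
    unsigned long vm_start;  /* first user virtual address of the mapping */
    unsigned long vm_end;    /* first address past the end of the mapping */
    unsigned long vm_pgoff;  /* offset into the buffer, in pages */
};

/* Stands in for a buffer that a driver would obtain from vmalloc(). */
static char my_buffer[MY_BUF_PAGES * PAGE_SIZE];
static struct page my_buffer_pages[MY_BUF_PAGES];

/* Mock: the real function finds the struct page via the page tables. */
static struct page *vmalloc_to_page(void *address)
{
    char *p = address;

    assert(p >= my_buffer && p < my_buffer + sizeof(my_buffer));
    return &my_buffer_pages[(unsigned long)(p - my_buffer) >> PAGE_SHIFT];
}

/* Mock of get_page(): take a reference on the page. */
static void get_page(struct page *page)
{
    page->count++;
}

/* Find the buffer page behind a faulting user address, or NULL. */
static struct page *my_buffer_fault_page(struct vm_area_struct *area,
                                         unsigned long address)
{
    unsigned long offset = (address - area->vm_start)
                           + (area->vm_pgoff << PAGE_SHIFT);
    struct page *page;

    if (offset >= sizeof(my_buffer))
        return NULL;         /* beyond the end of the buffer */

    /* No page table walk is needed; vmalloc_to_page() does the lookup. */
    page = vmalloc_to_page(my_buffer + offset);
    get_page(page);
    return page;
}
```

A nopage() method built on such a helper would return the resulting page
and, where the type pointer is not NULL, report the fault type through it.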