
Driver porting: Zero-copy user-space access

This article is part of the LWN Porting Drivers to 2.6 series.
The kiobuf abstraction was introduced in 2.3 as a low-level way of representing I/O buffers. Its primary use, perhaps, was to represent zero-copy I/O operations going directly to or from user space. A number of problems were found with the kiobuf interface, however; among other things, it forced large I/O operations to be broken down into small chunks, and it was seen as a heavyweight data structure. So, in 2.5.43, kiobufs were removed from the kernel.

This article looks at how to port drivers which used the kiobuf interface in 2.4. We'll proceed on the assumption that the real feature of interest was direct access to user space; there wasn't much motivation to use a kiobuf otherwise.

Zero-copy block I/O

The 2.6 kernel has a well-developed direct I/O capability for block devices. So, in general, it will not be necessary for block driver writers to do anything to implement direct I/O themselves. It all "just works."

Should you have a need to perform zero-copy block operations, it's worth noting the presence of a useful helper function:

    struct bio *bio_map_user(struct block_device *bdev,
                             unsigned long uaddr,
                             unsigned int len,
                             int write_to_vm);

This function returns a BIO describing a direct operation to the given block device bdev. The parameters uaddr and len describe the user-space buffer to be transferred. Callers must check the returned BIO, however: it can be NULL, and the area actually mapped might be smaller than what was requested. The write_to_vm flag should be set if the operation will change memory, that is, if it is a read-from-disk operation. A non-NULL BIO is ready for submission to the appropriate device driver.

When the operation is complete, undo the mapping with:

    void bio_unmap_user(struct bio *bio, int write_to_vm);
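Since a single call may map less than was asked for, a caller generally has to loop until the whole buffer is covered. The following is only a user-space sketch of that control flow, not kernel code: struct bio, bio_map_user() and bio_unmap_user() are reduced to stand-ins so the loop can be compiled and run outside the kernel, the 8192-byte limit is invented for the mock, and transfer_user_buffer() is a hypothetical helper name. It also assumes that the size actually mapped can be read from the BIO's bi_size field.

```c
#include <assert.h>
#include <stddef.h>

struct block_device;                 /* opaque here, as in the prototype */

/* Stand-in for the kernel's struct bio: only bi_size (bytes mapped) is modeled. */
struct bio {
    unsigned int bi_size;
};

#define MOCK_MAX_MAP 8192u           /* invented per-call limit of the mock */

static struct bio mock_bio;
static int live_mappings;            /* mappings that have not been undone yet */

/* Mock: maps at most MOCK_MAX_MAP bytes, returns NULL for an empty request. */
static struct bio *bio_map_user(struct block_device *bdev, unsigned long uaddr,
                                unsigned int len, int write_to_vm)
{
    (void)bdev; (void)uaddr; (void)write_to_vm;
    if (len == 0)
        return NULL;
    mock_bio.bi_size = len < MOCK_MAX_MAP ? len : MOCK_MAX_MAP;
    live_mappings++;
    return &mock_bio;
}

/* Mock: only tracks that every mapping is undone exactly once. */
static void bio_unmap_user(struct bio *bio, int write_to_vm)
{
    (void)write_to_vm;
    assert(bio != NULL);
    live_mappings--;
}

/* Hypothetical helper: returns the bytes handled, or -1 if nothing was mapped. */
long transfer_user_buffer(struct block_device *bdev, unsigned long uaddr,
                          unsigned int len, int write_to_vm)
{
    unsigned int done = 0;

    while (done < len) {
        unsigned int mapped;
        struct bio *bio = bio_map_user(bdev, uaddr + done, len - done,
                                       write_to_vm);

        if (bio == NULL)             /* the returned BIO must be checked */
            return done ? (long)done : -1;
        mapped = bio->bi_size;       /* may be less than len - done */
        /* A real driver would submit the BIO and wait for completion here. */
        bio_unmap_user(bio, write_to_vm);
        if (mapped == 0)
            return -1;
        done += mapped;
    }
    return (long)done;
}
```

The same write_to_vm value is passed to both the map and unmap calls, mirroring the two prototypes above.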

Mapping user-space pages

If you have a char driver which needs direct user-space access (a high-performance streaming tape driver, say), then you'll want to map user-space pages yourself. The modern equivalent of map_user_kiobuf() is a function called get_user_pages():

    int get_user_pages(struct task_struct *task,
                       struct mm_struct *mm,
                       unsigned long start,
                       int len,
                       int write,
                       int force,
                       struct page **pages,
                       struct vm_area_struct **vmas);

task is the process performing the mapping; the primary purpose of this argument is to say who gets charged for page faults incurred while mapping the pages. This parameter is almost always passed as current. The memory management structure for the user's address space is passed in the mm parameter; it is usually current->mm. Note that get_user_pages() expects that the caller will hold a read lock on mm->mmap_sem. The start and len parameters describe the user buffer to be mapped; note that len is a count of pages, not bytes. If the memory will be written to, write should be non-zero. The force flag forces read or write access, even if the current page protection would otherwise not allow that access. The pages array (which should be big enough to hold len entries) will be filled with pointers to the page structures for the user pages. If vmas is non-NULL, it will be filled with a pointer to the vm_area_struct structure containing each page.
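Because len counts pages rather than bytes, a driver handed a user address and a byte count has to work out how many pages the buffer touches; a buffer that is not page-aligned can span one page more than its size suggests. A minimal sketch of that arithmetic follows. The helper name user_buffer_pages() is hypothetical, and PAGE_SHIFT is defined locally on the assumption of 4 KB pages so that the example compiles outside the kernel (kernel code would use the architecture's own definition).

```c
#include <assert.h>

#define PAGE_SHIFT 12                    /* assumption: 4 KB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/*
 * Number of pages touched by the byte range [uaddr, uaddr + count).
 * count must be non-zero.
 */
unsigned long user_buffer_pages(unsigned long uaddr, unsigned long count)
{
    unsigned long first, last;

    assert(count > 0);
    first = uaddr >> PAGE_SHIFT;               /* page holding the first byte */
    last = (uaddr + count - 1) >> PAGE_SHIFT;  /* page holding the last byte */
    return last - first + 1;
}
```

The result is what would be passed as len, and it is also the minimum size of the pages array.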

The return value is the number of pages actually mapped, or a negative error code if something goes wrong. Assuming things worked, the user pages will be present (and locked) in memory, and can be accessed by way of the struct page pointers. Be aware, of course, that some or all of the pages could be in high memory.

There is no equivalent put_user_pages() function, so callers of get_user_pages() must perform the cleanup themselves. Two things need to be done: marking modified pages as dirty, and releasing every page from the page cache. If your device modified the user pages, the virtual memory subsystem may not know about it, and may fail to write the pages to permanent storage (or swap). That, of course, could lead to data corruption and grumpy users. The way to avoid this problem is to call:

    SetPageDirty(struct page *page);

for each modified page in the mapping. Current (2.6.3) kernel code first checks that the page is not reserved, with code like:

    if (!PageReserved(page))
        SetPageDirty(page);

But pages mapped from user space should not, normally, be marked reserved in the first place.

Finally, every mapped page must be released from the page cache, or it will stay there forever; simply pass each page structure to:

    void page_cache_release(struct page *page);

After you have released the page, of course, you should not access it again.
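Put together, the cleanup amounts to one loop over the pages array. The sketch below is a user-space model of that loop, not kernel code: struct page, PageReserved(), SetPageDirty() and page_cache_release() are replaced by trivial stand-ins so the example can be compiled and run outside the kernel, and release_user_pages() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for struct page: only what this sketch touches. */
struct page {
    int count;      /* references held on the page */
    int dirty;      /* set once the page is marked dirty */
    int reserved;   /* models the reserved flag */
};

/* Stand-ins for the kernel helpers named in the text. */
static int PageReserved(const struct page *page) { return page->reserved; }
static void SetPageDirty(struct page *page) { page->dirty = 1; }
static void page_cache_release(struct page *page) { page->count--; }

/*
 * Hypothetical helper: undo a mapping of nr_pages pages. Pass dirtied != 0
 * if the device wrote into the user buffer.
 */
void release_user_pages(struct page **pages, int nr_pages, int dirtied)
{
    int i;

    assert(pages != NULL || nr_pages == 0);
    for (i = 0; i < nr_pages; i++) {
        /* Mark first, release second: the page is off limits afterward. */
        if (dirtied && !PageReserved(pages[i]))
            SetPageDirty(pages[i]);
        page_cache_release(pages[i]);
        pages[i] = NULL;    /* guard against accidental reuse */
    }
}
```

The same loop can serve on an error path where fewer pages were mapped than requested, by passing the count that was actually returned.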

For a good example of how to use get_user_pages() in a char driver, see the definition of sgl_map_user_pages() in drivers/scsi/st.c.


  Driver porting: Zero-copy user-space access
(Posted Feb 13, 2004 14:34 UTC (Fri) by grisu1976)

I don't really understand why the kiobuf interface does not exist anymore. In Linux kernel 2.4 the kiobuf interface used get_user_pages, or am I wrong? The kiobuf interface was easier to use than get_user_pages - that's my opinion.

  Driver porting: Zero-copy user-space access
(Posted Mar 3, 2004 7:55 UTC (Wed) by bhepple)

Hmmm, a quick recursive grep through the 2.6.3 driver source and include files showed exactly 0 users of set_page_dirty_lock() and 1 user of put_page() (in drivers/char/agp/generic.c)

There _is_ a
#define page_cache_release(page) put_page(page)
in include/linux/pagemap.h and it is quite a popular little chap in the device driver code with 13 hits in the entire tree.

Am I missing something or should we be using page_cache_release instead of put_page and is it (and set_page_dirty_lock) _really_ needed after all - I can hardly believe all those drivers are causing "data corruption and grumpy users"...

Copyright (©) 2003, Eklektix, Inc.