Driver porting: the gendisk interface

This article is part of the LWN Porting Drivers to 2.6 series.
The 2.4 kernel gendisk structure is used almost as an afterthought; its main purpose is to help in keeping track of disk partitions. In 2.6, the gendisk is at the core of the block subsystem; if you need to work with or find something out about a disk, struct gendisk probably has what you need. This article will cover the details of the gendisk structure from a disk driver's perspective. If you have not already read them, a quick look at the LWN block driver overview and simple block driver articles is probably worthwhile.

Gendisk initialization

The best way of looking at the contents of a gendisk structure from a block driver's point of view is to examine what that driver must do to set the structure up in the first place. If your driver makes a disk (or disk-like) device available to the system, it will have to provide an associated gendisk structure. (Note, however, that it is not necessary - or correct - to set up gendisk structures for disk partitions.)

The first step is to create the gendisk structure itself; the function you need is alloc_disk() (which is declared in <linux/genhd.h>):

    struct gendisk *alloc_disk(int minors);

The argument minors is the maximum number of minor numbers that this disk can have. Minor numbers correspond to partitions, of course (except the first, which is the "whole disk" device), so the value passed here controls the maximum number of partitions. If a single minor number is requested, the device cannot be partitioned at all. The return value is a pointer to the gendisk structure; the allocation can fail, so this value should always be checked against NULL before proceeding.

There are several fields of the gendisk structure which must be initialized by the block driver. They include:

int major;
The major number of this device; either a static major assigned to a specific driver, or one that was obtained dynamically from register_blkdev().

int first_minor;
The first minor device number corresponding to this disk. This number will be determined by how your driver divides up its minor number space.

char disk_name[32];
The name of this disk (e.g. hda). This name is used in places like /proc/partitions and in creating a sysfs directory for the device.

struct block_device_operations *fops;
The device operations (open, release, ioctl, media_changed, and revalidate_disk) for this device. Each disk has its own set of operations in 2.6.

struct request_queue *queue;
The request queue which will handle the list of pending operations for this disk. The queue must be created and initialized separately.

int flags;
A set of flags controlling the management of this device. They include GENHD_FL_REMOVABLE for removable devices, GENHD_FL_CD for CDROM devices, and GENHD_FL_DRIVERFS which certainly means something interesting, but which is not actually used anywhere.

void *private_data;
This field is reserved for the driver; the rest of the block subsystem will not touch it. Usually it holds a pointer to a driver-specific data structure describing this device.

The gendisk structure also holds the size of the disk, in sectors. As part of the initialization process, the driver should set that size with:

    void set_capacity(struct gendisk *disk, sector_t size);

The size value should be in 512-byte sectors, even if the hardware sector size used by your device is different. For a removable disk, setting the capacity to zero indicates to the block subsystem that there is currently no media present in the device.
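The unit conversion is easy to get wrong for devices whose native sectors are not 512 bytes long. What follows is a minimal user-space sketch of the arithmetic only; the helper name capacity_in_512b_sectors() is hypothetical (it is not part of the kernel API), and it assumes that the hardware sector size is a nonzero multiple of 512.

```c
#include <stdint.h>

/* The block layer always counts capacity in 512-byte units. */
#define KERNEL_SECTOR_SIZE 512u

/*
 * Hypothetical helper: convert a device size expressed in hardware
 * sectors into the 512-byte sector count that set_capacity() expects.
 * Assumes hw_sector_size is a nonzero multiple of 512.
 */
static uint64_t capacity_in_512b_sectors(uint64_t hw_sectors,
                                         uint32_t hw_sector_size)
{
    return hw_sectors * (hw_sector_size / KERNEL_SECTOR_SIZE);
}
```

A device with 1000 sectors of 2048 bytes each would thus be reported as 4000 sectors; in a driver, that result is what would be passed as the size argument to set_capacity().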

Manipulating gendisks

Once you have your gendisk structure set up, you have to add it to the list of active disks; that is done with:

    void add_disk(struct gendisk *disk);

After this call, your device is active. There are a few things worth keeping in mind about add_disk():

  • add_disk() can create I/O to the device (to read partition tables and such). You should not call add_disk() until your driver is sufficiently initialized to handle requests.

  • If you are calling add_disk() in your driver initialization routine, you should not fail the initialization process after the first call.

  • The call to add_disk() increments the disk's reference count; if the disk structure is ever to be released, the driver is responsible for decrementing that count (with put_disk()).
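Putting the steps so far together, the order of operations might look like the sketch below. It is written as a user-space program so that it can be compiled and tried outside the kernel: struct gendisk, alloc_disk(), set_capacity(), and add_disk() are simplified stand-ins defined right here (a real driver gets the real ones from <linux/genhd.h>), and everything with a toy prefix is a hypothetical driver, not code from any existing one.

```c
#include <stdio.h>
#include <stdlib.h>

/* ---- User-space stand-ins; a real driver includes <linux/genhd.h> ---- */
typedef unsigned long long sector_t;
struct block_device_operations { int placeholder; };
struct request_queue { int placeholder; };

struct gendisk {
    int major;
    int first_minor;
    int minors;
    char disk_name[32];
    struct block_device_operations *fops;
    struct request_queue *queue;
    int flags;
    void *private_data;
    sector_t capacity;  /* stand-in storage for set_capacity() */
    int added;          /* stand-in bookkeeping for add_disk() */
};

static struct gendisk *alloc_disk(int minors)
{
    struct gendisk *disk = calloc(1, sizeof(*disk));

    if (disk != NULL)
        disk->minors = minors;
    return disk;
}

static void set_capacity(struct gendisk *disk, sector_t size)
{
    disk->capacity = size;
}

static void add_disk(struct gendisk *disk)
{
    disk->added = 1;
}

/* ---- Hypothetical driver code ---- */
#define TOY_MINORS 16   /* whole disk plus up to 15 partitions */

struct toy_dev { sector_t nsectors; };  /* size in 512-byte sectors */

static struct block_device_operations toy_ops;
static struct request_queue toy_queue;

static struct gendisk *toy_setup_disk(struct toy_dev *dev, int major, int index)
{
    struct gendisk *disk = alloc_disk(TOY_MINORS);

    if (disk == NULL)   /* the allocation can fail */
        return NULL;

    disk->major = major;
    disk->first_minor = index * TOY_MINORS;
    snprintf(disk->disk_name, sizeof(disk->disk_name), "toy%c", 'a' + index);
    disk->fops = &toy_ops;
    disk->queue = &toy_queue;
    disk->private_data = dev;
    set_capacity(disk, dev->nsectors);

    add_disk(disk);     /* last step: I/O may arrive from here on */
    return disk;
}
```

The point of the ordering is that add_disk() comes last, after every field and the capacity have been filled in, since the device may see requests as soon as that call is made.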

Should you need to remove a disk from the system, that is accomplished with:

    void del_gendisk(struct gendisk *disk);

This function cleans up all of the information associated with the given disk, and generally removes it from the system. After a call to del_gendisk(), no more operations will be sent to the given device. Your driver's reference to the gendisk object remains, though; you must explicitly release it with:

    void put_disk(struct gendisk *disk);

That call will cause the gendisk structure to be freed, as long as no other part of the kernel retains a reference to it.
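The "as long as no other part of the kernel retains a reference" condition is ordinary reference counting. The user-space model below illustrates the idea only; the toy_ names are hypothetical, and the code makes no attempt to reproduce how the kernel actually implements the gendisk reference count.

```c
#include <stdlib.h>

/* Toy reference-counted object standing in for a gendisk. */
struct toy_disk {
    int refcount;
    int *freed;     /* set to 1 when the object is really released */
};

/* Allocation hands the caller the first reference. */
static struct toy_disk *toy_alloc(int *freed)
{
    struct toy_disk *disk = malloc(sizeof(*disk));

    if (disk != NULL) {
        disk->refcount = 1;
        disk->freed = freed;
        *freed = 0;
    }
    return disk;
}

/* Some other user takes an additional reference. */
static struct toy_disk *toy_get(struct toy_disk *disk)
{
    disk->refcount++;
    return disk;
}

/* Drop one reference; free only when the last one goes away. */
static void toy_put(struct toy_disk *disk)
{
    if (--disk->refcount == 0) {
        *disk->freed = 1;
        free(disk);
    }
}
```

If something else still holds a reference when the driver drops its own, the object survives until that last holder lets go; this is why a driver must not touch the gendisk after its final put_disk() call.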

Should you need to set a disk into a read-only mode, use:

    void set_disk_ro(struct gendisk *disk, int flag);

If flag is nonzero, all partitions on the disk will be marked read-only. The kernel can track read-only status individually for each partition, but no utility function has been exported to manipulate that status for single partitions.

Partition management is handled within the block subsystem in 2.6; drivers need not worry about partitions at all. Should the need arise, the functions add_partition() and delete_partition() can be used to manipulate the (in-kernel) partition table directly. These functions are used in the generic block ioctl() code; there should be no need for a block driver to call them directly.

Registering block device number ranges

A call to add_disk() implicitly allocates a set of minor numbers (under the given major number) from first_minor to first_minor+minors-1. If your driver must only respond to operations on disks that exist at initialization time, there is no need to worry further about number allocation. Even the traditional call to register_blkdev() is optional, and may be removed soon. Some drivers, however, need to be able to claim responsibility for a larger range of device numbers at initialization time.

If this is your case, the answer is to call blk_register_region(), which has this rather involved prototype:

    void blk_register_region(dev_t dev, 
                             unsigned long range, 
                             struct module *module,
                             struct kobject *(*probe)(dev_t, int *, void *),
                             int (*lock)(dev_t, void *), 
                             void *data);

Here, dev is a device number (created with MKDEV()) containing the major and first minor number of the region of interest; range is the number of minor numbers to allocate, module is the loadable module (if any) containing the driver, probe is a driver-supplied function to probe for a single disk, lock is a driver-supplied locking function, and data is a driver-private pointer which is passed to probe() and lock().
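To make the dev and range arguments concrete, here is a small user-space sketch of the arithmetic. The TOY_ macros are local copies written for this example; they assume the device number layout used by released 2.6 kernels (12 bits of major, 20 bits of minor), and a real driver would simply use MKDEV(), MAJOR(), and MINOR() from <linux/kdev_t.h>.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's device number macros (assumed layout). */
#define TOY_MINORBITS 20
#define TOY_MINORMASK ((1u << TOY_MINORBITS) - 1)
#define TOY_MAJOR(dev) ((unsigned int)((dev) >> TOY_MINORBITS))
#define TOY_MINOR(dev) ((unsigned int)((dev) & TOY_MINORMASK))
#define TOY_MKDEV(ma, mi) \
    (((uint32_t)(ma) << TOY_MINORBITS) | (uint32_t)(mi))

/* First device number of a region: the major plus the first minor. */
static uint32_t toy_region_start(unsigned int major, unsigned int first_minor)
{
    return TOY_MKDEV(major, first_minor);
}

/* Last device number covered by a region of "range" minors. */
static uint32_t toy_region_last(uint32_t start, unsigned long range)
{
    return start + (uint32_t)range - 1;
}
```

A region starting at major 8, minor 16 with a range of 16 therefore covers minors 16 through 31 under the same major, which is the same first_minor to first_minor+minors-1 span that add_disk() claims for a single disk.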

When blk_register_region() is called, it simply makes a note of the desired region and returns. Note that there can be more than one registration within a specific region! At lookup time, the most "specific" registration (the one with the smallest range) wins.
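The "smallest range wins" rule can be modeled in a few lines of user-space C. This is a sketch of the selection policy described above, not the kernel's data structure; struct toy_region and toy_lookup() are hypothetical names, and device numbers are treated as plain integers.

```c
#include <stddef.h>
#include <stdint.h>

/* One registered region, reduced to what the lookup needs. */
struct toy_region {
    uint32_t start;         /* first device number covered */
    unsigned long range;    /* number of consecutive device numbers */
    const char *owner;      /* label for the registering driver */
};

/*
 * Return the most specific region containing dev, meaning the one
 * with the smallest range, or NULL if no region covers it.
 */
static const struct toy_region *toy_lookup(const struct toy_region *regions,
                                           size_t count, uint32_t dev)
{
    const struct toy_region *best = NULL;

    for (size_t i = 0; i < count; i++) {
        const struct toy_region *r = &regions[i];

        if (dev < r->start || dev - r->start >= r->range)
            continue;       /* dev is outside this region */
        if (best == NULL || r->range < best->range)
            best = r;
    }
    return best;
}
```

With a 256-number "generic" region starting at 0 and a 16-number "specific" region starting at 16 both registered, a lookup of device number 20 selects the specific one, while a lookup of 5 falls through to the generic one.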

At some point in the future, an attempt may be made to access a device number within the allocated region. At that point, there will be a call to the lock() function (if it was not passed as NULL) with the device number of interest. If lock() succeeds, probe() will be called to find the specific disk of interest. The full prototype of the probe function is:

    struct kobject *(*probe)(dev_t dev, int *partition, void *data);

Here, dev is the device number of interest, partition is a pointer to a partition number (sort of), and data is the driver-private pointer passed to blk_register_region(). The partition number is actually just the offset into the allocated range; it's the minor number from dev with the beginning of the range subtracted.

The probe() function should attempt to identify a specific gendisk structure which corresponds to the requested number. If it is successful, it should return a pointer to the kobject structure contained within the gendisk. Kobjects are covered in a separate article; for now, all you really need to know is that you should call get_disk() with the gendisk structure as the argument, and return the value from get_disk() to the caller. The probe() function can also modify the partition number so that it corresponds to the actual partition offset in the returned device. If the function cannot handle the request at all, it can return NULL.
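As an illustration of the partition number adjustment, consider a hypothetical driver which registers one region covering several disks, each with 16 minor numbers. The user-space model below shows how its probe() routine might turn the offset into the region into a disk index and a per-disk partition offset. All names are invented for this example, and the function returns a pointer to a toy structure where a real probe() would return the result of get_disk().

```c
#include <stddef.h>

#define TOY_MINORS    16    /* minors per disk (assumed) */
#define TOY_MAX_DISKS 4     /* disks covered by the region (assumed) */

struct toy_unit {
    int present;            /* nonzero if this disk actually exists */
};

struct toy_driver {
    struct toy_unit units[TOY_MAX_DISKS];
};

/*
 * Model of a probe() routine. On entry, *partition is the offset of
 * the requested device number into the registered region; on success
 * it is rewritten to be the offset within the disk that was found.
 */
static struct toy_unit *toy_probe(unsigned int dev, int *partition, void *data)
{
    struct toy_driver *drv = data;
    int index;

    (void)dev;              /* the offset is all this model needs */
    if (*partition < 0)
        return NULL;
    index = *partition / TOY_MINORS;
    if (index >= TOY_MAX_DISKS || !drv->units[index].present)
        return NULL;        /* cannot handle this request */

    *partition %= TOY_MINORS;
    return &drv->units[index];
}
```

An offset of 35 into the region, for example, resolves to the third disk (index 2) and partition offset 3 within it, assuming that disk is present.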

Some probe() functions do not, themselves, locate and initialize the device of interest. Instead, they call some other function to set in motion that whole process. For example, a number of probe() functions simply call request_module() in an attempt to load a module which can handle the device. In this mode of operation, the function should return NULL, which will cause the block layer to look at the device number allocations one more time. If a "better" allocation (with a smaller range) has happened in the meantime, the probe() function for the new driver will be called. So, for example, if a module is loaded which allocates a smaller device number range corresponding to the devices it actually implements, its probe() routine will be called on the next iteration.
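The retry behavior can be modeled in user space as follows. This is a loose sketch of the policy described above rather than the kernel's implementation: the toy_ names are hypothetical, device numbers are plain integers, and "loading a module" is simulated by a probe routine which registers a narrower region and then returns NULL.

```c
#include <limits.h>
#include <stddef.h>

#define TOY_MAX_REGIONS 8

struct toy_map;
typedef void *(*toy_probe_fn)(struct toy_map *map, unsigned int dev);

struct toy_region {
    unsigned int start;
    unsigned long range;
    toy_probe_fn probe;
};

struct toy_map {
    struct toy_region regions[TOY_MAX_REGIONS];
    size_t count;
};

static void toy_register(struct toy_map *map, unsigned int start,
                         unsigned long range, toy_probe_fn probe)
{
    if (map->count < TOY_MAX_REGIONS) {
        struct toy_region *r = &map->regions[map->count++];

        r->start = start;
        r->range = range;
        r->probe = probe;
    }
}

static int toy_the_disk;    /* object handed back by the specific driver */

static void *toy_specific_probe(struct toy_map *map, unsigned int dev)
{
    (void)map;
    (void)dev;
    return &toy_the_disk;
}

/* Simulates a probe() that loads a module: the "module" registers a
 * 16-number region around dev, then the probe itself returns NULL. */
static void *toy_loader_probe(struct toy_map *map, unsigned int dev)
{
    toy_register(map, dev & ~15u, 16, toy_specific_probe);
    return NULL;
}

static void *toy_lookup(struct toy_map *map, unsigned int dev)
{
    unsigned long limit = ULONG_MAX;    /* only try ranges below this */

    for (;;) {
        struct toy_region *best = NULL;

        for (size_t i = 0; i < map->count; i++) {
            struct toy_region *r = &map->regions[i];

            if (dev < r->start || dev - r->start >= r->range)
                continue;
            if (r->range < limit && (best == NULL || r->range < best->range))
                best = r;
        }
        if (best == NULL)
            return NULL;                /* nothing narrower turned up */
        limit = best->range;
        void *obj = best->probe(map, dev);
        if (obj != NULL)
            return obj;
        /* probe failed; look again for a newly registered, narrower region */
    }
}
```

Because each retry only considers regions strictly narrower than the one just tried, the loop ends as soon as a probe fails without anything more specific having been registered.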

Of course, there is the usual associated unregister function:

    void blk_unregister_region(dev_t dev, unsigned long range);

The next step

Once you have a handle on how the gendisk structure works, the next thing to do is to learn about BIO structures.



Copyright (©) 2003, Eklektix, Inc.
Linux (®) is a registered trademark of Linus Torvalds