> http://bugme.osdl.org/show_bug.cgi?id=1806
>
> Summary: disks stats not kept for DM (device mapper) devices
> Kernel Version: 2.6.0
> Status: NEW
> Severity: normal
> Owner: axboe@suse.de
> Submitter: slpratt@us.ibm.com
>
> Distribution: all
> Hardware Environment: all
> Software Environment: all
>
> Problem Description:
> Disk stats as reported through sysfs are empty for all DM (device mapper)
> devices. This appears to be due to the fact that the stats are tracked via
> request structs, which are not generated until below the device-mapper
> layer. It seems it would be possible to add code to device-mapper to track
> the stats, since the actual location of the stats is in the gendisk entry,
> which does exist for DM devices. The only problem I see is in tracking
> ticks for IO, since in the non-DM case this is done by storing a start time
> in the request struct when the request is driven. Since DM has no request
> struct (only the bio), it has no place to record the start time.
>
> Steps to reproduce:
> Create a DM device using dmsetup, lvm2 or EVMS. Do IO to the device, then
> look at /sys/block/dm-xxx/stat.

Steve and I noticed this behavior this morning. I poked around in ll_rw_blk.c,
genhd.[ch] and some of the individual block drivers to get an idea of how I/O
statistics are managed, and eventually came up with this patch for
Device-Mapper. Some things to note:

- In the lower-level drivers (IDE, SCSI, etc.), statistics are calculated
  based on request structs. DM never uses or sees request structs, so these
  statistics are based on bio structs. In other words, a "reads" count of 1
  means DM completed one bio, not one request.

- The statistics are updated when a bio is completed. As some brief
  background: when DM receives a bio, it holds on to that bio and creates one
  or more clones (in case the original bio needs to be split across some
  internal DM boundary). DM submits each of those clones itself, and when all
  the clones have completed, DM completes the original bio. I've added the
  statistics updates at this completion point; they could just as easily be
  done before the clones are submitted. I'm not sure whether this is the
  desired behavior, but it seemed consistent with the other places where the
  statistics are updated. One side-effect of this decision is that an I/O
  that causes an error within DM may not show up in the statistics at all.
  E.g., if the DM device currently has no active mapping, DM will simply call
  bio_io_error() on that bio and thus never update the stats for the device -
  which may actually be the desired behavior.

- Device-Mapper doesn't do much of anything with the request_queue struct
  attached to its gendisk entry. However, all of the code that updates the
  I/O stats has comments saying the request_queue must be locked before the
  stats can be updated. So this patch allocates a spinlock for each DM
  request_queue, and that lock is taken while updating the stats. This
  introduces some new contention on DM's I/O completion path; I don't know
  yet whether it will be a significant amount. I'm not sure this is the best
  way to do it, but it's probably the simplest. (A rough sketch of one
  alternative is below, after these notes.)

On a related note, MD/Software-RAID also doesn't currently track I/O
statistics. If we decide if/how to track statistics for DM, doing the same in
MD ought to be pretty easy.

This patch is against 2.6.1-rc2. I also have a slightly different version for
Joe's 2.6.0-udm3 patchset.

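For what it's worth, one alternative to the kmalloc'd queue_lock (untested,
just a sketch, not part of the patch below) would be to embed the spinlock in
struct mapped_device and point queue->queue_lock at it, which avoids the extra
allocation and the kfree() calls in the error paths. The "stat_lock" field
name here is made up:

	/*
	 * Sketch only - embed the lock that protects the gendisk statistics
	 * in the per-device structure instead of kmalloc'ing it separately.
	 */
	struct mapped_device {
		/* ... existing fields ... */
		spinlock_t stat_lock;	/* hypothetical: protects disk stats */

		struct request_queue *queue;
		struct gendisk *disk;
		/* ... */
	};

	/* in alloc_dev(), once md and md->queue have been allocated: */
	spin_lock_init(&md->stat_lock);
	md->queue->queue_lock = &md->stat_lock;

update_io_stats() would be unchanged, since it only dereferences
queue->queue_lock, and free_dev() and the error paths would no longer need to
free anything, at the cost of one more field in struct mapped_device.
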
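And for completeness, here is a rough userspace check of the result (example
only: the device name dm-0 is arbitrary, and the field labels are my reading
of Documentation/iostats.txt, so verify against your kernel):

	/*
	 * Dump /sys/block/dm-0/stat with labels.  Only reads/writes,
	 * *_sectors and *_ticks are filled in by the patch below; the
	 * merge and in-flight fields will stay zero.
	 */
	#include <stdio.h>

	int main(void)
	{
		static const char *label[11] = {
			"reads", "read_merges", "read_sectors", "read_ticks",
			"writes", "write_merges", "write_sectors", "write_ticks",
			"in_flight", "io_ticks", "time_in_queue"
		};
		unsigned long long v[11];
		FILE *f = fopen("/sys/block/dm-0/stat", "r");
		int i, n = 0;

		if (!f) {
			perror("fopen");
			return 1;
		}
		while (n < 11 && fscanf(f, "%llu", &v[n]) == 1)
			n++;
		fclose(f);

		for (i = 0; i < n; i++)
			printf("%-14s %llu\n", label[i], v[i]);
		return 0;
	}
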
--
Kevin Corry
kevcorry@us.ibm.com
http://evms.sourceforge.net/


Update I/O statistics before completing the original, incoming bio.

--- a/drivers/md/dm.c	2004-01-07 13:53:55.000000000 -0600
+++ b/drivers/md/dm.c	2004-01-07 13:59:30.000000000 -0600
@@ -25,6 +25,7 @@
 	int error;
 	struct bio *bio;
 	atomic_t io_count;
+	unsigned long start_time;
 };
 
 struct deferred_io {
@@ -44,7 +45,7 @@
 
 	unsigned long flags;
 
-	request_queue_t *queue;
+	struct request_queue *queue;
 	struct gendisk *disk;
 
@@ -243,6 +244,29 @@
 	return sector << SECTOR_SHIFT;
 }
 
+static inline void update_io_stats(struct dm_io *io)
+{
+	unsigned long flags;
+	unsigned long duration = jiffies - io->start_time;
+
+	spin_lock_irqsave(io->md->queue->queue_lock, flags);
+
+	switch (bio_data_dir(io->bio)) {
+	case READ:
+		disk_stat_inc(dm_disk(io->md), reads);
+		disk_stat_add(dm_disk(io->md), read_sectors, bio_sectors(io->bio));
+		disk_stat_add(dm_disk(io->md), read_ticks, duration);
+		break;
+	case WRITE:
+		disk_stat_inc(dm_disk(io->md), writes);
+		disk_stat_add(dm_disk(io->md), write_sectors, bio_sectors(io->bio));
+		disk_stat_add(dm_disk(io->md), write_ticks, duration);
+		break;
+	}
+
+	spin_unlock_irqrestore(io->md->queue->queue_lock, flags);
+}
+
 /*
  * Decrements the number of outstanding ios that a bio has been
  * cloned into, completing the original io if necc.
@@ -259,6 +283,8 @@
 	}
 
 	if (atomic_dec_and_test(&io->io_count)) {
+		update_io_stats(io);
+
 		if (atomic_dec_and_test(&io->md->pending))
 			/* nudge anyone waiting on suspend queue */
 			wake_up(&io->md->wait);
@@ -462,6 +488,7 @@
 	atomic_set(&ci.io->io_count, 1);
 	ci.io->bio = bio;
 	ci.io->md = md;
+	ci.io->start_time = jiffies;
 	ci.sector = bio->bi_sector;
 	ci.sector_count = bio_sectors(bio);
 	ci.idx = bio->bi_idx;
@@ -607,6 +634,14 @@
 		return NULL;
 	}
 
+	md->queue->queue_lock = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
+	if (!md->queue->queue_lock) {
+		free_minor(minor);
+		blk_put_queue(md->queue);
+		kfree(md);
+		return NULL;
+	}
+
 	md->queue->queuedata = md;
 	blk_queue_make_request(md->queue, dm_request);
 
@@ -614,6 +649,7 @@
 				     mempool_free_slab, _io_cache);
 	if (!md->io_pool) {
 		free_minor(minor);
+		kfree(md->queue->queue_lock);
 		blk_put_queue(md->queue);
 		kfree(md);
 		return NULL;
@@ -623,6 +659,7 @@
 	if (!md->disk) {
 		mempool_destroy(md->io_pool);
 		free_minor(minor);
+		kfree(md->queue->queue_lock);
 		blk_put_queue(md->queue);
 		kfree(md);
 		return NULL;
@@ -649,6 +686,7 @@
 	mempool_destroy(md->io_pool);
 	del_gendisk(md->disk);
 	put_disk(md->disk);
+	kfree(md->queue->queue_lock);
 	blk_put_queue(md->queue);
 	kfree(md);
 }

_______________________________________________
dm-devel mailing list
dm-devel@sistina.com
http://lists.sistina.com/mailman/listinfo/dm-devel