Linux per-process I/O performance: measuring the wrong thing

A while back, I linkblogged about “iotop”, a very useful top-like UNIX utility that shows which processes are initiating the most I/O bandwidth.

Teodor Milkov left a comment which is well worth noting, though:

Definitely iotop is a step in the right direction.

Unfortunately, in many situations it’s still hard to tell who’s wasting the most disk I/O.

Suppose you have two processes – dd and mysqld.

dd is doing massive linear I/O. Let’s say it’s reading from a slow USB drive, so its throughput is limited to 10MB/s by the slow USB reads.

At the same time, MySQL is doing a lot of very small but random I/O. A modern SATA 7200 rpm disk drive is only capable of about 90 I/O operations per second (IOPS).

So ultimately most of the disk time would be occupied by mysqld, yet iotop would show dd as the bigger I/O user.

He goes into more detail on his blog. Fundamentally, iotop can only report what the Linux kernel offers for per-process I/O accounting, which is I/O bandwidth (bytes read and written), not I/O operations per second. Most contemporary storage in desktops and low-end server equipment is IOPS-bound (‘A modern 7200 rpm SATA drive is only capable of about 90 IOPS’): 90 random reads of, say, 16KB each amount to under 1.5MB/s, so mysqld can monopolise the disk while barely registering next to dd’s 10MB/s. Good point! Here’s hoping a future change to the Linux per-process I/O API allows measurement of IOPS as well…
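
To make Teodor’s point concrete, here’s a minimal sketch (mine, not iotop’s code) that samples the per-process counters the kernel exposes in /proc/<pid>/io on kernels built with CONFIG_TASK_IO_ACCOUNTING. Everything in there is a byte count or a syscall count; nothing records how many requests actually reached the device, or how long the device spent servicing them:

    import sys, time

    def read_proc_io(pid):
        # Parse /proc/<pid>/io into a dict, e.g. {"read_bytes": 1234, ...}
        with open(f"/proc/{pid}/io") as f:
            return {k.strip(): int(v) for k, v in (line.split(":") for line in f)}

    pid, interval = int(sys.argv[1]), 1.0
    before = read_proc_io(pid)
    time.sleep(interval)
    after = read_proc_io(pid)
    for key in ("read_bytes", "write_bytes", "syscr", "syscw"):
        # read_bytes/write_bytes are what bandwidth-oriented tools report;
        # syscr/syscw count read()/write() syscalls, which is not the same
        # thing as requests actually issued to the disk.
        print(f"{key:12s} {(after[key] - before[key]) / interval:14.0f} /s")

(iotop itself gets its numbers via the kernel’s taskstats interface rather than /proc, as I understand it, but those are the same bandwidth-oriented counters.)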

7 Comments

  1. Posted April 15, 2009 at 11:29

    Operations per second would be problematic to interpret given the scheduling used by various disk elevators. Also IOPS are becoming relatively less important with the advent of SSDs. I do agree it would be useful to present though.

  2. Posted April 15, 2009 at 13:33

    Padraig — SSDs are by no means useful for most workloads yet, nor will they be for a few years, IMO…

  3. Posted April 15, 2009 at 14:10

    Hi there :-)

    I think it’s not that important to measure IOPS – the important thing is the time it takes to serve a request. There is already such a metric, but it’s only system-wide. If you look at http://www.mjmwired.net/kernel/Documentation/iostats.txt there is a field (#10) that shows the number of milliseconds spent doing I/Os. A disk drive is 100% saturated when it spends 1000ms per second doing I/O, isn’t it? And it doesn’t matter whether that I/O is random or linear. Most of the counters described in iostats.txt are extremely useful. I just dream of some way to relate some of these counters to specific processes/threads/cgroups.
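
    A rough sketch of that calculation (field positions as documented in iostats.txt; the one-second sampling interval is just an arbitrary choice for illustration):

        import time

        def io_ticks():
            # Counter #10, "milliseconds spent doing I/Os", sits after the
            # major number, minor number and device name in /proc/diskstats,
            # i.e. at index 12 of the whitespace-split line.
            ticks = {}
            with open("/proc/diskstats") as f:
                for line in f:
                    fields = line.split()
                    ticks[fields[2]] = int(fields[12])
            return ticks

        interval = 1.0
        before = io_ticks()
        time.sleep(interval)
        after = io_ticks()
        for dev, t in sorted(after.items()):
            busy_ms = t - before.get(dev, t)
            # 1000ms of I/O time accumulated in a one-second window means the
            # device was 100% busy, random or linear alike.
            print(f"{dev:10s} {100.0 * busy_ms / (interval * 1000):5.1f}% util")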

  4. Mark Seger
    Posted April 17, 2009 at 16:55

    When I added process I/O stats to collectl, I included all of them. One pair tracks I/O to the buffer cache, a second tracks I/O to disk, and a third pair tracks system calls. Perhaps with all three sets of data you can get closer to understanding what’s really going on.

    More details here: http://collectl.sourceforge.net/Process.html

    -mark
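
    Those three pairs presumably correspond to the per-process counters in /proc/<pid>/io: rchar/wchar for logical I/O through the buffer cache, read_bytes/write_bytes for I/O that actually reaches storage, and syscr/syscw for read/write system calls (exactly where collectl reads them from is an assumption here). A quick illustrative dump for a single process:

        import sys

        def proc_io(pid):
            # Parse /proc/<pid>/io ("rchar: 1234" per line) into a dict.
            with open(f"/proc/{pid}/io") as f:
                return {k.strip(): int(v) for k, v in (l.split(":") for l in f)}

        c = proc_io(int(sys.argv[1]))
        print("logical (buffer cache):", c["rchar"], "read /", c["wchar"], "written")
        print("physical (disk)       :", c["read_bytes"], "read /", c["write_bytes"], "written")
        print("system calls          :", c["syscr"], "reads /", c["syscw"], "writes")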

  5. Posted April 20, 2009 at 10:05

    wow Mark, collectl looks nifty! I must give that a go next time I need to monitor a server.

  6. Mark Seger
    Posted April 20, 2009 at 12:27

    Why wait until you need it? By then it might be too late and you’ll have missed the data you needed to diagnose the problem. Seriously, as with sar, I’ll bet most system admins running collectl never look at the data it collects, but they know it’s there if/when they need it – by default it keeps the most recent week’s worth of data.

    In any event, when you do eventually get around to trying it out let me know what you think.

    -mark

  7. Posted May 29, 2012 at 11:19

    God damn, I’m currently trying to nail down why the I/O on /var on one of our Red Hat EL 5.4 servers has slowly been rising to 10k over the last month, but we’re running 2.6.18!!!

    Guess I’ll have to try stopping every process for 10 minutes to see if the graphs dip :D