Measuring peak disk use of a process


I am trying to benchmark a tool I'm developing in terms of time, memory, and disk use. I know /usr/bin/time gives me basically what I want for the first two, but for disk use I concluded I would have to roll my own bash script that periodically extracts the write_bytes value from /proc/<my_pid>/io. Based on this script, here's what I came up with:

"$@" &
pid=$!
status=$(ps -o rss -o vsz -o pid | grep $pid)
maxdisk=0
while [ "${#status}" -gt "0" ];
do
    sleep 0.05
    delta=false
    disk=$(cat /proc/$pid/io | grep -P '^write_bytes:' | awk '{print $2}')
    disk=$(disk/1024)
    if [ "0$disk" -gt "0$maxdisk" ] 2>/dev/null; then
        maxdisk=$disk
        delta=true
    fi
    if $delta; then
        echo disk: $disk
    fi
    status=$(ps -o rss -o vsz -o pid | grep $pid)
done
wait $pid
ret=$?
echo "maximal disk used: $maxdisk KB"

Unfortunately, I am running into two problems:

  • The first is that I am piping the output of this script, along with that of the tool I'm benchmarking, to a file, and occasionally these streams seem to interfere, so I sometimes see 0 or an implausibly low disk-use figure reported at the bottom of that file.
  • The second is that I don't know what to do about processes that delete temporary files as part of their operation. In that case I think the fair benchmark would be to record the maximum net disk use (i.e., the peak of bytes written minus bytes erased), but I don't know where the second half of this difference can be found.

How can I resolve these problems?

There are 3 answers

Maxim Egorushkin (BEST ANSWER)

You may like to have a look at filetop from BCC (tools for BPF-based Linux IO analysis, networking, monitoring, and more):

tools/filetop: File reads and writes by filename and process. Top for files.

This script works by tracing the vfs_read() and vfs_write() functions using kernel dynamic tracing, which instruments explicit read and write calls. If files are read or written using another means (eg, via mmap()), then they will not be visible using this tool.
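
For example (assuming the bcc tools are installed; on Debian/Ubuntu the packaged tools carry a -bpfcc suffix, e.g. filetop-bpfcc), a run filtered to one process might look like:

# Show file reads/writes for PID 1234, refreshing every second;
# -C keeps previous output instead of clearing the screen.
sudo filetop -C -p 1234 1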

Brendan Gregg gives good talks and demos about Linux performance tools; they are quite instructive.

roro

I eventually found this similar question: How do I measure net used disk space change due to activity by a given process in Linux?

Based on the answers there, this seems to be a thorny problem, due to the difficulty of tracking all the different kinds of changes a given process may initiate.

DTrace is also mentioned there, but as I understand it, it is proprietary to Sun (or I guess Oracle now?) and thus available by default only on Solaris. Eventually I found this GitHub repo, which aims to close that gap for Linux users.
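
On Linux specifically, one partial workaround worth noting: /proc/<pid>/io also exposes cancelled_write_bytes, the number of bytes the process caused not to be written, e.g. by truncating or deleting a file before writeback (see proc(5)). A minimal sketch of a net-write estimate using it, assuming $pid holds the PID of interest:

# Read both counters in one pass; write_bytes minus cancelled_write_bytes
# gives a rough approximation of the process's net bytes written so far.
read -r wb cwb < <(awk '/^write_bytes:/ {w=$2} /^cancelled_write_bytes:/ {c=$2} END {print w, c}' "/proc/$pid/io")
echo "net bytes written so far: $((wb - cwb))"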

Tomachi

You could think of it differently and not worry about deleted files at all, by taking timestamped snapshots of the write counters. That gives you (see the sketch after this list):

  • The delta of disk writes over time, e.g. 8 GB/day. It doesn't matter if all of it goes to /tmp. Each time the script runs, it saves a new average to disk, along with a counter, to keep a rolling average. So if your errant process writes 2 GB, then 1 GB, then 0 GB in successive hours, that's 1 GB/hour for the period.
  • For each snapshot, you pick the highest delta and record it: in this case 2 GB, for the first hour of operation. If you run the script each hour and the process then writes nothing, it will still report the 2 GB peak from the first hour. Then if in the small hours it kicks up and puts down 5 GB, your peak will show that at, say, 3 am, with the rolling average around 333 MB/hour.
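
A minimal bash sketch of this snapshot idea (assuming $pid is the process of interest, hourly snapshots, and cumulative write_bytes from /proc/<pid>/io as the write counter; for brevity the running totals are kept in memory here rather than persisted to disk between runs):

# Baseline reading, then one delta per hour while the process lives.
prev=$(awk '/^write_bytes:/ {print $2}' "/proc/$pid/io")
peak=0 total=0 n=0
while kill -0 "$pid" 2>/dev/null; do
    sleep 3600                                # one snapshot per hour
    cur=$(awk '/^write_bytes:/ {print $2}' "/proc/$pid/io" 2>/dev/null)
    [ -n "$cur" ] || break                    # process exited mid-snapshot
    delta=$((cur - prev)); prev=$cur          # bytes written this hour
    [ "$delta" -gt "$peak" ] && peak=$delta   # keep the highest hour seen
    total=$((total + delta)); n=$((n + 1))
done
[ "$n" -gt 0 ] && echo "peak: $peak bytes/hour, rolling average: $((total / n)) bytes/hour"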