Storage performance on Linux can be measured with many tools. However, one set of tools is supported directly by the Linux kernel and packaged by the major distributions: blktrace and blkparse. These tools trace I/O requests as they flow through the kernel's block layer, from the moment a request is queued until the device reports it complete. They even show states most people do not know about, such as I/O splits. Here is a short tutorial on how to use these tools to collect a trace. If you upload that trace to The Other Other Operation’s workload repository, they’ll analyze it and post some results. You’ll help The Other Other Operation make better benchmarks and learn how your application actually uses storage.
First, you need to install the blktrace package using your package manager of choice. On CentOS/RHEL, you can use the following command:
sudo yum install blktrace
On Debian, you can use:
sudo apt-get install blktrace
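After installing, it can be worth confirming that both commands are on your PATH before starting a long trace. A minimal sketch follows; have_tool is a hypothetical helper name for illustration and is not part of the blktrace package:

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on each tool this tutorial relies on.
for tool in blktrace blkparse; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found, install the blktrace package first"
    fi
done
```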
Second, you need to decide how to run the commands. The trace command needs root access, so the use of sudo is recommended. You can log in as root instead, but that is not required if you have sudo access. You could also grant a non-root user access to the kernel interfaces blktrace reads from (debugfs, normally mounted at /sys/kernel/debug), but that is quite a bit of work. Sudo also logs everything, so you know when and how many times the commands were issued.
Automated Data Collection
I have built a small script that runs the simple data collection commands described below and also parses the data. It is located at https://github.com/Texiwill/aac-lib/tree/master/blktrace. The script puts all the commands in one easy place. The hardest part of tracing is knowing what you need to trace. With so many filesystem types and devices, it is often better to specify the filesystem by directory instead of by device, and the script accepts either a device or a directory name. Because it uses sudo, the script will also prompt you for your user password when needed. A good tutorial on sudo can be found here.
If you’re trying to trace a workload, it is far easier to trace logging for MySQL by specifying /var/log/mysql (or whichever directory the logging has been pointed to). This script does some checking and translation for you.
Run the script using:
$ ./runblktrace [-h]|[-w seconds] directory [directory [...]]
or to see help:
$ ./runblktrace -h
-or-
$ ./runblktrace
The default runtime is 3600 seconds, or one hour. To specify a different runtime in seconds, use -w. The position of this argument is important: it must come before the directory list:
$ ./runblktrace -w 4800 directory [directory [...]]
Here is an example for MySQL:
$ ./runblktrace /var/log/mysql /var/lib/mysql
Simple Data Collection
Now, if you really know what you are doing or want to use some other features, you can run blktrace yourself to collect the data:
$ sudo blktrace /dev/sda /dev/sdb
It is very simple. The command traces both devices until you stop it with Ctrl-C and writes its output files into whichever directory you ran it from. I suggest that these files not be on the filesystem you are trying to trace, so that writing the trace does not add I/O to it. For our example on a single-CPU machine, those files will be the following:
$ ls *.blktrace.*
sda.blktrace.0  sdb.blktrace.0
The number at the end is the CPU on which the events were captured, so a multi-CPU system produces one file per device per CPU: sda.blktrace.0, sda.blktrace.1, and so on. These files are in a binary form, so you need to use another tool to read them.
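To keep the trace files off the traced filesystem and to stop after a fixed time, blktrace's -D (output directory) and -w (run time in seconds) options can be combined. A minimal sketch follows. Here blktrace_cmd is a hypothetical helper that only prints the command so you can review it before running it, and /mnt/traces is an assumed directory on a filesystem that is not being traced:

```shell
# Hypothetical helper: print (do not run) a blktrace command line.
#   $1 = output directory (assumed to be on a filesystem you are NOT tracing)
#   $2 = run time in seconds
#   remaining arguments = block devices to trace
blktrace_cmd() {
    outdir=$1
    seconds=$2
    shift 2
    printf 'sudo blktrace -w %s -D %s' "$seconds" "$outdir"
    for dev in "$@"; do
        printf ' -d %s' "$dev"
    done
    printf '\n'
}

# Example: trace /dev/sda and /dev/sdb for 60 seconds.
blktrace_cmd /mnt/traces 60 /dev/sda /dev/sdb
```

Once the printed line looks right, paste it into a shell to start the trace.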
If you would rather have a single server collect the data over the network, you can run blktrace in server mode on that server:
$ blktrace -l
Then, on the client (the machine being traced), you can use the following. Ensure port 8462, the blktrace default, is open between the client and server:
$ sudo blktrace -h servername -d /dev/sda -d /dev/sdb
The data is now ready to parse on the server.
The simplest form of blkparse is the following, run either on the blktrace server or on the host on which blktrace was run, from the directory that holds the trace files:
$ blkparse sda sdb
This will translate the binary blktrace files for both devices into one human-readable output. Now, if you want the output to be easily importable into a spreadsheet, you can use the following command:
$ blkparse -f "%M.%m,%c,%s,%T.%t,%p,%a,%d,%C\n" sda sdb -o sdasdb.csv
This command will produce a comma-separated output file of the data for /dev/sda and /dev/sdb. The device is written as major.minor with a period (%M.%m) rather than blkparse's usual comma, so it stays in one CSV column. Now sdasdb.csv can be imported directly into Excel or some other program, perhaps a database.
Granted, the end of the file will contain totals data that can also be parsed for a database or ignored outright. blkparse supports many more format specifiers, but these cover the default set.
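As one example of what can be done with the CSV, the events can be tallied by action, which is the sixth column (%a) in the format string above. A rough sketch with awk follows, assuming the eight-field format shown earlier and that no process name contains a comma. The name count_actions is a hypothetical helper. Lines that do not begin with a major.minor device field, such as the trailing totals, are skipped:

```shell
# Hypothetical helper: count trace events per action code in a blkparse CSV.
# Assumes the field order %M.%m,%c,%s,%T.%t,%p,%a,%d,%C (action = field 6).
count_actions() {
    awk -F, '
        # Keep only event rows: first field looks like "8.0", at least 8 fields.
        $1 ~ /^[0-9]+\.[0-9]+$/ && NF >= 8 { count[$6]++ }
        END { for (a in count) print a, count[a] }
    ' "$1" | sort
}

# Usage:
#   count_actions sdasdb.csv
```

In blkparse output, Q marks a queued request, D one issued to the driver, C a completed one, and X a split, so the counts give a quick feel for how the workload moved through the block layer.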
The runblktrace script was created to further simplify what can be a complex task: not the commands themselves, but the task of finding which device is associated with which filesystem. The script takes the directory and maps it back to a device that can be traced.
This truly simplifies the data gathering. Granted, the script is not yet robust enough to place its output files on a filesystem that is not being traced, but it is a beginning. The output is a CSV file.
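The directory-to-device lookup can be approximated in a few lines of shell. This is only a sketch of the idea, not the runblktrace script's actual code. The name dir_to_device is hypothetical, and the sketch relies on df -P printing the backing filesystem in the first column of its second output line:

```shell
# Hypothetical helper: print the device (or other source) backing a directory.
dir_to_device() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Usage: only trace sources that are real block devices;
# tmpfs, overlay, or network mounts will not be.
#   dev=$(dir_to_device /var/lib/mysql)
#   [ -b "$dev" ] && sudo blktrace -d "$dev"
```

The result may be a partition or a device-mapper volume rather than a whole disk, so check it before starting a trace.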