Many performance meters in your computing world will tell you how busy things are. That’s nice, but to make sense of that data, you also need to know how much work the system is being asked to handle.
With workload data you see the performance meters with fresh eyes: now you can evaluate how the system is responding to that workload and, with proper capacity planning or modeling, how it will respond to a future peak workload.
To find or create a workload meter you and your co-workers have to agree on what the workload is and how to measure it. This will take some time as you’ve got to choose the transactions that will represent your workload from the many unique transactions users send in. Every company will settle on a different scheme and there is no perfect solution. Here are a few common ones I’ve encountered:
- Treat all incoming transactions the same. Simply count them and you have your workload number.
- Notice the vast majority of your incoming transactions do a similar thing, so count them as your workload number and ignore the others.
- Only count the transaction that was at the center of your last performance catastrophe as your workload. This may be an unwise choice: always swinging at the previous pitch you missed will not improve your batting average.
- Use the amount of money flowing into the company as the transaction meter. For example: at $10K/min the CPU is 35% busy.
Whatever you decide to do will work fine as long as it passes the following simple test: Changes in workload should show proportional changes in the meters of key resources. What you are looking for is data like you see below.
Clearly as the measured workload increases the utilization of Resource X follows along with it. It is just fine that the lines don’t perfectly overlap. They never will. It is the overall shape that is important. You are looking for these values to move in synchrony – workload changes cause a proportional change in utilization. Now, let’s look at Resource Y below.
Resource Y is not experiencing any changes in utilization as the workload changes. This resource is not part of the transaction path for this workload. If it is supposed to be, perhaps you accidentally metered the wrong resource, or the right resource on the wrong system. I’ve made both of those mistakes. Now, let’s look at Resource Z below.
The utilization of Resource Z mostly tracks the workload meter (they rise and fall together) except for about an hour starting at 19:49 and ending at 20:39. Here you need to use some common sense, as either:
- The utilization spike could have been caused by something not related to the normal workload like a backup, or a software upgrade, or just the side effect of hitting some bug. In that case you can ignore the spike as you evaluate this workload meter.
- The utilization spike showed a dramatic increase in the workload, but your proposed workload meter did not see it. If your proposed workload meter missed a dramatic and sustained increase like we see here, then you need to search for a better workload meter.
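One rough way to put a number on the "do they move together" test is a correlation coefficient between the workload meter and each resource's utilization. The sketch below is only an illustration: the sample values are made up, the names `pearson`, `workload`, `resource_x`, `resource_y`, and `resource_z` are hypothetical, and a correlation score is a stand-in for eyeballing the overall shape of the graphs, not a replacement for the common sense described above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        return 0.0  # a perfectly flat meter cannot track anything
    return cov / sqrt(var_x * var_y)

# Made-up samples, one value per measurement interval (not real data).
workload   = [100, 150, 220, 300, 260, 180, 120]  # transactions per interval
resource_x = [12, 17, 25, 33, 30, 21, 14]         # % busy, follows the workload
resource_y = [40, 40, 40, 40, 40, 40, 40]         # % busy, flat regardless of load
resource_z = [12, 17, 25, 33, 75, 21, 14]         # % busy, one unexplained spike

for name, util in [("X", resource_x), ("Y", resource_y), ("Z", resource_z)]:
    print(f"Resource {name}: r = {pearson(workload, util):.2f}")
```

A value near 1 suggests the resource follows the workload, a value near 0 suggests it is not on the transaction path (or the wrong thing was metered), and a middling value is a cue to look at the graph for a spike like the one on Resource Z and decide which of the two explanations applies.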
Once you’ve decided what will serve as the workload meter, how do you get the data you need? It would be lovely if the application gave you an easy-to-access meter for that, but that rarely happens. Usually, you have to look in odd places. If XYZ transactions are going to be your workload indicator, then you need to find some part of the XYZ transaction path where you can find something to meter that uniquely serves that transaction. Here are some of the things I’ve done in the past to ferret out this key information:
- For every XYZ transaction, process Q does two reads to a given file. Take the number of reads in the last interval and divide them by two to get the transaction rate.
- For every 500 XYZ transactions, process Q burns one second of CPU. Take the number of CPU seconds consumed during the interval and multiply it by 500 to get the transaction rate.
- For every XYZ transaction file Z grows by 1200 bytes. Take the change in the file size during the interval and divide that by 1200 to get the transaction rate.
- For every XYZ transaction two packets are sent. Divide the packet count by two to get the transaction rate.
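The arithmetic behind those four tricks can be sketched as a handful of small functions. The function names and the sample readings are hypothetical; the ratios (2 reads, 500 transactions per CPU second, 1200 bytes, 2 packets) are the ones from the list above, and on a real system you would have to establish your own.

```python
def txns_from_file_reads(reads, reads_per_txn=2):
    """Process Q does two reads per XYZ transaction."""
    return reads / reads_per_txn

def txns_from_cpu_seconds(cpu_seconds, txns_per_cpu_second=500):
    """Process Q burns one CPU second per 500 XYZ transactions."""
    return cpu_seconds * txns_per_cpu_second

def txns_from_file_growth(bytes_grown, bytes_per_txn=1200):
    """File Z grows by 1200 bytes per XYZ transaction."""
    return bytes_grown / bytes_per_txn

def txns_from_packets(packets, packets_per_txn=2):
    """Two packets are sent per XYZ transaction."""
    return packets / packets_per_txn

def per_second(txn_count, interval_seconds):
    """Turn a per-interval transaction count into a rate."""
    return txn_count / interval_seconds

# Made-up readings for a single 60-second interval.
interval = 60
estimates = {
    "file reads":  txns_from_file_reads(6000),
    "CPU seconds": txns_from_cpu_seconds(6),
    "file growth": txns_from_file_growth(3_600_000),
    "packets":     txns_from_packets(6000),
}
for source, count in estimates.items():
    print(f"{source}: {count:.0f} txns, {per_second(count, interval):.1f} txns/sec")
```

Each function gives a transaction count for the interval; dividing by the interval length gives the rate. On a real system the different proxies will rarely agree exactly, which is fine as long as they tell the same overall story.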
This list goes on, but the basic trick remains the same. First find some meter that closely follows the type of transaction you want to use as a workload meter. Then figure out how to adjust it mathematically so you get a transaction count.
That mathematical adjustment usually requires you to gather multiple days’ worth of data and then, using whatever data you can get on your transactions, work out the appropriate adjustment.
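One simple way to derive that adjustment, assuming you can get a trustworthy transaction count for a few days from some other source (an application log, a billing report), is a least-squares fit of a single conversion factor. This is a sketch of one possible approach and not the only way to do it; `fit_factor` and the sample numbers are hypothetical and made up for illustration.

```python
def fit_factor(proxy_values, known_txns):
    """Least-squares conversion factor (line through the origin) such that
    known_txns is approximately factor * proxy_values."""
    if len(proxy_values) != len(known_txns) or not proxy_values:
        raise ValueError("need two non-empty, equal-length sequences")
    denom = sum(p * p for p in proxy_values)
    if denom == 0:
        raise ValueError("proxy meter never moved; it cannot be calibrated")
    return sum(p * t for p, t in zip(proxy_values, known_txns)) / denom

# Made-up daily totals: file reads by process Q, and the transaction
# counts obtained from some other trusted source for the same days.
daily_reads = [20_100, 39_800, 60_500]
daily_txns  = [10_000, 20_000, 30_000]

factor = fit_factor(daily_reads, daily_txns)
print(f"about {1 / factor:.2f} reads per transaction")

# From then on, the proxy meter alone gives an estimated transaction count.
estimated_txns = factor * 45_000
```

Fitting through the origin assumes the proxy meter reads zero when there are no transactions. If process Q does some reads even when idle, a fit with an intercept would be the better choice.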
Reasonable people can argue that it is impossible to summarize a complex workload into one number. That may be true, but you can still do wildly useful things if you find a workload meter that approximately tracks the utilization of key components nicely. Every evening on the news they quote a major stock index (like the DOW, the FTSE, or the Nikkei) and we find that a useful gauge of the overall economy. When selecting the workload, don’t go for perfect, go for close enough.
Bob Wescott (LinkedIn) is semi-retired after a 30-year career in high tech that was mostly focused on computer performance work. Bob has done professional services work in the field of computer performance analysis, including capacity planning, load testing, simulation modeling, and web performance. He has even written a book on the subject: The Every Computer Performance Book. Bob’s fundamental skill is explaining complex things clearly. He has developed and joyfully taught customer courses at four computer companies and has been a featured speaker at large conferences. Bob’s goal is to be of service, explain things clearly, teach with joy, and lead an honorable life. His goal, at this stage of the game, is to pass on what he has learned to the next generation.