Howto Have Confidence In A Small Sample

It is often the case that we just sample the response times of a few transactions rather than metering all of them. When sampling, how do you know you’ve sampled enough to get an average response time that is representative of all the transactions?

If you make some change to the system, and the average response time falls from 10 seconds to 0.2 seconds, it doesn’t take a rocket scientist to know that is a real improvement. However, if the before and after numbers are reasonably close, it’s not as clear that the change was an improvement. We could have just gotten lucky in our sampling. So, how can we know anything without all the data? Think about a bowl of jellybeans for a minute.

Imagine you blindly and randomly select and eat two jellybeans from that bowl. You find one is orange and one is strawberry. You could at this point state that the bowl contains 50% orange and 50% strawberry jellybeans, but you wouldn’t be too confident about it. If the next ten randomly selected jellybeans confirmed the 50/50 ratio then your confidence would grow. However, to be absolutely certain of this ratio, you’d have to eat all the jellybeans in the bowl.

The same is true for any sampled data. The more sampled transactions you have, the more confident you are of your result. To be absolutely sure, you have to measure every transaction. But, how many samples is enough so you can be reasonably sure? For that we are going to have to use statistics. Please don’t panic. We are going to use a couple of simple Excel functions to do the math. Let’s work through an example.

Suppose you are comparing 10 samples of response time data before and 10 samples after an upgrade to see if things are better or worse. Before the upgrade the average response time of 10 transactions was 4.5 seconds and after it was 4.1 seconds. To be sure a small difference is a real difference, you need to calculate the confidence interval. This is a four-step process:

  1. Download/copy the individual samples into a column of an Excel spreadsheet. For this example there are ten of them, in cells A1 through A10.
  2. Use the AVERAGE function to find the average value (arithmetic mean) of all the samples. This function takes one argument, which is a range of cells containing the response times. For this example AVERAGE(A1:A10) equals 4.5.
  3. Use the STDEV function to find the standard deviation of all of the samples. This function takes one argument, which is a range of cells containing the response times. For this example, that is STDEV(A1:A10).
  4. Use the CONFIDENCE.NORM function to find the confidence interval. This function takes three arguments:
    • Alpha – This is a number between zero and one that tells the function how confident we want to be. The confidence level equals one minus the Alpha. In other words, an Alpha of 0.05 asks for a 95 percent confidence level, which is what we want here.
    • StandardDeviation – The value returned by the STDEV function in step 3.
    • Size – This is the count of individual test results in our sample. In this example the count is 10.
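
For readers who would rather script this than use a spreadsheet, the four steps can be sketched in Python. This is a minimal sketch, not the article's method: the sample values are invented for illustration (chosen only so that they average 4.5 seconds), and confidence_norm is a hypothetical helper written here to follow the formula Excel documents for CONFIDENCE.NORM, namely the normal critical value times the standard deviation divided by the square root of the sample size.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_norm(alpha, standard_dev, size):
    """Half-width of a normal-based confidence interval for a mean."""
    # Two-sided critical value; about 1.96 when alpha is 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


# Step 1: hypothetical response times in seconds (invented for illustration).
samples = [3.6, 5.2, 4.1, 4.9, 3.8, 5.5, 4.4, 4.0, 5.1, 4.4]

average = mean(samples)   # Step 2: same role as AVERAGE(A1:A10)
spread = stdev(samples)   # Step 3: sample standard deviation, like STDEV
margin = confidence_norm(0.05, spread, len(samples))  # Step 4

print(f"{average:.2f} s ± {margin:.2f} s")
```

Because the sample values are made up, the margin printed here will not match the 0.51 used in the rest of the example. With only ten samples, a Student's t interval (Excel's CONFIDENCE.T) would come out somewhat wider; the sketch keeps the normal-based function the steps above use.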

The CONFIDENCE.NORM function returns a number: 0.51. This tells us that we can be 95% confident that the average response time of all transactions during the studied interval before the upgrade (not just the ones we sampled) is 4.50 seconds ± 0.51 seconds.  In other words, we are 95% confident the average pre-upgrade response time is between 3.99 and 5.01 seconds.

Now, let’s say we calculated the confidence interval for the after-the-upgrade data, and the calculations showed we are 95% confident that the actual average response time of all transactions during the studied interval (not just the ones we sampled) is 4.10 seconds ± 0.49 seconds.

So what does this all mean? If the confidence intervals overlap, you cannot claim a statistically significant improvement. As you can see below, they clearly overlap (3.99 to 5.01 seconds before versus 3.61 to 4.59 seconds after) and, even though the after-the-upgrade response time numbers look better, statistics can offer no guarantee of any real improvement. The upgrade might have helped, but you can’t prove it with the data you have to the level of confidence (95%) you want.

[Figure: before and after confidence intervals]
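
The overlap test is simple enough to write down. The following is an illustrative sketch using the two results from the example; interval and intervals_overlap are hypothetical helper names, not part of any library.

```python
def interval(mean_value, margin):
    """Return (low, high) for a result reported as mean ± margin."""
    return (mean_value - margin, mean_value + margin)


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


before = interval(4.50, 0.51)  # roughly 3.99 to 5.01 seconds
after = interval(4.10, 0.49)   # roughly 3.61 to 4.59 seconds

if intervals_overlap(before, after):
    print("Intervals overlap: no improvement can be claimed at this confidence level")
else:
    print("Intervals do not overlap: the change looks real at this confidence level")
```

With the numbers from the example, the first message is printed.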

This is the same calculation pollsters do when they randomly call ~1000 people and, from that small sample, predict how the nation will vote. When these polls are talked about, they rarely quote the Alpha or the confidence interval. If they did, the lead story of some future newscast might be:

The latest polls are 95% confident that candidate X is polling at 53% and candidate Y is at 48%. The margin of error is ± 5 points so there is no statistically measurable difference and thus we really have no idea who is winning.

Now you might want to be absolutely 100% sure you are seeing an improvement. Statistics can’t help you here because, to be 100% confident, you need to have response time data from ALL the transactions, not just a sample of them. If you have 100% of the data, you don’t need statistics because you have 100% of the data. For most cases, a confidence level of 95% or 98% will do nicely.
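
To get a feel for what a higher confidence level costs, here is a rough sketch that rescales the ± 0.51 seconds from the pre-upgrade example to other confidence levels. It assumes the same normal-based calculation as CONFIDENCE.NORM, where the margin is proportional to the normal critical value; z_score and margin_at are hypothetical helper names.

```python
from statistics import NormalDist

BASE_MARGIN = 0.51  # seconds, the pre-upgrade margin at 95% from the example
BASE_LEVEL = 0.95


def z_score(level):
    """Two-sided normal critical value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf((1 + level) / 2)


def margin_at(level):
    """Rescale the example's margin to a different confidence level."""
    return BASE_MARGIN * z_score(level) / z_score(BASE_LEVEL)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: ± {margin_at(level):.2f} s")
```

The margin widens as the confidence level rises and grows without bound as the level approaches 100%, which is another way of seeing why a sample can never make you absolutely sure.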


Bob Wescott (LinkedIn) is semi-retired after a 30-year career in high tech that was mostly focused on computer performance work. Bob has done professional services work in the field of computer performance analysis, including capacity planning, load testing, simulation modeling, and web performance. He has even written a book on the subject: The Every Computer Performance Book. Bob’s fundamental skill is explaining complex things clearly. He has developed and joyfully taught customer courses at four computer companies and has been a featured speaker at large conferences. Bob’s goal is to be of service, explain things clearly, teach with joy, and lead an honorable life. His goal, at this stage of the game, is to pass on what he has learned to the next generation.

