This is the third of a three-part article by Alex Podelko on performance testing. For the last seventeen years Alex Podelko has worked as a performance engineer and architect for several companies. Currently he is Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Enterprise Performance Management and Business Intelligence (a.k.a. Hyperion) products.
Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. His collection of performance-related links and documents (including his recent papers and presentations) can be found at www.alexanderpodelko.com. He blogs at http://alexanderpodelko.com/blog and can be found on Twitter as @apodelko. Alex currently serves as a director for the Computer Measurement Group (CMG, http://cmg.org), an organization of performance and capacity planning professionals.
Selecting Load Testing Tools: Classifying and evaluating load testing tools is not easy, as they include different sets of functionality that often cross the borders of whatever criteria are used. In most cases, any classification is either an oversimplification (which in some cases may still be useful) or a marketing trick to highlight the advantages of specific tools. There are many criteria that allow us to differentiate load testing tools, and it is probably better to evaluate tools against each criterion separately.
First, as we discussed above, there are three main approaches to workload generation, and each tool may be evaluated on which of them it supports and how well.
Protocol-level recording and the list of supported protocols: Does the tool support protocol-level recording and, if it does, which protocols does it support? With the rapid growth of the Internet and the popularity of browser-based clients, most products support only HTTP or a few Web-related protocols. To my knowledge, only HP LoadRunner and Micro Focus SilkPerformer try to keep up with support for all popular protocols. So if you need to record a special protocol, you will probably end up looking at these two tools (unless you find a niche tool supporting your specific protocol). That somewhat explains the popularity of LoadRunner in large corporations, where almost every protocol is probably in use somewhere. The level of support for specific protocols differs significantly too. Some HTTP-based protocols are extremely difficult to correlate if there is no built-in support, so you may look for that kind of specific support. For example, Oracle Application Testing Suite may have better support for Oracle technologies.
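What correlation involves can be illustrated outside any particular tool. A minimal Python sketch follows; the page fragment, the token name `session_token`, and the follow-up request are all invented for illustration, standing in for a real server response and a recorded script:

```python
import re

# Invented HTML fragment; in a real script this would be the body of
# the server's first response, captured at playback time.
login_page = '<input type="hidden" name="session_token" value="a1b2c3d4"/>'

def correlate(response_body):
    """Extract the dynamic value that a naively recorded script
    would otherwise replay verbatim (and stale)."""
    match = re.search(r'name="session_token" value="([^"]+)"', response_body)
    if match is None:
        raise ValueError("correlation failed: token not found in response")
    return match.group(1)

token = correlate(login_page)
# The freshly extracted value is substituted into the next request in
# place of the value captured at recording time.
next_request = {"action": "submit_order", "session_token": token}
print(next_request)
```

When a protocol embeds such values in opaque or binary payloads, this extraction is exactly what becomes hard without built-in tool support.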
UI-level recording: This option has been available for a long time, but it is much more viable now. For example, it was possible to use Mercury/HP WinRunner or QuickTest Professional (QTP) scripts in load tests, but you needed a separate machine for each virtual user (or at least a separate terminal session), which drastically limited the level of load you could achieve. Other known options were, for example, the Citrix and RDP (Remote Desktop Protocol) protocols in LoadRunner – always the last resort when nothing else was working, but notoriously tricky to play back. New UI-level tools for browsers, such as Selenium, extend the possibilities of the UI-level approach by allowing multiple browsers to run per machine (so scalability is limited by the resources available to run browsers). Moreover, there are UI-less browsers, such as HtmlUnit, which require significantly fewer resources than real browsers. Multiple tools support this approach now – such as PushToTest, which directly harnesses Selenium and HtmlUnit for load testing, or the LoadRunner TruClient protocol and SOASTA CloudTest, which use more proprietary solutions to achieve low-overhead playback. Still, questions of supported technologies, scalability, and timing accuracy remain largely undocumented, so the approach requires evaluation in every non-trivial case.
Programming: There are cases when you can't use recording at all (or can, but only with difficulty). In such cases, making API calls from the script may be an option. Other variations of this approach are scripting web services and using unit testing scripts for load testing. Of course, you may need to add some logic to your recorded script. You program the script using whatever method is available and use the tool to execute the scripts, coordinate their execution, and report and analyze the results. To do this, the tool must be able to add code to (or invoke code from) your script. If the tool's language differs from the language of your API, you will need to figure out a way to plumb them together. Tools using standard languages such as C (e.g. LoadRunner) or Java (e.g. Oracle Application Testing Suite) may have an advantage here. However, you need to know all the details of the communication between client and server, which is often very challenging.
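The skeleton that a load testing tool provides around such hand-written scripts can be sketched in plain Python. Everything here is a stand-in for illustration: `client_call` simply sleeps in place of a real API call to the system under test, and the tool's job – run many virtual users concurrently, collect timings, report results – is reduced to a thread pool and a median:

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def client_call(user_id):
    """Stand-in for a real API call; a real script would invoke the
    system under test here and return the measured response time."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated server work
    return time.perf_counter() - start

def run_load_test(virtual_users, iterations):
    """Execute all virtual users concurrently and collect timings."""
    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        futures = [pool.submit(client_call, u)
                   for u in range(virtual_users)
                   for _ in range(iterations)]
        return [f.result() for f in futures]

timings = run_load_test(virtual_users=10, iterations=5)
print(f"requests: {len(timings)}")
print(f"median response time: {statistics.median(timings):.3f}s")
```

A real tool adds what this sketch omits: pacing and think times, parameterized data, distributed load generators, and result storage – which is why even with the programming approach the tool itself still matters.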
Other important criteria are related to the environment:
Deployment Model: There are a lot of discussions about different deployment models: lab vs. cloud vs. service. There are advantages and disadvantages to each, and depending on your goals and the systems you test you may prefer one deployment model over another. For comprehensive performance testing you may really need both lab testing (with reproducible results for performance optimization) and realistic outside testing from around the globe (to check real-life issues that you can't simulate in the lab). Doing both is expensive and makes sense only when you really care about performance and have a global system – but that situation is not rare, and even if you are not there yet, you may get there eventually. If that is likely, it is better to have a tool that supports different deployment models.
Whether it is lab or cloud, an important question is what kind of software / hardware / cloud the tool requires. Many tools use low-level system functionality, so it may be an unpleasant surprise when the platform you chose or your corporate browser standard is not supported.
Scaling: When you have only a few users to simulate, scaling usually is not a problem. The more users you need to simulate, the more important it becomes. Tools differ drastically in how many resources they need per simulated user and how well they handle large volumes of information; this may vary significantly even for a specific tool depending on the protocol used and the specifics of your script. As soon as you get to thousands of users, it often becomes a major problem. For very large numbers of users, some automation, such as the automatic creation of a specified number of load generators across several clouds in SOASTA CloudTest, may be very handy. Load testing appliances (for example, Spirent Avalanche) can be useful for simulating a large number of simple Web users, but their scripting is usually limited.
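A back-of-the-envelope sketch shows why per-user resource cost dominates scaling decisions. All figures below are invented for illustration – a protocol-level virtual user might take a few megabytes of memory, while a real-browser user can easily take hundreds:

```python
def generators_needed(target_users, mem_per_user_mb, generator_mem_mb):
    """Estimate how many load generator machines a test needs,
    assuming memory is the binding constraint (it often is)."""
    users_per_generator = generator_mem_mb // mem_per_user_mb
    return -(-target_users // users_per_generator)  # ceiling division

# Invented figures: ~5 MB per protocol-level user vs. ~250 MB per
# browser-level user, on generators with 16 GB available.
print(generators_needed(10_000, 5, 16_000))    # protocol-level: 4 generators
print(generators_needed(10_000, 250, 16_000))  # browser-level: 157 generators
```

The two-orders-of-magnitude gap in per-user cost is exactly why the choice between protocol-level and UI-level load generation matters so much at scale.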
Two other important sets of functionality are monitoring of the environment and result analysis. While it is theoretically possible to handle these with other tools, doing so significantly degrades productivity and may require building some plumbing infrastructure. These two areas may look optional, but integrated and powerful monitoring and result analysis are very important: the more complex the system and the tests, the more important they become.
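As a sketch of why integrated analysis helps, the fragment below (with invented sample data) aligns response-time measurements and server CPU samples into common time buckets – the kind of correlation an integrated tool performs automatically across many metrics:

```python
from collections import defaultdict

# Invented sample data: (timestamp_seconds, value) pairs from two
# independent sources that an integrated tool would merge for you.
response_times = [(0.2, 0.9), (1.1, 1.4), (1.7, 2.1), (2.3, 0.8)]  # seconds
cpu_samples = [(0.0, 35), (1.0, 85), (2.0, 40)]                     # percent

def bucket_average(samples, bucket_size=1.0):
    """Average samples into fixed-width time buckets keyed by bucket index."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts // bucket_size)].append(value)
    return {b: sum(v) / len(v) for b, v in buckets.items()}

rt = bucket_average(response_times)
cpu = bucket_average(cpu_samples)
for b in sorted(set(rt) & set(cpu)):
    print(f"t={b}s  avg_response={rt[b]:.2f}s  cpu={cpu[b]:.0f}%")
```

Here the slow bucket coincides with the CPU spike; doing this by hand across dozens of servers and metrics is the plumbing work that integrated monitoring and analysis save you.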
Of course, non-technical criteria are important too:
Cost/Licensing Model: There are commercial tools (with license costs that differ drastically) and free tools. And there are some choices in between: for example, SOASTA offers the CloudTest Lite edition free for up to 100 users. There are many free tools (some, such as JMeter, are mature and well known) [OPEN] and many inexpensive tools, but most of them are very limited in functionality.
Skills: Considering the large number of tools and the relatively small number of people working in the area, there is a real labor market only for the most popular tools. Even for second-tier tools there are few experienced people around and few positions available. So if you don't choose a market leader, you can't count on being able to find people with experience using the tool. Of course, an experienced performance engineer will learn any tool – but it may take some time until productivity gets to the expected level.
Support: Recording and load generation are very sophisticated processes, and issues can arise in any area. Availability of good support can significantly improve productivity.
This is, of course, not a comprehensive list of criteria – rather a few starting points. Unfortunately, in most cases you can't simply rank tools on a better-to-worse scale. It may be that a simple tool works quite well in your case: if your business is built around a single web site that doesn't use sophisticated technologies and the load is not extremely high, almost every tool will work for you. The further you are from that state, the more challenging it is to pick the right tool, and you may need several tools.
While you can evaluate tools against the criteria mentioned above, there is no guarantee that a specific tool will work with your specific product (unless it uses a well-known and straightforward technology). That means that if you have a few systems to test, you need to evaluate the tools you are considering against those systems and see whether the tools can handle them. If you have many systems, choosing a tool that supports multiple load generation options is probably a good idea (and, of course, check it with at least the most important systems).
On the other hand, it is always good to keep in mind that a load testing tool is only a tool. While you probably need a sophisticated set of tools to create a luxury furniture set, you need only a hammer to nail a picture to the wall.
Summary: Load testing is an important, integral part of the performance engineering process. It is possible that the need for “performance testers” today may be less than it was during the load testing heyday due to better instrumentation, APM tools, continuous integration, etc. – but there is still a need for performance experts who are able to see the whole picture using all available tools and techniques.
There is no best approach to load generation, much less a best load testing tool. Some approaches or tools may be better in a particular context, and it is quite possible that a combination of tools and approaches will be necessary in complex environments. Choosing the right load generation strategy can be a challenging task. While digging deeply into the details of particular projects and tools may be needed, it is good to see the bigger picture of which approaches and tools are available and what their advantages and disadvantages are.
[BARB06] S. Barber, “User Experience, Not Metrics” (2006). http://www.perftestplus.com/resources/UENM1.pdf
[BUKSH12] J. Buksh, “Performance Testing is hitting the wall” (2012). http://www.perftesting.co.uk/performance-testing-is-hitting-the-wall/2012/04/11/
[JAIN91] R. Jain, “The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation and Modeling”, Wiley (1991).
[MICR04] “Improving .NET Application Performance and Scalability”, Microsoft Press (2004). http://msdn.microsoft.com/en-us/library/ff649152.aspx
[MOLY09] I. Molyneaux, “The Art of Application Performance Testing”, O’Reilly (2009).
[OPEN] “Open Source Performance Testing Tools”. http://www.opensourcetesting.org/performance.php
[PERF] “Performance Testing Citrix Applications Using LoadRunner: Citrix Virtual User Best Practices”, Northway white paper. http://northwaysolutions.com/our-work/downloads
[PERF07] “Performance Testing Guidance for Web Applications” (2007). http://perftestingguide.codeplex.com/
[PODE01] A. Podelko, A. Sokk, L. Grinshpan, “Custom Load Generation”, CMG (2001).
[PODE05] A. Podelko, “Workload Generation: Does One Approach Fit All?”, CMG (2005).
[SEGUE05] “Choosing a Load Testing Strategy”, Segue white paper (2005). http://www.iiquality.com/articles/load_testing.pdf
[SMITH02] C.U. Smith, L.G. Williams, “Performance Solutions”, Addison-Wesley (2002).
[STIR02] S. Stirling, “Load Testing Terminology”, Quality Techniques Newsletter, September (2002). http://www.soft.com/News/QTN-Online/qtnsep02.html