Performance Testing: See The Bigger Picture – Part II

This is the Second of a Three Part Article by Alex Podelko on Performance Testing. For the last seventeen years Alex Podelko has worked as a performance engineer and architect for several companies. Currently he is Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Enterprise Performance Management and Business Intelligence (a.k.a. Hyperion) products.

Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. He maintains a collection of performance-related links and documents (including his recent papers and presentations), writes a blog, and can be found on Twitter as @apodelko. Alex currently serves as a director for the Computer Measurement Group (CMG), an organization of performance and capacity planning professionals.

Load testing is an important part of the performance engineering process. It remains the main way to ensure appropriate performance and reliability in production. Still, it is important to see the bigger picture beyond stereotypical last-moment load testing. There are different ways to create load, and a single approach may not work in all situations. Many tools support several ways of recording, playback, and programming. This paper discusses the pros and cons of each approach, when it can be used, and which tool features are needed to support it.

Record and Playback: Protocol Level : The mainstream approach to load testing (at least for business and Internet applications) is recording communication between two tiers of the system and playing back the automatically created script (usually, of course, after proper correlation and parameterization). Tools used for that are usually referred to as “load testing tools”, and users simulated by such tools are usually referred to as “virtual users”. The real client-side software isn’t necessary to replay such scripts, so the number of simulated virtual users can be high; it is theoretically limited only by available hardware (each tool has specific hardware requirements depending on the type and complexity of scripts).

Fig. Record and playback approach, protocol level

Both recording and playback happen between the tiers, so the protocol used between the client and the server is extremely important. Other factors, such as what language was used to develop the system, what platform the server is deployed on, etc. are usually irrelevant for scripting (although they can give some hints about what protocol is used for communication).

The process is reasonably straightforward when you test a simple Web site or a simple Web application with a thin client. Even a beginner in load testing can quickly create a few scripts and run tests. That is one reason why the record and playback approach is so popular. However, there is a trap in that easiness: load testing really embraces much more. Load should be validated for correctness (the absence of errors in the load testing tool doesn’t always mean that the system works properly) and realism (using unrealistic scenarios is the easiest way to get misleading results). Moreover, load generation is only one step in load testing; there are many other important parts (like gathering requirements and analyzing results), as well as other related activities (like tuning or diagnostics).
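The point about validating load for correctness can be illustrated with a minimal Python sketch. The `Response` record, the list of error markers, and the expected text are all hypothetical and not taken from any particular tool; the sketch only shows that a successful status code alone is not treated as proof that a request did what it was supposed to do.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Response:
    """Hypothetical stand-in for whatever a load testing tool returns."""
    status: int
    body: str


def validate(response: Response, expected_text: str,
             error_markers: Sequence[str] = ("error", "exception", "session expired")) -> List[str]:
    """Return a list of problems; an empty list means the response looks valid."""
    problems = []
    if response.status != 200:
        problems.append(f"unexpected status {response.status}")
    lowered = response.body.lower()
    for marker in error_markers:
        # Many applications report failures inside a normal 200 page.
        if marker in lowered:
            problems.append(f"error marker found: {marker}")
    # A valid page should contain text that only a successful request produces.
    if expected_text not in response.body:
        problems.append("expected text missing")
    return problems
```

A script would call `validate` on every response and fail the transaction when the returned list is not empty, so that a login page returned with status 200 is not counted as a fast, successful report.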

Unfortunately, scripting can be challenging even for a Web application. Recording a script and making it work can be a serious research task, often including many try-and-fail iterations. A good load testing tool can help if it supports your protocol.

The protocol level record and playback approach has several serious limitations:

  • It usually doesn’t work for testing components and services.
  • Each particular load testing tool supports a limited number of technologies.
  • Some technologies require very time-consuming correlation and parameterization, and some may not be supported at all.
  • The workload validity is not guaranteed when there is sophisticated logic on the client side.

These limitations are usually not a problem in the case of simple web applications using a browser as a client, but they become a serious problem when you need to test different protocols across the whole software lifecycle.
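For readers new to the terms, correlation and parameterization can be sketched in a few lines of Python. Everything here is an illustrative assumption: the `SessionID` field, the request path, and the `Entity` parameter are hypothetical and do not come from any product mentioned in this article. Correlation extracts a dynamic value from an earlier server response; parameterization replaces recorded literals with values supplied per user or per iteration.

```python
import re
from string import Template

# Hypothetical recorded request in which the dynamic session value and the
# user-supplied value have been replaced by placeholders.
REQUEST_TEMPLATE = Template("/app/report?SessionID=$session_id&Entity=$entity")


def correlate(response_body: str, pattern: str) -> str:
    """Extract a dynamic value (first capture group) from a server response."""
    match = re.search(pattern, response_body)
    if match is None:
        # Failing loudly is safer than replaying a stale recorded value.
        raise ValueError(f"pattern not found in response: {pattern}")
    return match.group(1)


def build_request(session_id: str, entity: str) -> str:
    """Fill the recorded request with the correlated and parameterized values."""
    return REQUEST_TEMPLATE.substitute(session_id=session_id, entity=entity)
```

With a binary or proprietary protocol, finding the right boundaries for `pattern` is exactly the part that can take many try-and-fail iterations.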

Each load testing tool supports a limited number of technologies (protocols). New or exotic technologies are not usually on the list. Vendors of load test tools add new supported protocols continually, but we often do not have time to wait for the specific protocol to be added – as soon as we get a new product we need to test it.

For example, back in 1999, we were not able to use recording for the SMB (Server Message Block) protocol (later succeeded by the Common Internet File System, CIFS), Microsoft DCOM (Distributed Component Object Model), or Java RMI (Remote Method Invocation). While some vendors claimed that their products supported these protocols, recording didn’t work in all environments.

Later there were issues with Java applets and ActiveX controls, which used serialization, encoding, or even proprietary protocols.

Today we are getting a new generation of Rich Internet Applications (RIA) and new web protocols, bringing the old challenges of protocol-level recording back, so some authors have started to talk about a crisis of performance testing (for example, [BUKSH12]). Still, these issues don’t look more challenging than the issues we had 10-15 years ago, especially considering that many of these applications still use standard web protocols underneath, so we are at least able to record the communication.

Even if the protocol is supported, script recording and parameterization are often far from straightforward and often require a good knowledge of system internals. The question of workload validation also remains open. Here is a simple example illustrating possible issues.

One function of the product we were testing was financial consolidations, which can take a long time. The client starts the operation on the server, then waits for it to finish, as a progress bar is shown on screen. When recorded, the script looks like (in LoadRunner pseudo-code):

web_custom_request("XMLDataGrid.asp_7",   /* step name assumed from the numbering of the steps below */
    "URL={URL}/Data/XMLDataGrid.asp?Action=EXECUTE&TaskID=1024&RowStart=1&ColStart=2&RowEnd=1&ColEnd=2&SelType=0&Format=JavaScript",
    LAST);
web_custom_request("XMLDataGrid.asp_8", "URL={URL}/Data/XMLDataGrid.asp?Action=GETCONSOLSTATUS", LAST);
web_custom_request("XMLDataGrid.asp_9", "URL={URL}/Data/XMLDataGrid.asp?Action=GETCONSOLSTATUS", LAST);
web_custom_request("XMLDataGrid.asp_9", "URL={URL}/Data/XMLDataGrid.asp?Action=GETCONSOLSTATUS", LAST);

Each request’s action is defined by the ?Action= part. The number of GETCONSOLSTATUS requests recorded depends on the processing time. In the example above, the request was recorded three times; it means the consolidation was done by the moment the third GETCONSOLSTATUS request was sent to the server. If you play back this script, it will work this way: the script submits the consolidation in the EXECUTE request and then calls GETCONSOLSTATUS three times. If we have a timer around these requests, the response time will be almost instantaneous, while in reality the consolidation may take many minutes or even hours. If we have several iterations in the script, we will submit several consolidations, which continue to work in the background, competing for the same data, while we report sub-second response times.

Consolidation scripts require creating an explicit loop around GETCONSOLSTATUS to catch the end of consolidation:

do {
    /* wait three seconds, then send the GETCONSOLSTATUS request, saving the result of the status check in {abc_count} */
} while (strcmp(lr_eval_string("{abc_count}"), "1") == 0);

Here the loop simulates the internal logic of the system, sending GETCONSOLSTATUS requests every three seconds until the consolidation is complete. Without such a loop, the script just checks the status and finishes the iteration while the consolidation continues for a long time after that.
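The same polling logic can be sketched outside of any tool. The Python function below is a minimal sketch, not the LoadRunner implementation: `start` and `is_running` are hypothetical callables standing in for the EXECUTE and GETCONSOLSTATUS requests, and the `sleep` and `clock` arguments exist only so the timing can be replaced in a test.

```python
import time
from typing import Callable, Tuple


def run_and_wait(start: Callable[[], None],
                 is_running: Callable[[], bool],
                 poll_interval: float = 3.0,
                 timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> Tuple[float, int]:
    """Start a long operation, poll until it finishes, return (elapsed, polls)."""
    started = clock()
    start()                          # corresponds to the EXECUTE request
    polls = 0
    while True:
        sleep(poll_interval)         # wait between status checks
        polls += 1
        if not is_running():         # corresponds to a GETCONSOLSTATUS request
            break
        if clock() - started >= timeout:
            raise TimeoutError("operation did not finish within the timeout")
    # The measured time covers the whole operation, not just the first request.
    return clock() - started, polls
```

Note that the measured time is accurate only to within one polling interval, and that the number of polls now depends on how long the server actually takes, not on how long it took during recording.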

So, it is possible that the record and playback approach won’t work in your environment, or that using it will be too time-consuming and inflexible (as happened to us many times). When such problems are encountered, it is a good time to look at alternatives and add them to your arsenal.

Record and Playback: UI-Level : Another approach to simulating user activities is to record user interactions with the Graphical User Interface (GUI), such as keystrokes and mouse clicks, and then play them back. Users simulated with this approach are sometimes referred to as GUI users. The tools using this approach simulate users in the most accurate way: they really take the place of a real user. You are supposed to get end-to-end response times identical to what users would see.

Originally such tools were mostly used for automated functional testing, although the option to use this approach for load testing was available for a long time. For load testing, these GUI tools were usually used in conjunction with the load testing tool from the same vendor, which coordinated execution of multiple GUI scripts and collected results.

Fig. Record and playback approach, GUI users

The main problem with such tools is that they drive an instance of the client software and require a machine for each user, so it is almost impossible to use them for a large number of simulated users: you need as many physical boxes as there are users being simulated. Some tools can run one user per Windows Terminal Server session, which significantly increases the scalability of the solution (probably up to the low hundreds of users from a practical point of view).

Another known option is to use low-level graphical protocols such as Citrix or Remote Desktop. This was always the last resort when nothing else worked, and such scripts were notoriously tricky to play back [PERF]. It works fine when you indeed use Citrix or Remote Desktop. But using it as a workaround means that you test a significantly different setup than you would use in real life (with multiple client parts running on a server), which may undermine the value of testing.

Nowadays most applications have a Web-based interface, and a new generation of UI-level tools for browsers extends the possibilities of the UI-level approach by allowing multiple browsers to run per machine (so scalability is limited by the machine resources available to run browsers). Perhaps we can refer to users simulated by such tools as browser users (because lower-level browser control is usually used).

Fig. Record and playback approach, browser users

Moreover, UI-less browsers such as HtmlUnit were created, which require significantly fewer resources than real browsers. This drastically increased the scalability of the UI-level approach and made it much more viable for load testing, but the approach still remains less scalable than the protocol-level approach, simply because all these browsers (even the lightweight ones) still need to run and all client-side application code must be executed on the load generator machine.

Using the UI-level approach for load testing sounds very promising: we get end-user timing and do not depend on the intricacies of client-server communication. However, questions of supported technologies, scalability, and timing accuracy remain largely undocumented, so the approach requires evaluation in every non-trivial case. So far the approach is mostly used to reuse existing functional testing scripts or when it is impossible to use protocol-level scripts.

Manual : Manual load generation isn’t a real option if we want to simulate a large number of users. Still, in some cases, it can be a good option when we need load from a few users and don’t have proper tools available or face serious issues with scripting. Sometimes a manual test can be a good option in earlier stages of testing to verify that the system can support concurrent work or to diagnose, for example, locking problems.

One of the concerns with manual testing is that even when each user follows an exact scenario, human input times vary, so the tests are not exactly reproducible. Such an approach can hardly be recommended as a long-term solution, even with few users.

It still could be useful to run one or a few users manually in parallel with the simulated virtual users’ workload to better understand what real users would experience. That is a good way to verify test results: if manual response times match what you see for the scripts, it is an indication that your scripts are correct.

Programming and Custom Test Harness : Programming is another approach to load generation. A straightforward way to create a multi-user workload is to develop a special program to generate workload. This program requires access to the API or source code and some programming work. It is often used to test components. No special testing tool is necessary (although some tools are available that can simplify work).

 In some simple cases it could be the best solution (from a cost perspective, especially if there is no purchased load testing tool). A starting version could be quickly created by a programmer familiar with the API. A simple test harness, for example, could spawn several threads and each thread, simulating a real user, could include the same sequence of API calls as the real software for that use case. No need to worry about what protocol is used for communication.
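A thread-based harness of the kind described above might look like the following Python sketch. It is only an illustration under stated assumptions: `scenario` is a hypothetical function that would contain the same sequence of API calls as the real client for one use case, and the harness does nothing beyond starting the threads and collecting per-iteration timings and errors.

```python
import threading
import time
from typing import Callable, List, Tuple


def run_harness(scenario: Callable[[int], None],
                users: int,
                iterations: int) -> Tuple[List[float], List[str]]:
    """Run `scenario` from `users` threads, `iterations` times each."""
    timings: List[float] = []
    errors: List[str] = []
    lock = threading.Lock()          # protects the shared result lists

    def virtual_user(user_id: int) -> None:
        for _ in range(iterations):
            started = time.perf_counter()
            try:
                scenario(user_id)    # the sequence of API calls for one use case
            except Exception as exc:
                with lock:
                    errors.append(f"user {user_id}: {exc}")
                continue
            elapsed = time.perf_counter() - started
            with lock:
                timings.append(elapsed)

    threads = [threading.Thread(target=virtual_user, args=(i,)) for i in range(users)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()                # wait for every simulated user to finish
    return timings, errors
```

Even this small sketch leaves out think times, ramp-up, monitoring, and reporting, which is where the maintenance effort discussed next comes from.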

We successfully used this approach for component load testing in several projects (and, of course, this approach is widely used by developers). However, efforts to update and maintain the harness increase drastically as soon as you need to add such features as, for example:

  • Complex user scenarios
  • Centralized test management and result analysis
  • Coordinated test execution from several computers

If you have numerous products, you really need to create something like a commercial load testing tool to ensure all necessary performance and reliability testing. Building one probably isn’t the best choice for a small group of testers.

Programming and Using Load Testing Tools : Many advanced load testing tools support one or several scripting languages, allowing you to program scripts in whatever way is necessary while using the tool to manage script execution and to collect and analyze the results. This may mean direct programming of server requests, using web services, or using an Application Programming Interface (API). If an API is used, the approach may need lightweight custom software clients (client stubs) to create the correct workload.

Fig. Programming API using a Load Testing Tool

The implementation of this approach (we called it custom load generation) depends on the particular load testing tool. The original way was to create an external C dll (or shared library for UNIX) and then call functions defined in the dll from the tool’s native script language.

Another way to implement this approach appeared in the later versions of load testing tools: creating a script in a programming language (such as Java or Visual Basic) with the help of templates and special tool-supplied functions. 

These are the significant advantages of this custom load generation approach:

  • It eliminates dependency on the third-party tool to support specific protocols.
  • It leverages all the features of existing load testing tools and allows use of them as a test harness.
  • It takes away the need to implement multi-user support, data collection and analysis, reporting, scheduling, etc. This is inherent in the third-party tool.
  • It ensures that performance testing of current or future applications can be done for any protocol used to communicate among different tiers.

Custom load generation may allow managing the workload in a more user-friendly way by simplifying parametrization.

For example, if you record socket-level traffic, recording and parametrization could take a lot of time. And if you need to change the workload (for example, use new queries), it is almost impossible to change the parametrized script to reflect the new workload. You probably need to re-record and re-parametrize the script.

When you implement custom load generation, the real query could be read from an input file. Changing the query becomes very easy: you just change the input file without any changes in the script.
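As a rough illustration of reading the workload from an input file, the Python sketch below loads queries from a plain text file and hands them out in rotation. The file format (one query per line, `#` for comments) is an assumption made for this example, not something prescribed by any tool.

```python
from itertools import cycle
from typing import Iterable, Iterator, List


def parse_queries(lines: Iterable[str]) -> List[str]:
    """Keep non-empty lines that are not comments; one query per line."""
    queries = []
    for line in lines:
        text = line.strip()
        if text and not text.startswith("#"):
            queries.append(text)
    return queries


def load_queries(path: str) -> List[str]:
    """Read the workload definition from a text file."""
    with open(path, encoding="utf-8") as handle:
        return parse_queries(handle)


def query_feed(queries: List[str]) -> Iterator[str]:
    """Endless round-robin feed of queries for the simulated users."""
    if not queries:
        raise ValueError("the input file contains no queries")
    return cycle(queries)
```

Changing the workload then means editing the text file; the script that calls `next(feed)` before each request stays untouched.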

The same is true if different builds of the software are tested. Small changes could impact a low-level protocol script, but the API is usually more stable. Just install the new build and run the test. There is no new recording and parametrization needed.

But, of course, there are some considerations to keep in mind for the custom load generation approach:

  • It requires access to API or source code.
  • It requires additional programming work.
  • It requires an understanding of internals (to re-create the sequence used by real users).
  • The client environment should be set up on all load generator machines.
  • It requires commercial tool licenses for the necessary number of virtual users.
  • The lowest level transaction that can be measured is an external function.
  • It usually requires more resources on client machines (since there is some custom software).
  • The results should be carefully interpreted (to ensure that there is no contention between client stubs).

Programming may be a better solution in many cases, but it is not a full replacement for recording approaches. In cases when recording works well, it usually provides better and more efficient solutions. One important advantage of recording is that the tool records exactly whatever communication happens between user and server, while with programming the script reflects what the person creating it thinks the communication is. Unfortunately, communication between user and server is often very complicated and difficult to reproduce programmatically. So tools that support only programming (and not recording) have a rather limited area of application.

Custom Load Generation Examples : The two examples below are for Mercury LoadRunner, just because it is the tool we use most. Similar things can be done with other tools. The first example is a multidimensional analytical engine. Originally the main way to access it was through the C API; many products use it, including the Excel Add-in. It is possible to record a script using the Winsock protocol (a low-level protocol recording all network communication), but Winsock scripts are quite difficult to parameterize and verify.

Here is a small extract of a correlated Winsock script:

lrs_create_socket("socket0", "TCP", "LocalHost=0", "", LrsLastArg);
lrs_send("socket0", "buf0", LrsLastArg);
lrs_receive("socket0", "buf1", LrsLastArg);
lrs_send("socket0", "buf2", LrsLastArg);
lrs_receive("socket0", "buf3", LrsLastArg);
lrs_save_searched_string("socket0", LRS_LAST_RECEIVED, "Handle1", "LB/BIN=\\x00\\x00\\v\\x00\\x04\\x00", "RB/BIN=\\x04\\x00\\x06\\x00\\x06", 1, 0, -1);
lrs_send("socket0", "buf4", LrsLastArg);
lrs_receive("socket0", "buf5", LrsLastArg);

Another part of the script includes the content of each sent or received buffer:

send  buf22 26165

The script consists of many pages of such binary data. Correlating such scripts is very time-consuming, and the resulting scripts are almost impossible to parameterize: if you need to change anything in the query (for example, run it for another city), you need to start from scratch.

An external dll was made for major functions. Below is a script using this external dll:

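pCTX = Init_Context();
hr = Connect(pCTX, "ess01", "user001", "password");

/* Build the MDX query from the {day}, {product}, {customer}, and {shipper} parameters */
sprintf(report,
    "SELECT %s.children on columns, "
    "%s.children on rows FROM Shipment WHERE "
    "([Measures].[Qty Shipped], %s, %s)",
    lr_eval_string("{day}"), lr_eval_string("{product}"),
    lr_eval_string("{customer}"), lr_eval_string("{shipper}"));
hr = RunQuery(pCTX, report);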

The lines above are almost the whole script (except a few technical lines) instead of many pages of binary data. An MDX query is generated using day, product, customer, and shipper as parameters, so we hit the different spots of the database and avoid artificial caching effects.
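To show how parameter combinations spread the load over different parts of the database, here is a small Python sketch that generates queries of the shape described above. The query text follows the script; the member names passed in are hypothetical, and walking through the Cartesian product of the parameter lists is just one simple way to vary the queries.

```python
import itertools
from typing import Iterator, Sequence

# Query shape taken from the script above; only the parameter values vary.
MDX_TEMPLATE = (
    "SELECT {day}.children on columns, "
    "{product}.children on rows FROM Shipment WHERE "
    "([Measures].[Qty Shipped], {customer}, {shipper})"
)


def build_query(day: str, product: str, customer: str, shipper: str) -> str:
    """Build one MDX query for a specific combination of members."""
    return MDX_TEMPLATE.format(day=day, product=product,
                               customer=customer, shipper=shipper)


def query_stream(days: Sequence[str], products: Sequence[str],
                 customers: Sequence[str], shippers: Sequence[str]) -> Iterator[str]:
    """Yield a distinct query for every combination of parameter values."""
    for combo in itertools.product(days, products, customers, shippers):
        yield build_query(*combo)
```

With realistic parameter lists, each simulated user asks for a different slice of the data, so repeated iterations are not served from cache alone.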

Another example is a middleware product (without a GUI, only an administrative console). We were given functional test scripts in Java. The product can use HTTP (with major application servers) or TCP/IP (as a stand-alone solution). It is possible to run a test script and record the HTTP traffic between the script and the server. It is HTTP, but there is just binary data inside the HTTP request body. You can’t do anything with it; you can only play it back as is. You need to start from scratch if you want to make even a small change.

The solution that we finally used was the creation of LoadRunner scripts from the test script directly. Just put Java code inside the template and add tool-specific statements (like lr.start_transaction and lr.end_transaction). Here is how the beginning of the script looks:

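import lrapi.lr;
import com.essbase.api.base.*;
import com.essbase.api.session.*;

public int action() {
    String s_userName = "system";
    String s_password = "password";
    try {
        // Time the creation of the API instance as its own transaction
        lr.start_transaction("01_Create_API_instance");
        ess = IEssbase.Home.create(IEssbase.JAPI_VERSION);
        lr.end_transaction("01_Create_API_instance", lr.AUTO);

        // Time the sign-on as a separate transaction
        lr.start_transaction("02_SignOn");
        IEssDomain dom = ess.signOn(s_userName,
            s_password, s_domainName, s_prefEesSvrName,
            s_orbType, s_port);
        lr.end_transaction("02_SignOn", lr.AUTO);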

It is possible, of course, to create a simple program that will start many such scripts in parallel, but you need to implement all the infrastructure (coordination, results analysis, monitoring, etc.) yourself. Such work is usually not a good option for a small group working with many different products. It makes much more sense when an existing tool provides this infrastructure. However, most inexpensive or free tools, unfortunately, are weak in providing such functionality.

Please check for the last part of this series to be published next week.

