Category Archives: Testing

Understand and choose a load test model

2012-11-27 Andrea

When running load tests, it is all about how you set up and run a test. Previously we talked about getting to numbers; today we will talk about the load models you can use to run your tests.

One of the decisions you have to make is choosing the right load model. That sounds complicated, but in fact it isn’t. A load model basically describes which characteristic you control to reach a certain load and performance behavior.

Just one more thing before we talk about the models: the terms defined in the following paragraphs are not used consistently across the industry, and each vendor uses them slightly differently.

Transactions: “A transaction is an execution of exactly one test case or test scenario. In order to perform the scenario, the page flow is modeled in code. The test scenario is implemented as a test case, which itself executes a sequence of one or more actions.”

Action: “An action can be defined as one irreducible step within a test case. So an action interacts with the current page and – as a result – loads the next page. That page is associated with this action and becomes the current page for the next action in the test scenario. Generally, an action triggers one or more requests.”

Requests: “This level is equivalent to the HTTP request level used in web browsers or in any other application that relies on HTTP communication. You do not have to deal with requests directly because they are automatically generated by the underlying HtmlUnit framework when performing actions on HTML elements.”

Let’s check out an example. Before running the next marketing campaign your company’s online shop will have to be tested to make sure it won’t break under that much traffic. Two of your teammates are discussing the problem:

Head of IT: “The shop will have to handle 2000 concurrent users.”

Head of marketing: “With our campaign the shop will have to handle 500 orders per hour.”

Both requirements can lead to two different load models, depending on the exact goal that is set.

User Count Model: This is the approach of the IT guy. Define a certain number of concurrent users the system will have to handle. At any given time during the test, the target system has to handle 2000 concurrent users, no more, no less. The number of transactions that can be achieved depends on the time the target system needs to respond.
Arrival Rate Model: The money-driven marketing guy prefers a different criterion: the number of transactions (or here: orders) per hour. The scenario will be performed 500 times, evenly distributed across a period of one hour. As many concurrent users as necessary are used to fulfill the given arrival rate.

Okay. Understood. Head of IT is happy with the user count model. He’s pretty sure the shop will be stable with 2000 concurrent users.

So what will happen if the response time of the system increases during the test period?

When using the user count model, simply fewer transactions will be finished. The arrival rate model, in comparison, will increase the number of concurrent users to make sure the given number of orders is still performed.

Can you see the difference? The arrival rate model is feedback-based and reacts to response time changes during the test period. This way, the generated load is somewhat unpredictable, and that’s how the real world behaves.
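The difference between the two models can be put into numbers with a minimal Python sketch. It is not taken from any load test tool: the function names are made up for illustration, and the formulas assume a steady state in which every user runs one transaction after another without pauses.

```python
def user_count_throughput(users: int, transaction_seconds: float) -> float:
    """User count model: transactions per hour completed by a fixed number of users."""
    return users * 3600.0 / transaction_seconds


def arrival_rate_users(transactions_per_hour: float, transaction_seconds: float) -> float:
    """Arrival rate model: average concurrent users needed to keep a fixed rate."""
    return transactions_per_hour * transaction_seconds / 3600.0


# Compare a transaction that takes 60 seconds with one that slowed down to 120 seconds.
for seconds in (60.0, 120.0):
    print(
        f"{seconds:.0f}s per transaction: "
        f"2000 users finish {user_count_throughput(2000, seconds):.0f} transactions/h, "
        f"500 orders/h need {arrival_rate_users(500, seconds):.1f} users"
    )
```

When the transaction time doubles, the user count model delivers half the transactions, while the arrival rate model needs twice the users.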

Or, to use a more common analogy: people will usually still enter the supermarket even though there are long lines at the checkout. That is what happens in your online shop when the response time increases and you use the arrival rate model. More users are coming in, because they do not know that there are already lines in the store.

The chances for picking the arrival rate model are getting higher, but the IT guy is just a little bit afraid of the aggressive behaviour of such a test. Just imagine:

response time increases
more concurrent users are used
response time will increase even more due to heavier load
more concurrent users are used
recursion here

But there is a solution. To avoid a complete breakdown, you can define an upper limit for the number of concurrent users used in the arrival rate model. This helps you restrict the total load on the system if you want to avoid a total overload as a result of the feedback loop. That is not reality, of course, but if users feel a certain pain in reality, they might hold back as well.
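The feedback loop and the effect of an upper limit can be illustrated with a toy simulation in Python. Everything in it is an assumption made for the sake of the example: the function name, the parameters, and especially the linear slowdown model, which no real system follows exactly.

```python
from typing import Optional


def simulate_arrival_rate(
    rate_per_hour: float,
    base_seconds: float,
    knee_users: float,
    max_users: Optional[float] = None,
    steps: int = 50,
) -> float:
    """Return the concurrent user count after `steps` rounds of the feedback loop.

    Hypothetical slowdown model: a transaction takes `base_seconds` on an idle
    system and becomes proportionally slower as the user count grows.
    """
    users = 0.0
    for _ in range(steps):
        # Heavier load means slower transactions ...
        duration = base_seconds * (1.0 + users / knee_users)
        # ... and slower transactions need more users to keep the arrival rate.
        users = rate_per_hour * duration / 3600.0
        if max_users is not None:
            users = min(users, max_users)  # the upper limit breaks the loop
    return users


print(simulate_arrival_rate(500, 60, knee_users=50))                # settles at a stable level
print(simulate_arrival_rate(500, 60, knee_users=5, max_users=100))  # runs into the cap
```

With a robust system (`knee_users=50`) the loop settles on its own; with a fragile one (`knee_users=5`) the user count would grow without bound if the limit were not set.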

Ok, the arrival rate model is going to be used for the next load test. But there are definitely scenarios in which the user count model could be a good choice? Here are some guidelines which model fits what purpose best.

The user count model is well-suited for:

a simple base line test (single-user test) to assess the base performance of the system under almost no load,
a real load or performance test to assess the performance under a high, but predictable aka stable load,
a test that should be easily repeatable and whose load factor is not influenced by the system under test.

The arrival rate model is best used if the load test should prove that a system is indeed able to handle a certain number of transactions per hour. Since this is the primary purpose of load and performance tests, the arrival rate load model is the best choice for most of your test tasks.

So get testing and give all models a try. Load tests often start with a fixed user count, and once this runs fine, you move over to the more challenging arrival rate model.

Automation, Testing, XLT

Test Automation for Demandware SiteGenesis with XLT

2012-10-27 Rene 1 Comment

Demandware, Inc., a leading on-demand ecommerce platform provider, offers a fast and ready to use storefront implementation called SiteGenesis. Implementation partners are using that as a template to implement customized storefronts. This means that testing is part of the effort of course.

Testing a storefront can be a tedious task. It often consists of regression testing to ensure that the already properly tested functionality is still doing well after a bug fix. Additionally a lot of development processes are agile nowadays. You do not want to go back and start testing again, when you get a new iteration, do you? So what to do?

Test automation to the rescue! We are going to provide you an initial test suite that covers most bases and can act as a template for your test automation effort. The test suite runs with Xceptance LoadTest Script Developer and is well documented.

Watch this short introductory video and get an idea what you can achieve with proper test automation.

The test automation suite can be downloaded right here: XLT SiteGenesis Test Automation Suite.

If you are taking part in the development of SiteGenesis because you joined Demandware’s Open Collaboration Model and got access to GitHub, you can also get the latest version of the test suite straight from the repository. We are of course always interested to get your feedback, so fork the suite and let us know.

You need Xceptance LoadTest (XLT) and Mozilla Firefox to get the test suite running. Windows, Linux, MacOS? Any OS is fine. XLT is free of charge. No strings attached.

If you are interested in your own version of the test suite, want to achieve broader coverage, or simply need a hint on how to proceed, let us know. If you need support for XLT, you can purchase it right here: XLT Service Portal.

P.S. We do not provide you an instance of SiteGenesis for testing. You have to be a signed up partner or customer of Demandware to get your own test instance.

Misc, Testing

Xceptance, Inc. is hiring

2012-04-20 Rene

Xceptance, Inc. is hiring a Quality Assurance Test Lead in the Greater Boston Area. Take a look at our job posting.

Automation, Java, Testing, XLT

Handle authentication during WebDriver testing

2012-02-25 Rene 2 Comments

Sometimes authentication is necessary before a test case can be executed. While HtmlUnit-based tests can easily enter and confirm authentication requests, most browser-based tests cannot work around the dialog. This is a browser security measure to prevent automated data capture and/or data entry. WebDriver for Firefox delivers a solution for that problem, but IE and Chrome rely on manual interaction with the browser before the test automation can run.

The following steps describe a solution for the authentication problem and how to run a script test case as a WebDriver-based test. The key to this solution is the usage of Sikuli, an image-based testing tool that interacts directly with the screen and finds the right elements by what is displayed there.

Performance, Software, Testing, XLT

XLT 4.1.8 Update Release

2012-02-18 Rene

XLT 4.1.8 has been released. It contains a couple of improvements and bug fixes. You can get the latest version here: https://lab.xceptance.de/releases/xlt/4.1.8/.

These are the two most important changes.

Optimized scanning CSS rules

XLT simulates the download behavior of a browser as closely as possible. With com.xceptance.xlt.css.download.images set to onDemand, XLT carefully considers the download of image resources referred to in CSS files. It checks whether each CSS rule applies to elements in the DOM tree and, if so, extracts the resource information for later download. This process was previously very expensive and ran for up to several seconds when a page was complex and a lot of CSS rules had to be matched. This has been optimized heavily.

Configurable retry behavior for keep-alive connections

If persistent connections (keep-alive) are enabled, test cases might fail sporadically with an exception such as:
java.lang.RuntimeException: org.apache.http.NoHttpResponseException: The target server failed to respond

This exception occurs only when the request was sent while the connection was about to be closed on the server side at the same time.

The underlying HttpClient retries requests if the connection was closed by the server. However, by default it retries only idempotent operations such as GET and HEAD, but not POST and PUT. In order to avoid these connection errors, the user can now configure whether non-idempotent operations are to be retried as well. Common browsers seem to use the same behavior and most certainly assume that the server did not start to process the request when it closes the connection immediately without responding with a proper HTTP status code.

A new boolean property com.xceptance.xlt.http.retry.nonIdempotentRequests in config/default.properties controls whether or not operations such as PUT and POST are retried in case of certain networking problems. Note that idempotent operations are always retried.
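The retry rule described above can be written down as a small decision function. This is only a sketch of the documented behavior in Python, not code from XLT or HttpClient; the function name and the set of methods are assumptions that follow the examples given in the text.

```python
# Methods the text names as idempotent examples (assumption: the real list may be longer).
IDEMPOTENT_METHODS = {"GET", "HEAD"}


def should_retry(method: str, retry_non_idempotent: bool) -> bool:
    """Decide whether a request is sent again after the server closed the connection.

    `retry_non_idempotent` plays the role of the property
    com.xceptance.xlt.http.retry.nonIdempotentRequests.
    """
    if method.upper() in IDEMPOTENT_METHODS:
        return True  # idempotent operations are always retried
    return retry_non_idempotent


for method in ("GET", "POST"):
    print(method, should_retry(method, retry_non_idempotent=False))
```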

Testing, XLT

Testing Arabic Web Pages with XLT

2011-09-28 Rene

As you may know, there aren’t only Western character encoded websites on the web. There are also websites in Chinese, Japanese, Arabic, etc., and we wanted to know how XLT would perform if we use it for testing a non-Latin website. Non-Latin does not necessarily mean non-UTF-8, but in this case it especially means non-Western characters and right-to-left (RTL). Or in other words, things a normal European or American programmer is not used to.

So, for the first time we tested a non-Latin website, precisely an Arabic one. We created a test case with the Script Developer to test an Arabic news website. The test entered Arabic words in a search field and validated the response to check the correctness of the website’s content. Afterwards, we ran the script as a JUnit Test in Eclipse. It was successful. The test was short, but provisionally it proved that XLT also works for non-Latin websites.

Furthermore, we ran the same test in a real browser and it worked as well. We used the FirefoxDriver to simulate user-like actions in Firefox to see if the WebDriver also works with non-Latin input.

When we continued testing, we observed some facts that may interest people who don’t often use Arabic as an input language. As you might know, Arabic is written from right to left. We noticed that there’s a difference between what appears and what actually happens technically. The following example illustrates that.

Placement of the Asterisk

We inserted a text assertion command in the test case. It was an ends-with validation, i.e. the asterisk should be at the first position if the input language is Latin. So, theoretically, if the input language is Arabic, then the asterisk should be at the first position of the Arabic text (the far right), but that wasn’t the case. The asterisk was at the first position from the Latin point of view (the far left).

Then, we tried inserting the asterisk at the first position of the Arabic text (the far right), but as you can see in the figure below the evaluation failed because the asterisk’s encoding is Latin-1, and accordingly the first position is then the far left according to the computer. So, we tried to insert the asterisk in Arabic (as input language), but the evaluation failed, as well.

The next picture shows how it should be.
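The reason behind this behavior is that text is stored in reading order (logical order), no matter on which side of the screen it is drawn. The following Python sketch shows this with a wildcard match. It uses Python’s fnmatch module as a stand-in, since the pattern matcher of the Script Developer is not shown here, and the sample words are our own.

```python
from fnmatch import fnmatchcase

# "marhaban bil-alam" (hello world), written with escapes so that the
# logical character order is unambiguous in any editor.
FIRST_WORD = "\u0645\u0631\u062d\u0628\u0627"
LAST_WORD = "\u0628\u0627\u0644\u0639\u0627\u0644\u0645"
TEXT = FIRST_WORD + " " + LAST_WORD

# An ends-with pattern: the asterisk is the FIRST character in memory,
# even though an RTL-aware editor may draw it somewhere else.
ENDS_WITH = "*" + LAST_WORD

print(ENDS_WITH[0] == "*")                   # asterisk sits at index 0
print(fnmatchcase(TEXT, ENDS_WITH))          # the text ends with the last word
print(fnmatchcase(TEXT, LAST_WORD + "*"))    # but it does not start with it
```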

It was really hard trying to switch our way of thinking to the “right way”. We spent so much time trying to figure out how to insert the asterisks in the test cases and trying to understand the Arabic point of view.

Not all Asterisks are the same

So, in this case the asterisk must be inserted in Latin. First, the asterisk is inserted at the last position of the Arabic text, and second, the Arabic asterisk has a different code. As it turns out, the Arabic asterisk does not have the same code as the Latin asterisk. The Latin asterisk’s Unicode code point is U+002A and the Arabic one’s is U+066D, and it even has another name. It is called “Arabic Five Pointed Star” and it actually looks different. You might be wondering why that is. We asked ourselves the same thing, and we couldn’t find any plausible answer. Of course, there are different characters in Arabic such as the comma (Arabic comma: “،”), but the asterisk is pretty much an asterisk everywhere and we wondered why it is a different character code.
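The two code points are easy to check with Python’s standard unicodedata module. The helper name `describe` is our own; the characters are the ones discussed above.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the code point and the official Unicode name of a character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


# Latin asterisk, Arabic star, Latin comma, Arabic comma
for char in ("*", "\u066d", ",", "\u060c"):
    print(describe(char))
```

A comparison such as `"*" == "\u066d"` is false, which is why a pattern typed with the Arabic star is not treated as a wildcard.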

In the following example we wrote a very simple website as an example to check if the position of the asterisks is always on the wrong side (seen from the Arabic point of view).

At the beginning it is important to set the “charset” to Unicode or you’ll just get question marks as output. As you can see, we set the direction of the text alignment to “rtl” (meaning “from right to left”) in the head tag, so that the output on the website appears from right to left. This also applies to the input field in line 13. As you may notice, there’s an exclamation mark added in Latin. That’s why it’s shown at the beginning of the Arabic word: it’s technically the last position in Latin. The following picture shows how it appears in the browser.

Page Preview

We decided to write the text in different HTML elements (here: div and span elements) to see whether it makes a difference if the tested text spans multiple elements. As it turned out, it has no effect: the text is output in the right order either way. But the problem remains that the position of the asterisk looks wrong to the eyes of an Arabic reader, which may cause great confusion.

Parameters in Western encoding preferred

There’s also a small disadvantage for Arabic developers. Parameter names cannot be written in Arabic, because our tool only accepts the characters “A”-“Z”, “a”-“z”, “0”-“9” and “_” for parameter names. That goes for test case names as well. If you export your test cases with the Script Developer and you change the parameter names to Arabic in Eclipse, your test will fail unless you change the parameters in the data files as well, and then it will work just fine. The Script Developer will also show the modified Arabic parameter names and it will replay without any trouble. But if you check the parameter names in the Script Developer, you’ll notice that the name field is empty.
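The naming rule can be expressed as a regular expression. This is a sketch in Python with a function name of our own choosing, not the validation code of the Script Developer.

```python
import re

# Only the characters listed above: A-Z, a-z, 0-9 and the underscore.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_name(name: str) -> bool:
    """Check a parameter or test case name against the allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("search_term_1"))        # Latin name
print(is_valid_name("\u0628\u062d\u062b"))   # Arabic word for "search"
```

The character class is spelled out on purpose: in Python, the shorthand `\w` also matches Arabic letters and would let such names through.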

Programming in Arabic

Unfortunately, Arabic isn’t really supported in Eclipse on Windows, even if you run Eclipse in an Arabic version because it’s just a translated version of the platform and the operating system’s encoding is set to ISO Latin-1, i.e. any output in the console that is non-Latin will only be displayed as question marks. But in Linux it works because it supports UTF-8.

We were surprised that the Arabic version could align the Arabic text to the (far) right where it actually should begin, which is not supported by the Latin version of Eclipse. So, if you’re planning on developing a non-Latin website in Eclipse and you stumble upon a version of Eclipse in your language and decide to give it a try, don’t get your hopes up: these versions are just translated and might even be incomplete.

To sum up, we expanded our horizon by proving that our tool could also do test automation problem-free on non-Latin websites.

Java, Testing, XLT

Review Of Cross-Browser Testing Tools

2011-08-07 Rene

Smashing Magazine lists a couple of free and commercial tools to cover cross-browser testing:

Good news: very powerful free testing tools are available for Web designers today. Some are more user-friendly than others, and some have significantly better user interfaces. Don’t expect much (if any) support with these tools. But if you’d rather not spend extra money on testing, some great options are here as well.

Read the full article…

By the way, our own tool Xceptance LoadTest (XLT) offers a way to run cross-browser functional tests. XLT leverages WebDriver, a multi-browser API for automation. WebDriver does not support all browsers, and it does not support the ones it covers equally well, but we tried to iron out as much as possible. On top of that, you can use the XLT Script Developer to easily create automation scripts and either run them using our own scripting language or export them to Java to run them directly on the WebDriver API.

You can download Xceptance LoadTest for free with no strings attached from our web site: www.xceptance-loadtest.com.

Performance, Testing, XLT

Get the right load mix out of a few numbers

2011-06-07 Rene 4 Comments

When testing ecommerce applications on SaaS environments, you often do not get enough numbers from clients because they simply do not know these numbers, or know only a few. One reason is that the client simply has not had any online presence before. Often the client also does not have detailed numbers because the previous hoster or the IT department holds them back or simply cannot get to them.

So what to do, when you do not know every detail about the current or future load pattern? We are describing one approach below that was very successful so far and always yielded satisfying results.

What we need

Visits per peak hour (example 10k)
Page views per peak hour (example 100k)
Orders per peak hour (example 200 orders)
Optionally we can use the conversion rate to get from visits to orders or vice versa.
Optionally we can take searches, “add to cart” operations, user registrations, and so on into account.

The scenarios we use are typical ecommerce scenarios and look like this. We will not talk about smaller scenarios such as address editing for a registered user.

TSingleClickVisitor: Enters the store only, does not move beyond the start page
TBrowsing: TVisitor plus category and product browsing
TSearch: TVisitor plus keyword search plus browsing of the result
TAdd2Cart: TBrowsing plus add to cart operations
TGuestCheckout: TAdd2Cart plus checkout without an order placement (anonymous user)
TGuestOrder: TAdd2Cart plus full checkout (anonymous user)
TRegisteredCheckout: TAdd2Cart plus checkout without an order placement (registered customer)
TRegisteredOrder: TAdd2Cart plus full checkout (registered customer)
TRegistration: Account creation

What we assume

Ecommerce sites follow similar patterns and with a few exceptions, such as special promotions, certain behavioral patterns are nearly identical. So for instance, about 50% of all checkouts are stopped before the order is placed. About 20 to 50% of all created carts aren’t checked out at all.

What we calculate

Based on these assumptions, we put together a fairly simple but sufficiently accurate load mix. Of course, we can also analyze the current log files and try to come up with something more precise, but that will be a snapshot only. Traffic is very volatile and so we should be very generous when setting up this mix.

Since we do not take any daily averages as base but the peaks, we will have a pretty comfortable buffer for our daily ecommerce life anyway.

Bottom-Up

Let’s say, 200 orders are set as goal. Splitting them 50/50 between registered and anonymous users, we get 100 visits of each type. All numbers are per hour of course.

TGuestOrder = 100
TRegisteredOrder = 100

As a next step, we take our 50% checkout abandonment rate into account. We have 200 checkouts per hour that are stopped and 200 that run through and turn up as orders (as counted previously). So we need to add 200 visits. And because these visitors can either run with their preset account or without, we split them up in 100 guest and 100 registered checkout attempts.

TGuestCheckout = 100
TRegisteredCheckout = 100
TGuestOrder = 100
TRegisteredOrder = 100

This gives us 400 visits per hour that go into the checkout. We now assume a low cart-to-checkout conversion rate, about 20% for instance, and so we take 400 checkout visits * 5 and get 2,000 visits that involve cart usage. Since 20% of these are already counted as checkouts, 2,000 minus 400 = 1,600 visits remain that use the cart without going to checkout.

TAdd2Cart = 1,600
TGuestCheckout = 100
TRegisteredCheckout = 100
TGuestOrder = 100
TRegisteredOrder = 100

We also know that many users do not continue after hitting the home page or any landing page. Let’s add some of these users now.

TSingleClickVisitor = 1,000
TAdd2Cart = 1,600
TGuestCheckout = 100
TRegisteredCheckout = 100
TGuestOrder = 100
TRegisteredOrder = 100

But wait, what are we missing? Well, we have not registered any new accounts yet. Or have we? We have, because the registered checkout creates accounts if required and reuses them several times. But to get a more substantial customer growth, we simply add 200 visits that run registrations:

TRegistration = 200
TSingleClickVisitor = 1,000
TAdd2Cart = 1,600
TGuestCheckout = 100
TRegisteredCheckout = 100
TGuestOrder = 100
TRegisteredOrder = 100

What is left to do? Well, we do not have any “I am just looking around” visitors yet. We know that our total visit count is 10,000 and we already assigned 3,200 of these to single-click visits, cart, checkout, and registration, so we have 6,800 visits left that we can now use for something else. Depending on the shop type (large store, small store, etc.), people tend to use search more or less. To put enough stress on search and refinements, we simply assume 50% of all people like to search. Thus the missing 6,800 visits will be 3,400 catalog browser visits and 3,400 visits that use search before browsing the search result.

The total mix is:

TBrowsing = 3,400
TSearch = 3,400
TRegistration = 200
TSingleClickVisitor = 1,000
TAdd2Cart = 1,600
TGuestCheckout = 100
TRegisteredCheckout = 100
TGuestOrder = 100
TRegisteredOrder = 100
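The bottom-up calculation can be repeated for other input numbers with a few lines of Python. This is a sketch under the assumptions stated in this post (50% checkout abandonment, 20% cart-to-checkout conversion, a 50/50 split between guests and registered customers, 50% search share); the function and parameter names are made up for illustration.

```python
def load_mix(visits: int, orders: int, single_click: int = 1000,
             registrations: int = 200, checkout_completion: float = 0.5,
             cart_to_checkout: float = 0.2, search_share: float = 0.5) -> dict:
    """Derive visits per hour for each scenario from a few peak-hour numbers."""
    checkouts = round(orders / checkout_completion)    # all visits entering checkout
    abandoned = checkouts - orders                     # checkouts without an order
    cart_visits = round(checkouts / cart_to_checkout)  # all visits that use the cart
    add2cart = cart_visits - checkouts                 # cart usage without checkout
    remaining = visits - cart_visits - single_click - registrations
    search = round(remaining * search_share)
    return {
        "TBrowsing": remaining - search,
        "TSearch": search,
        "TRegistration": registrations,
        "TSingleClickVisitor": single_click,
        "TAdd2Cart": add2cart,
        "TGuestCheckout": abandoned // 2,
        "TRegisteredCheckout": abandoned - abandoned // 2,
        "TGuestOrder": orders // 2,
        "TRegisteredOrder": orders - orders // 2,
    }


mix = load_mix(visits=10_000, orders=200)
for scenario, count in mix.items():
    print(f"{scenario} = {count:,}")
```

Called with the example numbers, the function reproduces the mix listed above, and the scenario counts add up to the 10,000 visits per peak hour.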

Wait… where are my concurrent users? This is simple: “concurrent users” is an inaccurate way of describing traffic, so we have not used that number yet. Why is that?

To get to the bottom of that, we simply check how long a visit takes. Depending on the shop, an average visit might take 2 to 4 minutes. A successful shopping visit might take 15 minutes. If we expect about 10 page views per visit, and a page view takes 1 second to load and 20 seconds to read (already a really, really high number for an average), a visit would take 10 * 1 second + 9 * 20 seconds = 190 seconds.

Let’s go with the 190 seconds for a visit on average. If we could serve just one visitor at a time, we could serve 60 minutes (3600 seconds) / 190 seconds per visit = about 19 visitors per hour. But because we would like to serve 10,000 per hour, we have to deal with 10,000 / 19 = 526 visitors at the same time. This is the famous concurrent user number.

If we now double the think time to 40 seconds, a visit takes 10 * 1 second + 9 * 40 seconds = 370 seconds and we have about 1,028 concurrent users/visitors, roughly twice as many. If we cut it down to 1 second think time, we will get a visit length of 19 seconds and therefore 10,000 visits / (3600 seconds / 19) = 53 concurrent visitors.

So we already have three different “concurrent user” numbers and are still simulating the same traffic. This shows that the number of concurrent users is a pretty questionable way of describing traffic.
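The arithmetic behind these numbers fits into two small Python functions. The function names are our own; the formula is the one used above (visits per hour times visit length, divided by 3600 seconds). Because nothing is rounded along the way, the results can differ slightly from the rounded figures in the text.

```python
def visit_seconds(page_views: int, load_seconds: float, think_seconds: float) -> float:
    """Length of a visit: every page is loaded, and there is a pause between pages."""
    return page_views * load_seconds + (page_views - 1) * think_seconds


def concurrent_visitors(visits_per_hour: float, seconds_per_visit: float) -> float:
    """Average number of visits that are in progress at the same time."""
    return visits_per_hour * seconds_per_visit / 3600.0


# Same traffic of 10,000 visits per hour, three different think times.
for think in (20, 40, 1):
    length = visit_seconds(page_views=10, load_seconds=1, think_seconds=think)
    users = concurrent_visitors(10_000, length)
    print(f"think time {think:>2}s: visit {length:.0f}s, {users:.0f} concurrent visitors")
```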

It does not matter which number we take, because most of the time the servers will see the same traffic. Because we run against a SaaS environment that serves multiple other customers at the same time and is sized to serve the peak traffic of all customers at once, we have plenty of comfortable room around us. This permits us to run with 53 concurrent visitors for most of the testing. This saves us client hardware resources for the load generation, i.e. it saves us money. We are basically only interested in the runtime of requests and not in whether the environment can handle the load, because it can.

The goal of this test is to demonstrate that the implementation on the SaaS platform is efficient, not that the SaaS platform itself is fast and stable, because this is guaranteed by design and contract. Testing this would require way more traffic and generate huge costs, because the environment would suddenly no longer be a shared one but exclusively used for this testing purpose.

When we finalize the entire test and all tests have turned out well, we are going to turn up the concurrent user count to 530 users and compare the result with the previous measurements, just to satisfy the traditional test expectations.

Does that work for you?

Hope that gives you an idea how to come up with a nice user mix for testing without having too much data in the first place. Comments welcome.

Linux, Testing, XLT

XLT 4.0.5 Amazon-EC2 AMIs available

2011-05-14 Rene

These are the AMI-IDs of the XLT 4.0.5 images for Amazon-EC2.

EU-West: ami-772b1d03
US-East: ami-52649b3b
US-West: ami-29bfec6c

Images can be used free of charge. The EU image is brand new and features Ubuntu 11.04. It also has a smaller disk of only 8GB compared to 15GB before. This helps to make it eligible for a free tier micro instance. Of course this instance type is not recommended for load testing, but you can easily test deployment and remote execution of XLT before you move up to more expensive setups.

Performance, Testing, XLT

Load Testing Web Applications – Do it on the DOM Level!

2011-04-07 Ronny

This article was first published in the June 2010 issue of the magazine Testing Experience.

HTTP-level tools record HTTP requests on the HTTP protocol level. They usually provide functionality for basic parsing of HTTP responses and building of new HTTP requests. Such tools may also offer parsing and extraction of HTML forms for easier access to form parameters, automatic replacement of session IDs by placeholders, automatic cookie handling, functions to parse and construct URLs, automatic image URL detection and image loading, and so on. Extraction of data and validation are often done with the help of string operations and regular expressions, operating on plain HTTP response data. Even if HTTP-level tools address many load testing needs, writing load test scripts using them can be difficult.

Challenges with HTTP-Level Scripting

Challenge 1: Continual Application Changes

Many of you probably know this situation: A load test needs to be prepared and executed during the final stage of software development. There is a certain time pressure, because the go-live date of the application is in the near future. Unfortunately, there are still some ongoing changes in the application, because software development and web page design are not completed yet.

Your only chance is to start scripting soon, but you find yourself struggling to keep up with application changes and adjusting the scripts. Some typical cases are described below.

The protocol changes for a subset of URLs, for example, from HTTP to HTTPS. This could happen because a server certificate becomes available and the registration and checkout process of a web shop, as well as the corresponding static image URLs, are switched to HTTPS.
The URL path changes due to renamed or added path components.
The URL query string changes by renaming, adding or removing URL parameters.
The host name changes for a subset of URLs. For example, additional host names may be introduced for a new group of servers that delivers static images or for the separation of content management URLs and application server URLs that deliver dynamically generated pages.
HTML form parameter names or values are changed or form parameters are added or removed.
Frames are introduced or the frame structure is changed.
JavaScript code is changed, which leads to new or different HTTP requests, to different AJAX calls, or to a new DOM (Document Object Model) structure.

In most of these cases, HTTP-level load test scripts need to be adjusted. There is even a high risk that testers do not notice certain application changes, and although the scripts do not report any errors, they do not behave like the real application. This may have side effects that are hard to track down.

Challenge 2: Dynamic Data

Even if the application under test is stable and does not undergo further changes, there can be serious scripting challenges due to dynamic form data. This means that form field names and values can change with each request. One motivation to use such mechanisms is to prevent the web browser from recalling filled-in form values when the same form is loaded again. Instead of “creditcard_number”, for example, the form field might have a generated name like “cc_number_9827387189479843”, where the numeric part is changed every time the page is requested. Modern web applications also use dynamic form fields for protection against cross-site scripting attacks or to carry security-related tokens.

Another problem can be data that is dynamically changing, because it is maintained and updated as part of the daily business. If, for example, the application under test is an online store that uses search-engine-friendly URLs containing catalog and product names, these URLs can change quite often. Even worse, sometimes the URLs contain search-friendly catalog and product names, while embedded HTML form fields use internal IDs, so that there is no longer an obvious relation between them.

Session IDs in URLs or in form fields may also need special handling in HTTP-level scripts. The use of placeholders for session IDs is well supported by most load test tools. However, special script code might be needed, if the application not only passes these IDs in an unmodified form, but also uses client-side operations on them or inserts them into other form values.

To handle the above-mentioned cases, HTTP-level scripts need manually coded, and thus unfortunately also error-prone, logic.
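The following Python sketch illustrates what such hand-written logic typically looks like on the HTTP level: the generated field name is cut out of the raw response text with a regular expression. The HTML snippet, the form action, and the function name are hypothetical; only the field name pattern is taken from the example above.

```python
import re

# Hypothetical response body as an HTTP-level script would see it: plain text.
RESPONSE = """<form action="/checkout" method="post">
  <input type="text" name="cc_number_9827387189479843">
  <input type="hidden" name="token" value="a1b2c3">
</form>"""


def find_field_name(html: str, prefix: str) -> str:
    """Extract a form field name that starts with `prefix` and ends in digits."""
    match = re.search(r'name="(' + re.escape(prefix) + r'\d+)"', html)
    if match is None:
        raise ValueError(f"no field starting with {prefix!r} found")
    return match.group(1)


field = find_field_name(RESPONSE, "cc_number_")
print(field)  # the name to use when building the next request
```

The expression silently depends on details such as the attribute order and the quoting style, which is one reason why this kind of logic tends to break when the page changes.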

Challenge 3: Modeling Client-Side Activity

In modern web applications, JavaScript is often used to assemble URLs, to process data, or to trigger requests. The resulting requests may also be recorded by HTTP-level tools, but if their URLs or form data change dynamically, the logic that builds them needs to be reproduced in the test scripts.

Besides this, it can be necessary to model periodic AJAX calls, for example to automatically refresh the content of a ticker that shows the latest news or stock quotes. For a realistic load simulation, this also needs to be simulated by the load test scripts.

Challenge 4: Client-Side Web Browser Behavior

For correct and realistic load simulations, the load test tool needs to implement further web browser features. Here are a few examples:

Caching
CSS handling
HTTP redirect handling
Parallel and configurable image loading
Cookie handling

Many of these features are supported by load test tools, even by tools that act on the HTTP level, but not all of them are necessarily supported adequately. If, for example, the simulated think time between requests of a test case is varied, a low-level test script might still load cacheable content in the same way every time: either the script was recorded with an empty cache and the requests are always fired, or the requests were not recorded and are never issued.
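The effect of think time on caching can be shown with a small simulation. It assumes a single static resource with a fixed cache lifetime and a user who requests a new page after every think time; the class, the URL and all numbers are invented for illustration.

```python
class SimulatedCache:
    """Very small browser-cache model: one expiry time per URL."""

    def __init__(self):
        self._expires_at = {}

    def needs_request(self, url, now, max_age):
        """Return True if the resource must be fetched, and record the fetch."""
        expires = self._expires_at.get(url)
        if expires is not None and now < expires:
            return False
        self._expires_at[url] = now + max_age
        return True


def count_fetches(think_time, page_views, max_age):
    """Count how often the static resource is really requested."""
    cache = SimulatedCache()
    now = 0
    fetches = 0
    for _ in range(page_views):
        if cache.needs_request("/static/shop.css", now, max_age):
            fetches += 1
        now += think_time
    return fetches


# Six page views, resource cacheable for 60 seconds.
for think_time in (10, 40, 90):
    print(think_time, count_fetches(think_time, page_views=6, max_age=60))
```

A script that replays a fixed recording would issue the same number of requests for all three think times, while a browser-like cache produces one, three or six.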

DOM-Level Scripting

What is the difference between HTTP-level scripting tools and DOM-level scripting tools? The basic distinction between the levels is the degree to which the client application is simulated during the load test. This also affects the following characteristics:

Representation of data: DOM-level tools use a DOM tree instead of simple data structures.
Scripting API: The scripting API of DOM-level tools works on DOM elements instead of strings.
Amount and quality of recorded or hard-coded data: There is much less URL and form data stored with the scripts. Most of this data is handled dynamically.

DOM-level tools add another layer of functionality on top. Besides handling HTTP, these tools also parse the HTML and CSS contained in the responses and build a DOM tree from this information, similar to a real web browser. The higher-level API enables the script creator to access elements in the DOM tree using XPath expressions, or to perform actions or validations on certain DOM elements. Some tools even incorporate a JavaScript engine that is able to execute JavaScript code during the load test.

Advantages

DOM-level scripting has a number of advantages:

Load test scripts become much more stable against changes in the web application. Instead of storing hard-coded URLs or data, they operate dynamically on DOM elements like “the first URL below the element xyz” or “hit the button with id=xyz”. This is especially important as long as application development is still ongoing. As a consequence, you can start scripting earlier.
Scripting is easier and faster, in particular if advanced script functionality is desired.
Validation of result pages is also easier on the DOM level compared to low-level mechanisms like regular expressions. For example, checking a certain HTML structure or the content of an element, like “the third element in the list below the second H2” can be easily achieved by using an appropriate XPath to address the desired element.
Application changes such as renamed form parameters normally do not break the scripts if the scripts do not touch those parameters. If such a change does break a script because the script uses the parameter explicitly, the error is immediately visible, since accessing the DOM tree element will fail. The same is true for almost all kinds of application changes described above. Results are more reliable, because there are fewer hidden differences between the scripts and the real application.
CSS is applied. Assume there is a CSS change such that a formerly visible UI element that can submit a URL becomes invisible now. A low-level script would not notice this change. It would still fire the old request and might also get a valid response from the server, in which case the mismatch between the script and the changed application could easily remain unnoticed. In contrast, a DOM-level script that tries to use this UI element would run into an error that is immediately visible to the tester.
If the tool supports it, JavaScript can be executed. This avoids complicated and error-prone re-modeling of JavaScript behavior in the test scripts. JavaScript support has become more and more important in recent years with the evolution of Web 2.0/AJAX applications.
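The idea of addressing elements by structure rather than by recorded data can be sketched with Python's standard library. This is only an approximation: ElementTree parses well-formed XML and supports a limited XPath subset, whereas a real DOM-level tool uses a proper HTML parser. The page snippet, the helper name require and the element IDs are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed result page.
PAGE = """<html><body>
<section><h2>News</h2><ul><li>Sale starts Monday</li></ul></section>
<section><h2>Bestsellers</h2>
<ul><li>Tent</li><li>Stove</li><li>Lantern</li></ul>
<form action="/cart"><input name="qty_771" value="1"/>
<button id="add-to-cart">Add</button></form>
</section></body></html>"""


def require(root, path):
    """Look up an element and fail visibly if the page structure changed."""
    element = root.find(path)
    if element is None:
        raise LookupError(f"page structure changed, nothing matches {path}")
    return element


root = ET.fromstring(PAGE)

# "The third element in the list below the second H2", assuming each
# heading and its list share a <section>.
third = require(root, "./body/section[2]/ul/li[3]")

# "Hit the button with id=xyz", without knowing any form field names.
button = require(root, ".//button[@id='add-to-cart']")

print(third.text, button.text)
```

Note that the script never mentions the generated field name “qty_771”; it would keep working if that name changed, and it would stop with a clear error if the button disappeared.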

Disadvantages

There is one disadvantage of DOM-level scripting. The additional functionality needs more CPU and main memory, for instance to create and handle the DOM tree. Resource usage increases even more if JavaScript support is activated.

Detailed numbers vary considerably with the specific application and the structure of the load test scripts. Therefore, the following numbers should be treated with caution. Table 1 shows a rough comparison, derived from different load testing projects for large-scale web applications. The simulated client think times between a received response and the next request were relatively short. With longer think times, significantly more users could have been simulated per CPU.

Table 1: Rough number of virtual users per CPU by scripting level

Scripting Level	Virtual Users per CPU
HTTP Level	100 to 200
DOM Level	10 to 30
DOM Level + JavaScript execution	2 to 10

If you evaluate these numbers, please keep in mind that machines are becoming ever more powerful and that there are many flexible and easy-to-use on-demand cloud computing services today, so that resource usage should not prevent DOM-level scripting.
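For a back-of-the-envelope estimate, the rough ranges from Table 1 can be turned into a number of load generator CPUs. The target of 2000 concurrent users is just an example figure, and the result inherits all the uncertainty of the table.

```python
import math

# Rough ranges from Table 1: (lowest, highest) virtual users per CPU.
USERS_PER_CPU = {
    "HTTP level": (100, 200),
    "DOM level": (10, 30),
    "DOM level + JavaScript": (2, 10),
}


def cpus_needed(users, level):
    """Return (best case, worst case) CPU count for the given user count."""
    low, high = USERS_PER_CPU[level]
    return math.ceil(users / high), math.ceil(users / low)


for level in USERS_PER_CPU:
    best, worst = cpus_needed(2000, level)
    print(f"{level}: {best} to {worst} CPUs")
```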

Conclusion

Avoid hard-coded or recorded URLs, parameter names and parameter values as far as possible. Handle everything dynamically. This is what we have learned. One good way to achieve this is to do your scripting on the DOM level, not on the HTTP level. If working on the DOM level or executing JavaScript is not possible for some reason, you have to make compromises and accept a number of disadvantages.

We have created and executed web-application load tests for many years now, in a considerable number of different projects. Since 2005, we have mainly used our own tool, Xceptance LoadTest (XLT), which is capable of working on different levels and supports fine-grained control over options like JavaScript execution. In our experience, the advantages of working on the DOM level, in many cases even with JavaScript execution enabled, far outweigh the disadvantages. Working on the DOM level makes scripting much easier and faster, the scripts handle much of the dynamically changing data automatically, and the scripts become much more stable against typical changes in the web application.

Ronny Vogel is technical manager and co-founder of Xceptance. His main areas of expertise are test management, functional testing and load testing of web applications in the field of e-commerce and telecommunications. He holds a Master's degree (Dipl.-Inf.) in computer science from the Chemnitz University of Technology and has 16 years of experience in the field of software testing. Xceptance is a provider of consulting services and tools in the area of software quality assurance, with headquarters in Jena, Germany and a branch office in Cambridge, Massachusetts, USA.