Category Archives: Software Development

The Delicate Art of Load Test Scripting

TL;DR: We are often asked why we need that much time to recheck load test scripts. So, here is our explanation in ten sentences or less.

“Why is the script broken? We haven’t changed anything!” A load test script can go wrong in two ways. It can break outright, hitting you with an exception or assertion failure and giving you explicit errors. Or it can be merely incorrect: it appears to pass but is subtly flawed, leading to misleading results.

“But we haven’t changed anything in the UI!” Load test scripts don’t magically adjust themselves. And you can’t treat load testing as UI automation on steroids: UI tests drive resource-hungry modern browsers, which makes them inefficient and costly for simulating many users, whereas load tests use lightweight, lower-level simulations designed for true scale.

“So, where did this change occur?” Scripts break due to changes in HTML/CSS, JSON, required data, or application flow, while they become incorrect because of optional data changes, wrong requests/order, or outdated data.

The secret to lower script maintenance lies in communicating changes with your performance testers early and in validating all data. The less vague an API is, the easier it is to keep the scripts up to date. This way, you’ll ensure your load tests are a true reflection of reality, not just a green checkmark!

Introduction

It’s a question that echoes through many development teams: ‘Why do we need to touch the load test scripts? We haven’t changed a thing on the UI!’ This common query stems from a perfectly logical assumption – if the user interface looks the same, surely everything behind it is too, right? Unfortunately, in the complex world of modern applications, what you see isn’t always what you get, and a static UI can hide a whirlwind of activity that directly impacts your performance tests.

Let’s talk first about what can really mess up your load testing scripts. It’s super important to know the difference between a script that’s completely broken and one that’s just plain incorrect.

  • Broken scripts: These are the ones that just won’t run through at all. If it’s a JUnit test, for example, you’ll see a big fat failure message. It’s dead in the water.
  • Incorrect scripts: These are trickier. They’ll actually run all the way through and show a “green” result, but they haven’t done what they were supposed to. They’re silently misleading you, and they are the main reason why validating a load test script is never just a matter of running it and trusting the outcome (see the sketch after this list).
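
Here is a minimal sketch of the difference, written as a JUnit 5 test against a made-up shop endpoint (URL, parameter, and JSON field are purely illustrative):

    // Hypothetical load test step: add an item to the cart and check the result.
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    import org.junit.jupiter.api.Test;

    class AddToCartTest {
        private final HttpClient client = HttpClient.newHttpClient();

        @Test
        void addToCart() throws Exception {
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("https://shop.example.com/cart/add?sku=42")).build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            // Broken: this assertion throws as soon as the server answers with an error,
            // so the run fails loudly and you notice immediately.
            assertEquals(200, response.statusCode());

            // Incorrect: without a content check like this, a 200 response that actually
            // carries an error page or an empty cart still shows up green.
            assertTrue(response.body().contains("\"itemCount\":1"), "cart did not grow");
        }
    }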

Test Automation vs. Performance Tests: Not the Same Thing!

Because people often mix these up and assume that a load test script is as easily maintainable as a test automation script, here are the key differences.

  • Test automation is usually about checking if something works. Think of it as replacing manual testing or extending its reach so you can get feedback faster. These tests are often UI-based and interact with a real browser or an app.
  • Performance/Load Tests, on the other hand, are about simulating how users behave, but at a much lower level than the UI. This lets you directly control things like the types of calls, the data, and even request filtering. You also get rid of a direct dependency on a real browser (a small sketch of the difference follows this list).
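
To make the contrast concrete, here is a rough sketch of the same search action, once as UI automation (assuming Selenium WebDriver) and once at the protocol level; the shop URL and element names are made up:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;

    class SearchStep {
        // Test automation: drives a full browser. Fine for functional checks,
        // far too heavy to run thousands of times in parallel.
        void searchViaBrowser() {
            WebDriver browser = new ChromeDriver();
            try {
                browser.get("https://shop.example.com/");
                browser.findElement(By.name("q")).sendKeys("shoes\n");
            } finally {
                browser.quit();
            }
        }

        // Load test: the same user action expressed as the HTTP call the browser would
        // send, with full control over call type, data, and which requests are made at all.
        String searchViaHttp() throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("https://shop.example.com/search?q=shoes")).build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        }
    }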

Now, you might think, “Why not just scale up my UI automation tests?” Good question! But here’s why that typically doesn’t work.

Modern browsers are hardware hogs. They need multiple CPU cores, 512 MB or more of memory, and often a GPU just to run smoothly. Trying to run huge tests with actual browsers chews up too many resources, making it expensive and unreliable to test higher traffic.

Illustration: Browsers at Scale vs Load Test Simulation

Plus, running a UI test at scale would be a massive waste. You’d be rendering the same UI a million times over without learning anything new about performance. Browsers are also a pain to control remotely, especially when you need to filter or tweak requests to hit (or avoid) certain resources. That filtering can lead to all sorts of problems because of JavaScript dependencies or rendering quirks when third-party calls are skipped during a load test. And frankly, browsers are terrible at telling you when they’re truly ‘ready’ because so much is happening asynchronously. Modern websites are almost never silent, always doing something in the background, making that “ready” state even harder to pin down.

So, to sum it up: performance test scripts are not test automation scripts. The latter are typically UI-based, work at a higher level, and aren’t as sensitive to the nitty-gritty, low-level changes like more or fewer requests, or little parameter tweaks. Performance scripts live in that nitty-gritty world.

What Makes a Script Break?

These are the things that will make your script crash and burn:

  • HTML Changes: Your CSS selectors or XPath expressions suddenly don’t work because the HTML changed (see the sketch after this list).
  • Required Data Changes: The data you have to submit has changed, and your script isn’t sending the correct stuff.
  • Flow Changes: The application’s flow changed (like a checkout process becoming shorter or longer), and your script can’t follow it any longer.
  • Invalid Test Data: You’re using test data that’s just plain wrong, and the system can’t handle it at all.
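
The first two bullets in practice, sketched with a made-up form field: the script has to dig a token out of the HTML before it can submit, and the moment the markup or the required field changes, that extraction blows up.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class TokenExtractor {
        // Hypothetical hidden form field the script must echo back on submit.
        private static final Pattern CSRF_TOKEN =
                Pattern.compile("name=\"csrf_token\" value=\"([^\"]+)\"");

        String extractToken(String html) {
            Matcher matcher = CSRF_TOKEN.matcher(html);
            if (!matcher.find()) {
                // Renamed field, changed markup, or a new required parameter:
                // the script breaks right here, loudly, which is exactly what you want.
                throw new IllegalStateException("csrf_token not found on page");
            }
            return matcher.group(1);
        }
    }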

What Makes a Script Incorrect?

These are the sneaky ones that run but give you bad intel:

  • Optional Data Changes: The data you can submit has changed. The script runs, but it no longer reflects real user behavior (a sketch follows this list).
  • Extra/Missing Requests: Your script is sending requests it shouldn’t, or missing requests it should be sending.
  • Wrong Order of Calls: The calls are happening, but not in the sequence they would in a real user journey.
  • Outdated Test Data: The data is technically valid but no longer represents the state of the system. Because there is not enough validation in the system under test, things don’t break but rather go unnoticed.
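
And one of the sneaky ones, again with invented field names: the front end may meanwhile send additional or renamed optional parameters, but a lenient server still answers 200, so this request keeps looking fine while no longer resembling real traffic.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    class NewsletterSignup {
        void signUp(HttpClient client) throws Exception {
            // The real front end now also sends an optional "locale" field and renamed
            // "src" to "source". The server silently ignores the difference and returns
            // 200, so the script stays green while drifting away from real user behavior.
            String form = "email=load-test%40example.com&src=homepage";

            HttpRequest request = HttpRequest.newBuilder(URI.create("https://shop.example.com/newsletter"))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString(form))
                    .build();

            client.send(request, HttpResponse.BodyHandlers.ofString());
        }
    }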

Of course, these are all just examples. While a JSON format change might break scripts for you, it might just lead to incorrect testing for someone else.

How to Keep Your Scripts from Going Haywire

To avoid this constant headache between your scripts and the actual application, you need to bake script maintenance right into your development process. The biggest help here is communication. If performance testers know about feature changes, they can figure out what needs to be done. This means less scrambling to review every script all the time.

Here’s how to tackle those changes:

  • Talk About Changes: Especially when the front-end will see new or removed requests, logic updates, or changes to the data being collected or sent. Keep performance testers in the loop!
  • Disable Old Functionality: When something’s removed from the front end, disable it on the back end as well so it can no longer be used. This will make your scripts fail if they try to hit old endpoints, which is a good thing – it forces you to update the scripts.
  • Verify All Required Data: Always verify all the data needed for an action; don’t just leave things optional (a sketch follows this list). This doesn’t just help your performance tests; it also boosts functional quality and security.
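
As a sketch of that last point (endpoint and parameter names are invented for illustration), a strict server-side check turns an outdated script into a loud failure instead of a silent one:

    import java.util.Map;

    class OrderValidator {
        private static final String[] REQUIRED = { "sku", "quantity", "shippingMethod" };

        void validate(Map<String, String> params) {
            for (String name : REQUIRED) {
                String value = params.get(name);
                if (value == null || value.isBlank()) {
                    // Reject the request (e.g. with a 400) instead of falling back to a
                    // default: an outdated load test script now breaks instead of quietly
                    // producing meaningless results.
                    throw new IllegalArgumentException("Missing required parameter: " + name);
                }
            }
        }
    }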

Or in plain English: every UI change to functionality that is already covered by performance test scripts should result in a script that breaks, not one that silently becomes incorrect. This requires the performance tester to validate things carefully and the application engineer to give the application clear boundaries that communicate what is required and what is not wanted.

And yes, it is still possible that you cannot set these clear boundaries, because the application itself might not know about certain states, or the APIs are not yours, so you depend on interfaces that are more permissive than you would like.

Conclusion

Load testing scripts are different from normal automation, with unique requirements for creation and maintenance. Recognizing the fundamental differences between “broken” and “incorrect” scripts, and between performance testing and UI automation, is vital for achieving accurate and reliable performance insights. By integrating script maintenance into the development process through proactive communication and robust practices, teams can ensure their performance tests remain effective, reflecting the true state of their application under load.

So, there you have it. Load testing scripts are a different beast, and understanding their quirks is key to getting good performance insights. Keep these points in mind, and you’ll be much better equipped to handle them.

P.S. There is an option to run sensible but still resource-intensive load tests with XLT: It supports load testing with real browsers at scale. This is perfect for a blend of test automation and load testing. Of course, you likely are not going for hundreds of users, but rather a small set for a regression and sanity check. First line of defense, so to speak.

P.P.S. Depending on the framework and concepts you use, going headless with your application probably means a significant increase in scripting effort.

Performance Test Rating Criteria

TL;DR: Load and performance testing produces a vast amount of data. This data has to be interpreted and communicated. Because not every interested party speaks the same language, Xceptance developed a performance test rating and grading system. It evaluates response time, stability, and predictability and transforms three factors into a simple and communicable form. While doing that, it does not compromise on quality. It has been successfully used in more than 400 projects.

The Challenge

Load and performance testing is a key activity for making an online business successful. It validates that traffic and conversion expectations can be fulfilled. This of course applies to all kinds of Internet-based applications. Basically, as soon as there are expectations in terms of stability and performance, a test is mandatory to validate these. Expectations are usually set as requirements by different organizational groups such as sales, product management, engineering, and development teams.

Every group has a different understanding when it comes to results, goals, and success criteria. Some might be more concerned with the business impact, others are looking for technical implications of design decisions, and some just want to improve performance.

The group that is tasked with the evaluation of the requirements is faced with a very wide range of success definitions. In addition, it has to explain its technical measurements to all participating parties so that each party easily understands the state of testing.

Engineers look for detailed metrics, including but not limited to the system behavior under test, while business-centric stakeholders just expect a clear yes or no. But performance testing typically does not deliver a clear result.

How can one reach all target groups without causing too much extra work to cater to all individual needs?

The Rating System

Xceptance developed a rating system that uses grades from A to F, similar to the American education system. The grades A to C symbolize a pass, while D and F are considered a fail. A grade of B stands for an assumed average across similar customers and projects; it also stands for a good result. This leaves room in both directions to over- or underperform.

Because performance results are not just shaped by response times, three factors are taken into account:

  • Response Times
  • Errors
  • Predictability

Before we continue with a detailed discussion of the factors, this table explains each grade in one sentence. For the more ambitious customers, an A+ stretch goal was added.

Let’s talk about our three factors in detail now.

Continue reading Performance Test Rating Criteria

Java Training Sessions

Today we are going to publish four of our Java training sessions so you can use the material and benefit from it.

Let’s get started with four direct links to extensive material that might help you to understand Java or code quality better or just help you to reflect on topics you already know.

  • The Java Memory Model: Why you have to know the JMM to understand Java and write stable, correct, and fast code.
  • Java Memory Management: Know more about the size of objects and how Java does garbage collection.
  • High Performance Java: All about the smart Java internals that turn your code into fast code and how you can leverage that knowledge.
  • High Quality Code: The anatomy of high quality code that supports longevity, cross-team usage, and correctness. This is not just about Java, this is about good code in general.

Show a little patience when loading the trainings; these are all large reveal.js-based slide sets. Use the arrow keys or the space bar to navigate. Because the slide sets are designed as interactive sessions, in many cases a slide’s content is not revealed all at once but block by block.

We publish these training sessions because they are partly based on openly shared material that greatly helped us to advance and understand, and, of course, to advertise a little what Xceptance might be able to do for you.

We will release more of our material in the coming weeks and months, so everybody can browse and learn. This won’t be limited to Java; it will also cover material about approaching load testing, how to come up with test cases, and more about the modern web and its quality and performance challenges. Of course, there will be more Java material too. You can get a glimpse of it by following this link and paging through the slides: The Infinite Java Training. Please remember, not all material is complete yet.

If you like the material and you need an audio track, aka a real presentation, please talk to us. If you see other training needs in the area of quality assurance, testing, and Java, please contact us.

More to come.

Thuringia’s Open-Source Prize for XLT

Wolfgang Tiefensee, Thuringia’s Secretary of Commerce, in conjunction with the board of directors of the IT industry network ITNet Thuringia, awarded the first Thueringen Open-Source Prize to three companies, all of them software companies based in Jena: TRITUM, Xceptance and GraphDefined.

Open Source Prize Title Picture Second Place

It is an honor for Xceptance to be the second-place winner of this competition. This result clearly demonstrates that open source as a component of commercial products can be a clear competitive advantage. XLT incorporates a number of open-source projects, including Apache HttpClient, Jetty, HtmlUnit, JUnit, and the Apache Commons libraries. As part of developing XLT, Xceptance is involved in testing and providing feedback for these projects, thus giving back to the open-source community.

While XLT is itself not open source, Xceptance does provide the software free of charge and with virtually no usage restrictions, so for most applications there is no noticeable difference from open-source software.

Use XLT with Sauce Labs and BrowserStack

Sauce Labs and BrowserStack – What Are They and Why Use Them?

This approach still works fine, but we have come up with a much better one. Head over to GitHub and see our Multi-Browser-TestSuite for XLT. It makes multi-browser testing a breeze. By the way, all the code is licensed under the MIT license, so you have absolute flexibility.

Sauce Labs and BrowserStack allow you to run automated test cases on different browsers and operating systems. Both provide more than 200 mobile and desktop browsers on different operating systems. The benefit? You can focus on coding instead of having to maintain different devices. You can easily run your test cases against Internet Explorer without actually buying a Windows device; and last but not least, you don’t need to worry about drivers or maintenance.
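
For example, with Selenium’s RemoteWebDriver you point your test at the hosted grid instead of a local browser. A minimal sketch (endpoint, credential placeholders, and capability names may differ for your account and Selenium version):

    import java.net.URL;

    import org.openqa.selenium.remote.DesiredCapabilities;
    import org.openqa.selenium.remote.RemoteWebDriver;

    class RemoteBrowserExample {
        public static void main(String[] args) throws Exception {
            DesiredCapabilities capabilities = new DesiredCapabilities();
            capabilities.setCapability("browserName", "internet explorer");
            capabilities.setCapability("platform", "Windows 7");

            // The hosted grid starts the requested browser; no local Windows box needed.
            RemoteWebDriver driver = new RemoteWebDriver(
                    new URL("https://USERNAME:ACCESS_KEY@ondemand.saucelabs.com/wd/hub"),
                    capabilities);
            try {
                driver.get("https://www.xceptance.com/");
                System.out.println(driver.getTitle());
            } finally {
                driver.quit();
            }
        }
    }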

By the way, Internet Explorer even seems to run faster at Sauce Labs than on a desktop machine. Also note that Sauce Labs supports Maven builds.
Continue reading Use XLT with Sauce Labs and BrowserStack

Tutorial: Git – The Incomplete Introduction

Software testing is part of software development. So you need a form of revision control for your source code (aka test code) and documents. You also need it to review code, compare its history… or maybe simply to help others master it.

We recently started our migration from Subversion to Git. Not because we were dissatisfied with SVN, but mostly because we want to use what our customers use. Additionally, we want to profit from functionality Git offers that SVN does not, such as local commits and cheap branching.

But Git is different, and just changing the tool does not change anything; it might even make things worse, because you cannot run Git like SVN. Well, you can, but that still requires you to know the basics of Git, to understand what it will do to your work, and to know what a typical workflow looks like. The commands are different too.

So we created this tutorial to help you get used to Git, understand it, and learn it.
Continue reading Tutorial: Git – The Incomplete Introduction

HPQC and XLT – Integration Example

You have to work with HP Quality Center (HPQC), but you don’t want to execute all the test cases manually. You have automated some tests using XLT Script Developer and like the outcome. You want to use Script Developer much more, but you face one last problem: you still have to enter the test results into HPQC manually. This renders some of the advantages of test automation useless.

The following example can mitigate that problem. HPQC offers an API called Quality Center Open Test Architecture API (OTA API). Using this interface, you can set test results automatically.
Continue reading HPQC and XLT – Integration Example

Spurious wakeup – the rare event

After hunting for quite some time for the cause of a strange application behavior, I finally found the reason.

The Problem

The Java application was behaving strangely in 4 out of 10 runs. It did not process all available data and assumed that the data input had already ended. The application features several producer-consumer patterns, where one thread offers preprocessed data to the next, passing it into a buffer from which the next thread reads it.

The consumer or producer falls into a wait state when no data is available or the buffer is full. Upon a state change, the active thread notifies all waiting threads about the new data or the fact that all data has been consumed.
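
For reference, the standard way to make such a handshake robust is to re-check the condition in a loop around every wait(), because wait() may return without a matching notify. A minimal sketch of such a buffer (not the application’s actual code):

    import java.util.ArrayDeque;
    import java.util.Deque;

    class Buffer<T> {
        private final Deque<T> items = new ArrayDeque<>();
        private final int capacity = 100;

        synchronized void put(T item) throws InterruptedException {
            while (items.size() == capacity) {   // 'while', not 'if'
                wait();
            }
            items.addLast(item);
            notifyAll();
        }

        synchronized T take() throws InterruptedException {
            while (items.isEmpty()) {            // a spurious wakeup simply loops again
                wait();
            }
            T item = items.removeFirst();
            notifyAll();
            return item;
        }
    }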

On 2-core and 8-core machines, the application was running fine, but when we moved it to a 24-core machine, it suddenly started to act in an unpredictable manner.
Continue reading Spurious wakeup – the rare event