All posts by Rene

Performance Test Rating Criteria

TL;DR: Load and performance testing produces a vast amount of data. This data has to be interpreted and communicated. Because not every interested party speaks the same language, Xceptance developed a performance test rating and grading system. It evaluates response time, stability, and predictability and transforms three factors into a simple and communicable form. While doing that, it does not compromise on quality. It has been successfully used in more than 400 projects.

The Challenge

Load and performance testing is a key activity for making an online business successful. It validates that traffic and conversion expectations can be fulfilled. This of course applies to all kinds of Internet-based applications. Basically, as soon as there are expectations in terms of stability and performance, a test is mandatory to validate these. Expectations are usually set as requirements by different organizational groups such as sales, product management, engineering, and development teams.

Every group has a different understanding when it comes to results, goals, and success criteria. Some might be more concerned with the business impact, others are looking for technical implications of design decisions, and some just want to improve performance.

The group that is tasked with the evaluation of the requirements is faced with a very wide range of success definitions. In addition, it has to explain its technical measurements to all participating parties so that each party easily understands the state of testing.

Engineers rather look for detailed metrics including but not limited to the system behavior under test, while business-centric stakeholders just expect a clear yes or no. But performance testing typically does not deliver a clear result.

How can one reach all target groups without causing too much extra work to cater to all individual needs?

The Rating System

Xceptance developed a rating system that uses an American education system-like grading from A to F. The grades A to C symbolize a pass, while D and F are considered a fail. A grade B stands for an assumed average across similar customers and projects. It also stands for a good result. This leaves room in both directions to over or underperform.

Because performance results are not just shaped by response times, three factors are taken into account:

  • Response Times
  • Errors
  • Predictability

Before continuing a detailed discussion of the factors, this table explains each grade in one sentence. For the more ambitious customers, the A+ stretch goal was added.

Let’s talk about our three factors in detail now.

Response Times

To clarify one thing first, this is about server-side performance. It does not take the client-side rendering including the loading of any content or JavaScript execution into account. For this, Google Web Vitals provide an excellent scoring system. In addition, when you evaluate API performance, you might want to refine your performance expectations.

A response time typically defines the time from initiating a request to a server until the full response has been returned to the requestor. It is likely the most used number to describe performance.

Server response times are validated using percentiles. This number indicates what response time a certain amount of users experience at maximum. For the response time grading, the P95 value is used. That means that 95% of requests have been finished within that time or faster. 

One of the reasons for going with the P95 is that the user of provided infrastructure and software has often only limited influence on the behavior of the full stack. Hence there will be peaks and outliers which cannot be controlled nor diagnosed.

One can use the average, but it is a very imprecise number. It will either move too much or too little in regard to outliers. It just does not show what most of the users will truly see. An average user will never post something negative on social media, but the last 15% of the response time range will. So the rating cares more about the negative impression than the positive. This sadly ignores the last 5% and that is where projects become expensive. It is really hard to convince all stakeholders to use the P95 instead of the average, trying to establish the P99 or higher is nearly impossible. 

The response times are grouped by activities because the user usually is more impatient at the beginning of its user journey. This data is based on years of measurements and ensures that goals are achievable but also ambitious enough. They all might heavily vary on your platform and stack. So these are not set in stone and these examples might only work for commerce.

Errors

The fastest response times are of no help if the response is incorrect. So correctness is key to performance testing. One might just say, every response code larger than 400 is a failure, but it is not that easy. Hence this factor distinguishes between two different types of errors: technical and functional errors. 

Examples of functional errors are, not fully set up products, products that cannot be bought together, or an empty search result. If a response was not received or the response code indicates a problem, it is considered a technical error. Or in other words: If the user is properly notified about the problem, it is mostly a functional issue. If the state or messaging is undefined, it is mostly technical in origin. 

A performance test has to ensure that technical errors are clearly highlighted because they might be caused by performance testing due to volume or scaling. Additionally, a performance test has to ensure functional correctness, because if the order cannot be placed and this is measured without validating the result, the test is just void.

To assign a grade to the error factor, this is verified:

  • How many percent of the visits have been affected by an error?
  • Do the errors occur in patterns or clusters?
  • Are there any main features affected such as checkout or sign in? 

Since barely any test run is error free when reaching a certain complexity, a low error rate might still be acceptable, even for an A rating. 

The disadvantage of this approach is that errors become an accepted fact.

Predictability

In a perfect world, response times would be the same all day long. But that is impossible to achieve in a complex IT landscape, hence it is essential to minimize the noise. That is what this rating tries to capture. It introduces a predictability grade. Because the evaluation of noise is a complex process, Xceptance decided to retract to a metric anyone can grasp.

Predictability is defined by the business impact on the end user. It uses response times exceeding a certain value and the occurrence of technical errors (mainly non-recoverable errors such as response codes 500 or no response at all). This should give the merchant an idea of how many visits might have been affected by problems or slowness. Furthermore, being affected means that a visit carries the risk of a loss in revenue or reputation.

The threshold of 10 seconds is based on the user perception model published in the Google RAIL Model. This model applies to the total page loading time. The factor predictability uses the 10-second threshold for the runtime of a request. This makes it even softer than the value Google suggests, because the additional loading time of images, CSS, JavaScript is ignored.

Apart from the affected visits, this factor also considers response time patterns, increasing response time over time, and any sudden runtime changes.

To determine the grade these two questions are asked:

  • How many visits are affected (see formula above)?
  • Are there any patterns in response times such as increases, waves, or repeating spikes?

Summary

This rating system provides comparable and understandable results and caters to the needs of many stakeholders. Xceptance rolled it out to almost all load and performance tests and it proves to be effective and reliable. Some customers even arrive now with a goal of a certain rating to improve on the previous year’s results.

Xceptance has put that performance rating guide under the Creative Commons license – CC BY-SA 4.0. Feel free to use it for your everyday work, improve on it, and please let us know any feedback. We certainly appreciate it.

You can also use this slidedeck for communication and documentation purposes – https://training.xceptance.com/qa-and-test/050-performance-rating-system-devtest-leeds-2019.html.

Data Science Day 2022

After a two-year break, on 18 May 2022, the 4th Data Science Day Jena took place. The Friedrich-Schiller University is hosting this mini-conference format annually.

Xceptance presented this year a closer look at the data that is collected and processed during load and performance testing. René Schwietzke, Managing Director of Xceptance, talked about the challenges to capture the right data as well as translate the collected data into meaningful results. 

A load test simulates millions of user interactions with a website and therefore is capturing huge amounts of data points. These have to be transformed into a few numbers to make the result of the test easy to communicate but still preserve important details. The talk started with typical business requirements and expectations of the target groups of a load test. It showed the data XLT captures and the dimensions which later drive the data reduction. A few example data series demonstrated the challenges behind the data reduction and what numbers are finally used to satisfy the requirements.

An example load test result illustrated the talk with real data. That example test run created about 17,500 data rows per second which contain about 293,000 data points. The entire test result consists of 3.2 billion data points. This massive data set is turned into a consumable report by XLT in less than six minutes. 

For everyone with an interest in data science, this presentation also offers ideas for research in regard to unsolved data challenges. There might be even some Master’s and Bachelor’s theses topics waiting for you. 

You can find a recording of the presentation below (courtesy of Thüringer Universitäts- und Landesbibliothek Jena).

This is the accompanying slide deck. It is a Reveal.js-based. You can navigate with the spacebar and the arrow keys.

Picture of the presentation


iJUG-Magazin Java aktuell – Performance Tests of Microservices

Load and performance testing is not trivial. Load and performance testing microservices is even more challenging. Our article “Performance Tests of Microservices” in the iJUG-Magazin Java aktuell talks about approach and challenges when designing and executing performance test for microservices. So get yourself the Java Aktuell or, if you just want to check out the article, use this link to the PDF and enjoy. The article is in German.

Performancetests von Microservices

Eine Softwarearchitektur, die aus kleinen und unabhängigen Teilen zusammengesetzt ist, ist einfacher zu erstellen und zu warten. Microservices sind dadurch zum dominierenden Thema in der modernen Softwareentwicklung geworden. Während die Komplexität auf der Entwicklungsseite sinkt, steigt sie in Bezug auf die Themen Architektur, Performance und Zuverlässigkeit. Dieser Artikel diskutiert die Anforderungen an die Performance und die Planung von Performancetests von Microservices.

Comments are very welcome.

2020 – One Year, One Picture

One Picture Says It All

If we can only use one picture to symbolize 2020, this might just be it. Working from home and video conferences are now the new normal.

Happy Holidays, Merry Christmas, and a Happy and Healthy 2021, because we all need that.

iJUG-Magazin Java aktuell – High Performance Java

Java Aktuell Cover Issue 2/2020
Java aktuell 2/2020

After our presentation at the JUG Saxony Day in 2019, we have been asked to turn the talk into an article for the iJUG-Magazin Java aktuell. The February 2020 issue just got published and features this article. So get yourself the latest Java Aktuell or, if you just want to check out the article, use this link to the PDF and enjoy. The article is in German.

High Performance Java – Hinter den Kulissen von Java

Die Zeiten, in denen Java der langsame Bytecode-Interpreter war, sind lange vorbei. Die JVM nutzt viele Tricks, um Code effizient auszuführen, und transformiert und optimiert Java-Code auf die ausführende Hardware. Mit etwas Wissen über diese Prozesse kann man vermeiden, dass man gegen die JVM arbeitet, und gleichzeitig mehr Geschwindigkeit erreichen. Selbst wenn man nicht die letzte Mikrosekunde jagt, ist es interessant zu sehen, welche Techniken die JVM einsetzt, um die Ausführungsumgebung besser zu verstehen.

Comments are very welcome.

Xceptance LoadTest Goes Open Source

We have often been asked why XLT (Xceptance LoadTest) is free but not open source. Now, after considering our customers’ business needs and the advantages of using open source as opposed to free software, Xceptance has decided to open source XLT under the Apache License 2.0.

Within the next 30 days, we will restructure the XLT code base and move subprojects into independent open repositories. We will continue to expand and improve the documentation. All code will be published on GitHub.

Xceptance will continue to maintain and extend XLT as before. Releases will be built and published by Xceptance on a regular schedule. Libraries will be published to Maven Central as part of the open-sourcing process. The latest releases will be available from the XLT GitHub repository as well. Of course, the GitHub defect tracker will start to be your source for feature development and defect fixes.

If you are interested in training, projects, or support for XLT and load testing in general, we of course also offer those services, and will expand this offering further in upcoming months.

By open sourcing, we are opening a new chapter for XLT and we invite everyone to join us. More to come!

BOM – Byte Order Mark

Because we encountered another hidden encoding issue as part of test data, here are some information about BOM and why this might be interesting in general for everyone working with a computer beyond Excel and Word.

Before you educate yourself, here is the tool to own to see such a problem easily. Most of all editors hide that information and so you might scratch your head why some data is failing with strange error messages. Get xxd and you will see with other eyes:

The first marked bytes are the magic and now head over to the Wikipedia to read more about BOM: https://en.wikipedia.org/wiki/Byte_order_mark

Java Training Sessions

Today we are going to publish four of our Java training sessions so you can use the material and benefit from it.

Let’s get started with four direct links to extensive material that might help you to understand Java or code quality better or just help you to reflect on topics you already know.

  • The Java Memory Model: Why you have to know the JMM to understand Java and write stable, correct, and fast code.
  • Java Memory Management: Know more about the size of objects and how Java does garbage collection.
  • High Performance Java: All about the smart Java internals that turn your code into fast code and how you can leverage that knowledge.
  • High Quality Code: The anatomy of high quality code that supports longevity, cross-team usage, and correctness. This is not just about Java, this is about good code in general.

Show a little patience when loading the training, these are all large reveal.js based slide sets. Use the arrow keys or space to navigate. Because the slide sets are designed to be interactive sessions, in many cases, not the entire slide context is revealed at once but block by block.

We publish these training sessions because they are also based on openly shared material, it greatly helped us to advance and understand, as well of course advertise a little what Xceptance might be able to do for you.

We will release more of our material in the next weeks and month, so everybody can browse and learn. This won’t be limited to Java and also cover material about approaching load testing, how to come up with test cases, and more about the modern web and its quality and performance challenges. Of course there will be more Java material too. You can get a glimpse of it when you just follow this link and page through the slides: The Infinite Java Training. Please remember, not all material is complete yet.

If you like the material and you need an audio track aka a real presentation, please talk to us. If you see other training needs in the area of quality assurance, testing, and Java, please contact to us.

More to come.

Thuringia’s Open-Source Prize for XLT

Wolfgang Tiefensee, Thuringia’s Secretary of Commerce, in conjunction with the board of directors of the IT industry network ITNet Thuringia, awarded the first Thueringen Open-Source Prize to three companies, all of them software companies based in Jena: TRITUM, Xceptance and GraphDefined.

Open Source Prize Title Picture Second Place

It is an honor for Xceptance to be the second-place winner of this competition. This result clearly demonstrates that open source as a component of commercial products can be a clear competitive advantage. XLT incorporates a number of open-source projects, including Apache HttpClient, Jetty, HtmlUnit, JUnit, and the Apache Commons libraries. As part of developing XLT, Xceptance is involved in testing and providing feedback for these projects, thus giving back to the open-source community.

While XLT is itself not open source, Xceptance does provide the software free of charge and with virtually no usage restrictions, so for most applications there is no noticeable difference to open-source software.

Neodymium – An Open Source Framework for Web Testing

TL;DR: Neodymium is a Java-based test library for web automation that utilizes existing libraries (Selenide, WebDriver, Allure, JUnit, Maven) and concepts (localization, test multiplication, page objects) and adds missing components such as test data handling, starter templates, multi-device handling, and other small but useful everyday helpers.

Motivation

As a company focused on quality assurance and testing, Xceptance always needs test automation software, especially end-to-end automation software. Several years ago we built a Firefox add-on that was designed to create and run browser automation. The tool was primarily used by people who didn’t necessarily have a strong background in software development. Today, the landscape is a bit different: Mozilla cut the cord on the APIs we were using and standard programming languages have largely taken over test automation because they are more flexible and less proprietary. These changes convinced us it was time to implement an idea we had already hatched, namely our own Open Source test automation project: Neodymium. It is written in and utilizes the Java platform, it is MIT licensed, and of course you will find it on GitHub: https://github.com/Xceptance/neodymium-library

Basis

There are many libraries out there to aid web automation in Java, so developers are faced with the task of choosing ones they like and somehow making them work together. On top of that, there are tasks that require some custom code to work properly. We identified the overall tooling problem mostly as a hurdle in getting started and setting up a project. Finally, there are always things missing such as test data handling, concurrency, and common patterns which you don’t want to have to develop yourself. We chose JUnit, Selenide, WebDriver, Maven, and Allure for the base tooling.

Selenide provides an easy-to-use API to control Selenium WebDriver. Allure offers good mechanics to generate useful reports based on the assertions and actions you perform throughout your test cases. Maven is used to set up the build and execution environment for our framework and all the test projects. We decided to use JUnit as the test runner since it is the de facto standard in the Java world, but we enhanced the capabilities of JUnit to do even more. At its heart, Neodymium is a JUnit runner that wraps default JUnit behavior and adds significant useful functionality to it.

Multiple Browsers

You want to be able to run the same tests for different resolutions and/or browsers to simulate the browsers most common among your users. Additionally, you need to be able to implement small differences within your test execution to address variants such as responsive designs or progressive web apps. So we added a way to run web browsers with different configurations and retrieve the current device type and resolution from within the test.

Neodymium provides a Java annotation that can be added to your test case, in order to run different browser setups. Neodymium is very flexible in configuring browsers, allowing you to fully leverage the Chrome device emulation offerings.

Test Data

Another common task is the execution of a test case with different data sets, such as testing address forms with all the relevant variations. The basic idea is to have test data and data sets in structured files next to your code, preferably as JSON, XML, property style, or simply CSV. Hence, we introduced an easy-to-use API to access the current data set and retrieve basic types from it. Furthermore, you can configure specific scenarios running only a subset or even no data set at all by adding proper annotations. To complete the picture, Neodymium supports test data on a global and package-level scope.

Localization

Another recurring topic in modern software projects is localization. Most of the web sites that are in need of test automation also support several locales. We decided to provide an out-of-the-box solution.

Neodymium’s localization feature makes use of a central translation file written in YAML format. YAML helps to structure the translations. Additionally, we implemented a simple way to override specific translations for different locales. The localized text can be easily retrieved using Neodymium API methods that are globally available.

Development Support

As it is essential to understand what your test is doing, we added a feature that enables you to slow down the test execution and highlight elements that match the current selector. Since you can chain selectors using Selenide, any chain of elements is also represented by the highlighting. With this feature activated, a developer can track down the cause of test failures much more easily. In addition, we provide information on how to set up logging in your project should you need that. Finally, we decided to use the Page Object pattern to organize the website-related code to reduce the maintenance effort and increase reusability.   

Reporting

Allure is a widely used framework to generate reports. When using Neodymium with Selenide, your automation code also contributes report information. Your test classes and methods are listed as well as detailed Selenide automation commands. In case of errors, additional details such as screenshots and source code of the page in question are available. Neodymium also provides means to structure code blocks for reporting purposes.

Continuous Integration

Implementing principles of continuous integration will deliver more reliable software by increasing efficiency, and automation is nothing without a continuous integration environment.  Yet in almost every development cycle you will eventually end up needing varied settings due to differences in your setup, which can get complicated. Neodymium provides support for extra configuration files during development to override the standard production settings as needed. Furthermore, the framework supports overriding properties that change the configuration of your test execution by setting environment variables or simply passing Java arguments.

Because automation is supposed to run quickly, Neodymium provides support for parallel test execution and also demonstrates that setup as part of the sample test suite.

Documentation and Templates

Does Neodymium address some of your test automation challenges? Does it sound like a good entry point for your test automation?

Neodymium is hosted on GitHub (https://github.com/Xceptance/neodymium-library), where the accompanying project wiki (https://github.com/Xceptance/neodymium-library/wiki) provides extensive documentation to help you get started and answer your questions.

You might also want to take a look at the comprehensive example projects using Neodymium with Cucumber (https://github.com/Xceptance/neodymium-cucumber-example) or plain Java (https://github.com/Xceptance/neodymium-example). We’ve even provided a template project (https://github.com/Xceptance/neodymium-template) to get you started automating in no time.

License

Neodymium is licensed under the MIT License.

Who Are We

We are Xceptance. A software testing company with strong commerce knowledge and projects with customers from all around the world. Besides Neodymium, we have developed Xceptance Load Test (XLT), a load and performance test tool that is available free of charge and features an extensive range of awesome features to make the tester’s and developer’s life easier.

If you are looking for test automation that also covers the performance side of life, take a look at XLT. You can write and run load tests with real browsers including access to data from the Web Performance Timing API. In case browsers are too heavy, XLT has other modes of load testing to offer as well.

We offer professional support for Neodymium as well as implementation and training services.