Performance Engineering and Performance Testing

By Scott Moore for QA Consultants
February 2020

Performance Engineering and Performance Testing are related but are not the same thing. Testing is one discipline within engineering. The following illustration shows the breadth of performance engineering:

As the illustration shows, there is a lot more to performance than just testing. The “Quality Assurance” segment would have application performance testing tasks associated with it to verify the application and/or infrastructure meets the performance requirements. However, there are many other disciplines involved in bringing continuous performance and optimization processes to an organization. In prior years, most companies were trying to test performance into a product after it had been developed and functionally verified, and the results were less than spectacular. As work culture and software methodologies began to emphasize smaller and faster development cycles, testing had to move into development to find and fix things as they occurred as part of this cycle. Performance testing earlier in the life cycle (or “shift left”) became a must to keep up with lean development practices.

Many organizations are moving towards Continuous Integration (the practice of merging all working code to a shared mainline several times a day) and Continuous Deployment (automatically releasing verified, tested code into a production environment). This requires Continuous Testing, which leads us to the idea of Continuous Performance. That may include continuous performance testing, continuous performance monitoring, continuous optimization, or any combination of these.

Implementing a continuous performance process requires a lifecycle approach. This means looking across the entire technology stack and the development lifecycle and, most importantly, including the business in the scope.

Lifecycle Approach

Many development and testing organizations only include the first three circles in the illustration above. However, starting with the business and the end-user experience dramatically increases the chance of a successful software deployment. This works for any area of development or testing, but for the purpose of this article, let’s focus on the performance aspect.

Use Case

QAC was recently engaged at a software company with a unique technology stack and components that were not included in their traditional performance testing (or monitoring, as we would later find out). These included Hadoop, Apache Spark, Tableau, and Elasticsearch, among others. While they were producing sophisticated modern software, they still struggled with basic performance issues in production. They were doing basic performance testing but had no visibility into what negatively impacted the application between releases, and this was concerning. They also had very specific requirements that limited the toolset that could be used.

After doing an initial assessment by interviewing over a dozen people across the organization, it became clear that there were three issues that consistently surfaced:

There was a disconnect between QA and the business. Although they were testing the software, many members of the QA team had never interfaced with business analysts or customers.
Environments and Data: they needed an easier way to offer realistic testing environments quickly, and it was difficult to create enough realistic test data.
Performance testing was not part of the entire lifecycle or continuously run, nor was any monitoring taking place.

All of this created a situation where the company spent a lot of time in triage situations and in war rooms with customers trying to fix performance problems and determine the root cause of outages. These were very costly to the company in terms of technical debt and reputational cost.

Start with the Business

The first task was to address the disconnect with the business. To gauge the performance of an application, just find out what the end-user experience is. In this case, what are the fully rendered page timings for key web requests? It really doesn’t matter how fast the code executes, how quickly a certain amount of data is queried, or how much bandwidth the network has if the web page experience is still slow. Customers will still be unhappy. Fortunately, that’s easy to find out.

During our conversations, the questions we asked were:

Who are the users (roles)?
What are they doing?
How often are they doing it?
What is the current performance experience from the customer viewpoint?

Starting with open-ended questions usually opens the door for more questions and discussion, which leads to more discovery. We didn’t just get feedback from internal employees; we also arranged an onsite customer visit and spoke to the end-users directly. Sitting with the people who use the application every day as they shared their experience provided a lot of valuable information in the course of two hours. This exercise had not been a regular part of their culture. We discovered that many of the user workflows that were part of testing were “best guesses” and not based on how customers, or how many of them, actually used the software. This led to changes in the business processes selected for automation so that the test scenarios better represented real usage. Concurrent load levels, key timings, and other factors had to be adjusted, but the adjustments led to more accurate testing.
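
The answers to those four questions translate fairly directly into a load model. The following is a minimal Python sketch, assuming made-up roles and numbers rather than the client’s actual data, that turns “who, what, and how often” into an arrival rate, a workload mix, and an estimate of concurrency using Little’s Law (concurrency = arrival rate × time per workflow). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleProfile:
    """One user role taken from interview answers (all values are illustrative)."""
    name: str
    users_at_peak: int             # people in this role active during the peak hour
    workflows_per_user_hour: float # how often each person runs the key workflow
    avg_workflow_seconds: float    # response time plus think time for one workflow


def arrival_rate_per_second(role: RoleProfile) -> float:
    """Workflow executions per second that this role generates at peak."""
    return role.users_at_peak * role.workflows_per_user_hour / 3600.0


def concurrent_workflows(role: RoleProfile) -> float:
    """Little's Law: average number in flight = arrival rate * time in system."""
    return arrival_rate_per_second(role) * role.avg_workflow_seconds


def workload_mix(roles) -> dict:
    """Share of total traffic contributed by each role (fractions sum to 1)."""
    total = sum(arrival_rate_per_second(r) for r in roles)
    if total == 0:
        raise ValueError("no traffic in the model")
    return {r.name: arrival_rate_per_second(r) / total for r in roles}


if __name__ == "__main__":
    roles = [
        RoleProfile("analyst", users_at_peak=120,
                    workflows_per_user_hour=30, avg_workflow_seconds=12.0),
        RoleProfile("reviewer", users_at_peak=60,
                    workflows_per_user_hour=180, avg_workflow_seconds=4.0),
    ]
    for role in roles:
        print(role.name, arrival_rate_per_second(role), concurrent_workflows(role))
    print(workload_mix(roles))
```

With 120 analysts each running 30 workflows an hour at 12 seconds apiece, the sketch gives one arrival per second and roughly 12 workflows in flight at any moment. Numbers like these are what replace “best guess” load levels in a test scenario.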

Environment and Data Issues

The second issue is an ongoing battle at many companies. There is nothing trivial about “spinning up an environment”. While Containers and Infrastructure-As-Code have made administering infrastructure somewhat easier, there are generally more complex configuration requirements and other issues to consider. While environment automation can be achieved, it can take a while to get there, and it takes extremely skilled people in the DevOps teams to make it happen. Even if the environment becomes controllable, this particular client had special data requirements: creating random, unstructured communications content (emails, text, etc.), which would require an AI generator to meet their needs. While the creation of this data is outside the scope of performance testing, it absolutely affects the testing results. Stale data, cached data, and improperly sized datastores can lead to false conclusions based on skewed test results. QAC offered a custom solution to deal with this, working with the customer to create an engine that would be able to meet the data demands in terms of both scale and complexity. The importance of an accurate test environment and configuration to the software testing process cannot be overstated, and it should be given proper attention. Doing it properly may take more time (and money) than initially budgeted for, but doing it right pays big dividends.
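
To make the data problem concrete, here is a deliberately simple Python sketch of a seeded generator for unstructured messages. It is a template-style stand-in, not the AI-driven engine QAC built with the client; the vocabulary, channel names, and size limits are all assumptions. The point it illustrates is repeatability: the same seed always produces the same data set, so test runs stay comparable between releases.

```python
import random
from dataclasses import dataclass

# Hypothetical vocabulary; a real corpus would be far larger and domain-specific.
WORDS = ("invoice", "meeting", "contract", "update", "schedule", "review",
         "budget", "draft", "client", "deadline", "report", "approval")
CHANNELS = ("email", "text", "chat")
SHORT_MESSAGE_WORDS = 25  # assumed cap for texts and chats


@dataclass(frozen=True)
class Message:
    channel: str
    sender: str
    body: str


def generate_messages(count: int, seed: int = 0, max_words: int = 200):
    """Yield `count` pseudo-random messages; the same seed gives the same data."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    for _ in range(count):
        channel = rng.choice(CHANNELS)
        # Texts and chats are kept short; emails vary widely in length.
        cap = max_words if channel == "email" else min(max_words, SHORT_MESSAGE_WORDS)
        length = rng.randint(3, max(3, cap))
        body = " ".join(rng.choice(WORDS) for _ in range(length))
        sender = f"user{rng.randint(1, 5000)}@example.com"
        yield Message(channel, sender, body)


if __name__ == "__main__":
    for message in generate_messages(3, seed=42):
        print(message.channel, message.sender, len(message.body.split()), "words")
```

Scaling the `count` argument changes the volume of data, while a fresh seed per environment refresh avoids repeatedly testing against the same, possibly cached, content.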

Continuous Testing Platform

To deal with the remaining issue, QAC first addressed the current performance testing and performance monitoring solutions and processes in place. We found that performance testing exercises were more “one-off” events, usually only happening for major releases. Only main page timings were gathered, with no integrated infrastructure monitoring (i.e. CPU, memory, disk, and network). In order to get infrastructure metrics, data had to be manually correlated across multiple tools. They also had multiple monitoring solutions for production environments, none of which tied back to the true end-user experience. This left “blind spots”. QAC recommended reassessing their tools to find the smallest number of products that would give an end-to-end view of performance in both testing and production. In the test environment, they needed to support a CI/CD pipeline. Test results should contain a combined view of volume, end-user timings, infrastructure metrics, and application-specific metrics (e.g. Java JVM internals). They also needed to integrate this with any environment-level monitoring (including production) to see what the impact on performance was under load for a specific environment.
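
One way to turn end-user timings into a pipeline check is a simple regression gate. The sketch below, in Python, compares a percentile of page timings from the current build against a stored baseline and reports pages that slowed down by more than a tolerance. The page names, the 95th percentile, and the 10% tolerance are illustrative assumptions, not values from the engagement, and the gate is independent of any particular testing tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def find_regressions(baseline, current, pct=95, tolerance=0.10):
    """Return {page: (baseline_value, current_value)} for pages that regressed.

    `baseline` and `current` map a page name to a list of fully rendered
    page timings in seconds. A page regresses when its current percentile
    exceeds the baseline percentile by more than `tolerance` (a fraction).
    Pages missing from `current` are skipped rather than failed.
    """
    failures = {}
    for page, base_samples in baseline.items():
        if page not in current:
            continue
        base_value = percentile(base_samples, pct)
        current_value = percentile(current[page], pct)
        if current_value > base_value * (1 + tolerance):
            failures[page] = (base_value, current_value)
    return failures


if __name__ == "__main__":
    # Hypothetical timings in seconds for two key pages.
    baseline = {"login": [1.0, 1.1, 1.2, 1.3], "search": [0.4, 0.5, 0.6, 0.7]}
    current = {"login": [1.0, 1.1, 1.2, 1.35], "search": [0.5, 0.6, 0.7, 0.9]}
    for page, (before, after) in find_regressions(baseline, current).items():
        print(f"{page}: p95 went from {before}s to {after}s")
```

A CI job could fail the build whenever `find_regressions` returns a non-empty result, which gives developers feedback on every change instead of only at major releases.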

QAC suggested several product combinations (both commercial and open-source) that would meet their licensing budget, support the entire set of technologies in use, and work well together. The challenge ahead for this company is maturing to the point where performance testing in development is more of a self-service offering by the QA team, empowering all developers to add performance tests to their CI pipeline without needing to focus on performance the way a full-time performance engineer would. Unless a company is starting from scratch and builds in a performance process from day one, there is a growth period before it reaches continuous performance optimization. This is simply a matter of maturity.

Summary

Address performance starting with the business view first and work backwards.
Environments and test data management are KEY to making test results trustworthy.
Continuous performance requires a platform that allows for it. The first step is to ensure performance testing results are based in reality (the right business processes, volumes, etc.). Over time, shift left to get closer to development and fully automate everything.

QA Consultants can help address all the disciplines associated with performance engineering across the entire software lifecycle, including responsiveness and stability, endurance testing, and spike testing. This covers traditional performance and stress testing, regardless of the number of concurrent users and the expected load, as well as any other area where you need to address the system performance of your application. Performance testing is a complex form of testing with many layers, and QA Consultants offers performance testing services to address any issues.