Whenever I talk to people about performance testing, there’s always discussion and different interpretations around exactly what the different types of performance testing are and what they are supposed to achieve.
There are very few hard and fast rules defining the terms used around performance testing, fewer than for any other type of testing. Everyone's interpretation seems to differ, and I think this is why performance testing is seen as a bit of a dark art by many.
A quick scan through the IEEE standards definitions confirms this. I can only find definitions for the two most common terms: Performance Testing and Stress Testing. According to the standards:
- Performance Testing is testing conducted to evaluate the compliance of a system or component with specified performance requirements.
- Stress Testing is testing conducted to evaluate a system or component at or beyond the limits of its specified requirements.
So officially there are only two types of performance test: one that tests performance at expected levels and one that tests past expected levels. This is correct of course, but there is quite a bit more to performance testing than simply testing at the required level and beyond it.
Having performance tested many systems over the years, I've certainly seen a number of other "performance test" types used regularly to evaluate system performance. Although these performance test sub-types don't have any formal definition in the IEEE standards, I've put a list together below that covers most of the different test types and terms I'm aware of, how I would describe them and their major objective. Hopefully, this might help remove some of the mystery around the terms we use in the dark art of performance testing.
Performance Test
Like the IEEE standard says, a performance test evaluates a system's compliance with its performance requirements. In reality, the term performance test is more of an umbrella definition, covering pretty much all of the other sub-types of performance test I've listed below.
Load Test
A load test evaluates a system or component under predefined load levels against its specified performance requirements. Generally this test aims to measure whether the system meets its performance targets (whether they be response times, resource consumption, etc.) for usual user and transaction volumes and rates. It answers the question of whether the system will be able to cope and keep things going at the speed and rates required on a day-to-day basis. Sometimes the load levels will be increased to peak daily levels or beyond, provided they are defined.
Peak Load Test
A peak load test evaluates a system or component under predefined peak (the maximum expected) load levels against their specified performance requirements.
Note this is NOT a Stress Test. Similar to a load test, a peak load test aims to measure whether the system meets its performance targets, but does so for expected peaks in business volumes. End of month and end of financial year are good examples of peak load periods, where the load is a lot higher than usual but still falls within expected business levels.
Stress Test
Using the IEEE standard definition, the stress test expands on a controlled load (or peak load) test scenario, measuring a system's performance against defined performance criteria, but doing so well past its intended design limit. I think the best way to achieve this is a gradual, controlled increase in load until the weakest link in the chain is found.
This could be past a certain performance threshold, or further still until response times increase exponentially. Effectively, the test pushes the system to exhaustion of one or more resource types and verifies the system's behaviour when this happens.
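The gradual, controlled ramp described above can be sketched in a few lines of code. This is a minimal illustration, not any particular tool's API: the user counts, SLA threshold and the simulated response-time model are all made-up assumptions for the sake of the example.

```python
# Sketch of a stepped stress-test ramp: increase load in controlled
# increments until a failure condition (here, an SLA breach) is met.
# All figures below are illustrative assumptions, not recommendations.

def ramp_schedule(start_users, step, max_users):
    """Yield user counts for a gradual, controlled ramp."""
    users = start_users
    while users <= max_users:
        yield users
        users += step

def find_breaking_point(measure, start_users=50, step=50,
                        max_users=1000, sla_ms=2000):
    """Step up load until measured response time breaches the SLA.

    `measure` takes a user count and returns a response time in
    milliseconds; in a real test it would drive a load tool against
    the system under test rather than a model.
    """
    last_ok = None
    for users in ramp_schedule(start_users, step, max_users):
        if measure(users) > sla_ms:
            return last_ok, users  # (last passing level, first failing level)
        last_ok = users
    return last_ok, None           # never breached within the ramp

# Toy model: response time grows non-linearly with load.
simulated = lambda users: 100 + (users / 10) ** 2
print(find_breaking_point(simulated))
```

The finer the `step`, the more precisely the breaking point can be pinned down, which mirrors the point about granularity made for scalability testing below.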
Soak Test – Stability – Endurance
Soak testing, endurance testing or stability testing; the terms are largely interchangeable but refer to a test designed around running a controlled load, usually a little lower than business-as-usual, over a prolonged period. This tests for resource consumption and non-release that can only be identified over time. Gradual memory leaks are the classic example, where available memory reduces little by little.
In the short term this will have no effect on the system, but over time could be a major resourcing issue. Log files tend to grow over time, and past a certain point may cause system issues if there is insufficient disk space, garbage clean up processing, etc. This small cumulative growth and non-release can all be issues that are only apparent over time.
So how long should a soak test run? As long as possible. In reality you should aim for anything over 12 hours. Ideally a soak test would run for 72 hours, so the system can demonstrate its ability to hold things together for a period longer than a weekend’s-worth of non-stop operation.
If the system continues to operate normally for 72 hours, it gets over the period it generally will need to be supported by out-of-hours on-call staff. This is generally the critical length of time a system needs to be able run “unattended,” as weekend support is far more costly to an organisation than business-hour support.
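The gradual-creep problem a soak test is hunting for can be made concrete with a small analysis sketch: fit a linear trend to periodic memory samples and flag steady growth. The sample figures and the 1 MB/hour flag threshold are illustrative assumptions only.

```python
# A minimal sketch of soak-test analysis: given periodic memory samples
# taken over a long run, fit a least-squares slope to spot a gradual
# leak that a short load test would never reveal.

def leak_rate(samples):
    """Least-squares slope of memory usage per sample interval."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical hourly RSS samples (MB) over a 12-hour soak:
# a slow upward creep, invisible hour to hour.
hourly_mb = [512, 514, 513, 517, 519, 522, 521, 525, 528, 530, 533, 535]
rate = leak_rate(hourly_mb)
print(f"growth approx {rate:.2f} MB/hour")
if rate > 1.0:  # arbitrary flag level for this example
    print("possible leak: extrapolate before signing off a 72-hour run")
```

Extrapolating the slope forward answers the practical question: will the system still have headroom after a weekend of unattended running?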
Scalability Test
Scalability testing (sometimes referred to as capacity testing, which I personally think is incorrect, so I've given capacity its own category later) tests the system previously load tested or peak load tested at controlled multiples of those tests. So if you consider the load test to have tested 100% load on the system, the scalability test would increase this to test at, say, 150%, 200%, 300% load, and so on.
The increase in load needs to be applied in a gradual and controlled manner (in my opinion, the more granular the better), so that any performance issues can be nailed down to the level of scaling applied. Something else to consider here is that increased load can come in a number of different ways.
For example, the system might behave differently with 200% of the user base performing 100% of the transaction volume to the way it might behave with a 200% growth in transaction volume using 100% of the user base. How you scale the load will depend on the system, what you need to scale to and the behaviour you are modelling.
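One way to keep those two scaling dimensions straight is to plan the scenarios explicitly, scaling one variable while holding the other at the 100% baseline. The baseline figures here are hypothetical, purely to show the shape of such a plan.

```python
# Sketch of planning scalability scenarios: take the load-test baseline
# as 100% and scale either the user population or the transaction rate
# independently, since the system may respond differently to each.

baseline = {"users": 500, "tx_per_sec": 50}  # hypothetical 100% level

def scaled_scenarios(baseline, multipliers, dimension):
    """Build scenarios scaling one dimension, holding the rest at 100%."""
    scenarios = []
    for m in multipliers:
        s = dict(baseline)
        s[dimension] = int(baseline[dimension] * m)
        s["label"] = f"{int(m * 100)}% {dimension}"
        scenarios.append(s)
    return scenarios

# 150/200/300% growth in users vs. the same growth in transaction rate.
for s in scaled_scenarios(baseline, [1.5, 2.0, 3.0], "users"):
    print(s)
for s in scaled_scenarios(baseline, [1.5, 2.0, 3.0], "tx_per_sec"):
    print(s)
```

Laying the scenarios out this way makes it obvious which variable moved when a result changes between runs.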
Capacity Test
Similar to the scalability test, a capacity test applies controlled load in systematic increases to the point of failure. The subtle but important difference between capacity and scalability is that the capacity test should have no upper boundary level and should apply load progressively until system capacity can be determined. In my opinion, capacity should be determined as a direct multiplication ratio of the load test rather than as an artificial scaling of one or the other of the user and transaction load variables used to test scalability.
Spike Test – Point Load
Spike testing is the process of applying a rapid increase in load to a system that is already under load, holding it for a given period, and then reducing the load back to its original level. The aim of this test is to determine how the system behaves under unexpected heavy load conditions, and its ability to recover.
For example, the system is operating normally under a steady-state of 50% load, when load is quickly ramped up to 200% and remains at that level for a period before returning to 50% load. This simulates real-life situations where, for example, web site visits spike due to a radio commercial being aired, or a special offer being publicised. The system’s ability to normalise back to steady-state processing is just as important as its ability to meet the increased demand.
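The spike shape from that example can be sketched as a simple load-profile function. The 50%/200% levels come from the text; the timings are made up for illustration, and a real spike test would ramp rapidly rather than step instantaneously as this sketch does.

```python
# Minimal sketch of a spike-test load profile: steady state at 50%,
# a jump to 200% held for a period, then back to 50%. A load driver
# would sample this function to set its target load over time.

def spike_profile(t, steady=0.5, peak=2.0,
                  spike_start=600, spike_end=900):
    """Return the target load multiplier at time t (seconds)."""
    if spike_start <= t < spike_end:
        return peak
    return steady

# Sample the profile before, during and after the spike.
for t in (0, 599, 600, 899, 900):
    print(t, spike_profile(t))
```

The samples after `spike_end` matter as much as those during the spike: they are where the system's ability to normalise back to steady-state shows up.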
Benchmark Test – Performance Regression – Delta Comparison
Benchmark, performance regression or delta comparison testing all fall under the banner of “before and after” testing. Sometimes in the absence of adequate performance requirements, performance testing is conducted to determine any effect changes may have on system performance. The easiest way to achieve this is to do a before-and-after test and compare the results against each other. To do this effectively, the load applied, and the manner in which the load is applied, should be consistent to allow valid comparison.
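A before-and-after comparison usually boils down to comparing a chosen statistic between two runs under identical load. Here is a small sketch using a nearest-rank 95th percentile; the sample timings and the 10% tolerance are assumptions for the example, not a recommended threshold.

```python
# Sketch of a delta comparison: compare the 95th percentile response
# time of two runs and flag a regression beyond a tolerance.

def percentile(samples, p):
    """Nearest-rank percentile of a list of response times."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

def delta_report(before, after, p=95, tolerance=0.10):
    b, a = percentile(before, p), percentile(after, p)
    change = (a - b) / b
    verdict = "regression" if change > tolerance else "ok"
    return b, a, round(change, 3), verdict

# Hypothetical response times (ms) from identical load runs
# before and after a system change.
before_ms = [120, 130, 125, 140, 150, 135, 128, 132, 145, 155]
after_ms = [150, 160, 155, 175, 190, 165, 158, 162, 180, 195]
print(delta_report(before_ms, after_ms))
```

The comparison is only meaningful because both sample sets come from the same load applied the same way, which is exactly the consistency requirement noted above.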
Component Test
Component tests are performance tests that target specific individual system components, or groups of components, rather than the system as a whole. Often this requires test harnesses, stubs or customised environments to test the component(s) in isolation. The intention here is to run a variety of targeted testing (usually peak load, stress, scalability, capacity and/or performance regression tests) on new or updated components before they are introduced into the system as a whole.
Smoke Test – Low Load
Smoke testing in functional testing refers to executing a subset of the full test suite designed to "smoke out" obvious major defects that would otherwise prevent execution of the full test suite. In performance testing, smoke testing refers to the process of running the system under a low level of load (usually lower than the load test would apply) to determine whether there are any obvious performance issues before full load is applied during other performance test stages.
Shakedown Test
Similar to the smoke test, the shakedown test operates the system under a low level of load, but this time the focus is on the mechanisms used to apply load to the system: performance test scripts, scenarios, test data, pacing rates, etc. The intent is to determine whether the performance test tool, its processes and methods, and the way they are being executed are valid and sufficient to exercise the system under the planned load levels.
Configuration Test – Performance Tuning
Depending on the results of other performance tests, different system and/or component configurations may need to be iteratively tuned and tested to determine optimal performance or reach the specified performance requirements. This might take place in a specific component test phase, or alternatively and more usually, in a specific configuration/performance tuning test. Often this will be a shortened, focused subset of the full performance test suite and customised specifically to the components or system configurations being investigated.
Failover Test – Recovery – Resilience
Failover testing is a form of performance test where the system is tested operating under load and specific system components are artificially removed or dynamically reconfigured to determine how the system responds and recovers. For example, a load-balanced system with multiple back end servers might have the load balancer reconfigured to direct all the load to one server only. Various infrastructure components can be swapped out and back in during this test to determine the effect on system performance and stability.
Volume Test
Volume testing can mean many things to many people depending on their roles and specialisation. To a Database Administrator a volume test might test the performance of a database once it has reached a critical size. To Network Engineers, volume testing might refer to the amount of data transported across the network. Developers might refer to volume testing of different file sizes.
In the case of performance testing, volume testing can mean any or all of the above and more. Volume testing refers to specific tests designed to reach a particular number of events or a particular size. For example, a performance test could verify the system's ability to continue operating at a particular performance level once a certain number of database updates has been exceeded. Another might test the system's performance uploading or downloading files larger than a certain size. In general, though, it refers to specific concurrent user or transaction loads over a defined period. Exercise caution here — more often than not, volume testing can have very site-specific connotations.
That’s quite a list of sub-tests and terms that fall under the banner of Performance Testing. It’s probably incomplete and I’m sure not everyone will agree with my descriptions or examples. But those are the terms and test phases I’ll generally use when I’m on performance test engagements.
The important thing to bear in mind is that there are no common standards for these terms and a lot of this is open to interpretation. What you might think of as one thing, someone else might think of as another.
The easiest way around all of this, of course, is to make sure the performance requirements are defined in the first place, agree how they will be tested and get this documented in a Performance Test Plan. If people agree on nothing else, let’s hope they agree on that at least.
Did I miss any? I’d love to hear your thoughts. Leave a comment below.