Load / Stress Testing of Websites

1. The Importance of Scalability & Load Testing

Some very high profile websites have suffered from serious outages and/or performance issues due to the number of people hitting their website. E-commerce sites that spent heavily on advertising but not nearly enough on ensuring the quality or reliability of their service have ended up with poor web-site performance, system downtime and/or serious errors, with the predictable result that customers are being lost.

In the case of toysrus.com, its web site couldn't handle the approximately 1000 percent increase in traffic that their advertising campaign generated. Similarly, Encyclopaedia Britannica was unable to keep up with the amount of users during the immediate weeks following their promotion of free access to its online database. The truth is, these problems could probably have been prevented, had adequate load testing taken place.

When creating an eCommerce portal, companies will want to know whether their infrastructure can handle the predicted levels of traffic, to measure performance and verify stability.

These types of services include Scalability / Load / Stress testing, as well as Live Performance Monitoring.

Load testing tools can be used to test the system behaviour and performance under stressful conditions by emulating thousands of virtual users. These virtual users stress the application even harder than real users would, while monitoring the behaviour and response times of the different components. This enables companies to minimise test cycles and optimise performance, hence accelerating deployment, while providing a level of confidence in the system.

Once launched, the site can be regularly checked using Live Performance Monitoring tools to monitor site performance in real time, in order to detect and report any performance problems - before users can experience them.

2. Preparing for a Load Test

The first step in designing a Web site load test is to measure as accurately as possible the current load levels.

Measuring Current Load Levels

The best way to capture the nature of Web site load is to identify and track, [e.g. using a log analyzer] a set of key user session variables that are applicable and relevant to your Web site traffic.

Some of the variables that could be tracked include:

the length of the session (measured in pages)
the duration of the session (measured in minutes and seconds)
the type of pages that were visited during the session (e.g., home page, product information page, credit card information page etc.)
the typical/most popular ‘flow’ or path through the website
the % of ‘browse’ vs. ‘purchase’ sessions
the % type of users (new user vs. returning registered user)

Measure how many people visit the site per week/month or day. Then break down these current traffic patterns into one-hour time slices, and identify the peak-hours (i.e. if you get lots of traffic during lunch time etc.), and the numbers of users during those peak hours. This information can then be used to estimate the number of concurrent users on your site.

Concurrent Users

Although your site may be handling x number of users per day, only a small percentage of these users would be hitting your site at the same time. For example, if you have 3000 unique users hitting your site on one day, all 3000 are not going to be using the site between 11.01 and 11.05 am.

So, once you have identified your peak hour, divide this hour into 5 or 10 minute slices [you should use your own judgement here, based on the length of the average user session] to get the number of concurrent users for that time slice.

Estimating Target Load Levels

Once you have identified the current load levels, the next step is to understand as accurately and as objectively as possible the nature of the load that must be generated during the testing.

Using the current usage figures, estimate how many people will visit the site per week/month or day. Then divide that number to attain realistic peak-hour scenarios.

It is important to understand the volume patterns, and to determine what load levels your web site might be subjected to (and must therefore be tested for).

There are four key variables that must be understood in order to estimate target load levels:

how the overall amount of traffic to your Web site is expected to grow
the peak load level which might occur within the overall traffic
how quickly the number of users might ramp up to that peak load level
how long that peak load level is expected to last

Once you have an estimate of overall traffic growth, you’ll need to estimate the peak level you might expect within that overall volume.

Estimating Test Duration

The duration of the peak is also very important-a Web site that may deal very well with a peak level for five or ten minutes may crumble if that same load level is sustained longer than that. You should use the length of the average user session as a base for determining the load test duration.

Ramp-up Rate

As mentioned earlier, Although your site may be handling x number of users per day, only a small percentage of these users would be hitting your site at the same time.

Therefore, when preparing your load test scenario, you should take into account the fact that users will hit the website at different times, and that during your peak hour the number of concurrent users will likely gradually build up to reach the peak number of users, before tailing off as the peak hour comes to a close.

The rate at which the number of users build up, the "Ramp-up Rate" should be factored into the load test scenarios (i.e. you should not just jump to the maximum value, but increase in a series of steps).

Scenario Identification

The information gathered during the analysis of the current traffic is used to create the scenarios that are to be used to load test the web site.

The identified scenarios aim to accurately emulate the behavior of real users navigating through the Web site.

for example, a seven-page session that results in a purchase is going to create more load on the Web site than a seven-page session that involves only browsing. A browsing session might only involve the serving of static pages, while a purchase session will involve a number of elements, including the inventory database, the customer database, a credit card transaction with verification going through a third-party system, and a notification email. A single purchase session might put as much load on some of the system’s resources as twenty browsing sessions.

Similar reasoning may apply to purchases from new vs. returning users. A new user purchase might involve a significant amount of account setup and verification —something existing users may not require. The database load created by a single new user purchase may equal that of five purchases by existing users, so you should differentiate the two types of purchases.

Script Preparation

Next, program your load test tool to run each scenario with the number of types of users concurrently playing back to give you a the load scenario.

The key elements of a load test design are:

test objective
pass/fail criteria
script description
scenario description

Load Test Objective

The objective of this load test is to determine if the Web site, as currently configured, will be able to handle the X number of sessions/hr peak load level anticipated. If the system fails to scale as anticipated, the results will be analyzed to identify the bottlenecks.

Pass/Fail Criteria

The load test will be considered a success if the Web site will handle the target load of X number of sessions/hr while maintaining the pre-defined average page response times (if applicable). The page response time will be measured and will represent the elapsed time between a page request and the time the last byte is received.

Since in most cases the user sessions follow just a few navigation patterns, you will not need hundreds of individual scripts to achieve realism—if you choose carefully, a dozen scripts will take care of most Web sites.

Script Execution

Scripts should be combined to describe a load testing scenario. A basic scenario includes the scripts that will be executed, the percentages in which those scripts will be executed, and a description of how the load will be ramped up.

By emulating multiple business processes, the load testing can generate a load equivalent to X numbers of virtual users on a Web application. During these load tests, real-time performance monitors are used to measure the response times for each transaction and check that the correct content is being delivered to users. In this way, they can determine how well the site is handling the load and identify any bottlenecks.

The execution of the scripts opens X number of HTTP sessions (each simulating a user) with the target Web site and replays the scripts over and over again. Every few minutes it adds X more simulated users and continues to do so until the web site fails to meet a specific performance goal.

System Performance Monitoring

It is vital during the execution phase to monitor all aspects of the website. This includes measuring and monitoring the CPU usage and performance aspects of the various components of the website – i.e. not just the webserver, but the database and other parts aswell (such as firewalls, load balancing tools etc.)

For example, one etailer, whose site fell over (apparently due to a high load), when analysing the performance bottlenecks on their site discovered that the webserver had in fact only been operating at 50% of capacity. Further investigation revealed that the credit card authorisation engine was the cause of failure – it was not responding quick enough for the website, which then fellover when it was waiting for too many responses from the authorisation engine. They resolved this issue by changing the authorisation engine, and amending the website coding so that if there were any issues with authorisation responses in future, the site would not crash.

Similarly, another ecommerce site found that the performance issues that they were experiencing were due to database performance issues – while the webserver CPU usage was only at 25%, the backend db server CPU usage was 86%. Their solution was to upgrade the db server.

Therefore, it is necessary to use (install if necessary) performance monitoring tools to check each aspect of the website architecture during the execution phase.

Suggested Execution Strategy:

Start with a test at 50% of the expected virtual user capacity for 15 minutes and a medium ramp rate. The different members of the team [testers will also need to be monitoring the CPU usage during the testing] should be able to check whether your website is handling the load efficiently or some resources are already showing high utilization.

After making any system adjustments, run the test again or proceed to 75% of expected load. Continue with the testing and proceed to 100%; then up to 150% of the expected load, while monitoring and making the necessary adjustments to your system as you go along.

Results Analysis

Often the first indication that something is wrong is the end user response times start to climb. Knowing which pages are failing will help you narrow down where the problem is.

Whichever load test tool you use, it will need to produce reports that will highlight the following:

§ Page response time by load level

§ Completed and abandoned session by load level

§ Page views and page hits by load level

§ HTTP and network errors by load level

§ Concurrent user by minute

§ Missing links report, if applicable

§ Full detailed report which includes response time by page and by transaction, lost sales opportunities, analysis and recommendations

Important Considerations

When testing websites, it is critically important to test from outside the firewall. In addition, web-based load testing services, based outside the firewall, can identify bottlenecks that are only found by testing in this manner.

Web-based stress testing of web sites are therefore more accurate when it comes to measuring a site's capacity constraints.

Web traffic is rarely uniformly distributed, and most Web sites exhibit very noticeable peaks in their volume patterns. Typically, there are a few points in time (one or two days out of the week, or a couple of hours each day) when the traffic to the Web site is highest.