Performance and Availability

In an earlier post I talked about metrics for reporting Web site performance (response time). Site availability is also an important metric, and the relationship between the two is often misunderstood.

A typical Operations measure of availability is that the servers are up (i.e., pingable) or respond to a health check. A more user-centric, and more accurate, measure is whether the expected response (e.g., a full page load) arrives within a reasonable timeframe, such as before the user gives up and abandons the site. This timeframe is the availability threshold.

Now, for the sake of argument, let’s define the availability threshold as 30 seconds (I’m being generous; most studies put user abandonment at about 10-20 seconds). If a full page load takes longer than 30 seconds, the page is unavailable, whether it took 31 seconds or never loaded at all.

So, when measuring or aggregating response time data for this site, should data points beyond that 30 second availability threshold be included?

No.

Response time and availability are non-intersecting data sets: a site either responds as expected within a certain timeframe, or it’s unavailable.

So you must look at both response time and availability metrics to get the complete picture of site performance.
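As a minimal sketch of this split, assume each sample is a full page load time in seconds, with `None` standing in for a load that never completed. The hypothetical `summarize` function below places every sample in exactly one of the two sets and reports a metric for each. The 30-second threshold matches the example above; using the median as the response time statistic is an assumption for illustration.

```python
from statistics import median
from typing import Optional, Sequence

AVAILABILITY_THRESHOLD = 30.0  # seconds, as in the example above


def summarize(samples: Sequence[Optional[float]],
              threshold: float = AVAILABILITY_THRESHOLD) -> dict:
    """Split page-load samples into two disjoint sets and report both metrics.

    Each sample is a full page load time in seconds, or None if the page never
    loaded. Anything that did not complete within the threshold counts as
    unavailable and is excluded from the response time statistics.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Loads that completed within the threshold: the response time data set.
    responded = [t for t in samples if t is not None and t <= threshold]
    # Everything else (too slow or never loaded): the unavailable data set.
    unavailable = len(samples) - len(responded)
    return {
        "availability_pct": 100.0 * len(responded) / len(samples),
        "median_response_s": median(responded) if responded else None,
        "unavailable_count": unavailable,
    }


print(summarize([2.1, 3.4, 31.0, None, 4.0]))
```

With these five samples, availability is 60% and the median response time is 3.4 seconds. The 31-second load and the load that never finished both count against availability, and neither one skews the response time figure.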

Note the careful choice of words: site performance is a combination of response time and availability. The following diagram demonstrates this relationship visually.

[Figure: Availability threshold impact]

Note that as the availability threshold is lowered, the response time metric improves while the availability metric degrades. This is why both metrics must be considered together to properly judge site performance.
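The same trade-off can be reproduced numerically. This sketch uses a small set of made-up load times (hypothetical data, not measurements) and recomputes both metrics at several thresholds. Mean response time is used here for simplicity, and `metrics_at` is a hypothetical helper name.

```python
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical page-load times in seconds; None means the page never loaded.
SAMPLES: List[Optional[float]] = [1.8, 2.5, 3.0, 4.2, 6.5, 9.0, 14.0, 22.0, 35.0, None]


def metrics_at(threshold: float,
               samples: List[Optional[float]] = SAMPLES) -> Tuple[float, float]:
    """Return (mean response time of available loads, availability %) for a threshold."""
    # Only loads that finished within the threshold count as responses.
    ok = [t for t in samples if t is not None and t <= threshold]
    avg = mean(ok) if ok else float("nan")
    return avg, 100.0 * len(ok) / len(samples)


# Lower the threshold step by step and watch the two metrics move apart.
for threshold in (30.0, 20.0, 10.0, 5.0):
    avg, avail = metrics_at(threshold)
    print(f"threshold {threshold:>4.0f}s: mean response {avg:5.2f}s, availability {avail:5.1f}%")
```

In this made-up data, lowering the threshold from 30 to 5 seconds cuts the mean response time from 7.875 to 2.875 seconds, while availability falls from 80% to 40%. Neither number alone tells you whether the site got better or worse.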

This can be confusing, especially to the uninitiated, and it makes comparing the performance of multiple sites difficult.

Wouldn’t it be handy to combine these metrics somehow into one metric that is normalized for easy comparison? This is where techniques like Apdex come in. But more on that in a future post.
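For reference, the Apdex specification scores samples against a target time T: a sample at or under T counts as satisfied, one over T up to 4T as tolerating, and anything slower (or failed) as frustrated. The score is then a single number between 0 and 1:

```latex
\mathrm{Apdex}_T = \frac{N_{\text{satisfied}} + \tfrac{1}{2}\,N_{\text{tolerating}}}{N_{\text{total}}}
```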

Update:
Chris Loosley has posted an article, Where Performance Meets Availability, that provides additional insight. One of his earlier posts, Acceptable Response Times, is also a great discussion of the de facto standards we often see referenced.