Many Web pages are composed of content from a variety of sources. The base page (i.e., the HTML of the main page) may draw in content from one or more Content Delivery Network (CDN) providers, multiple ad providers, widget providers, partners, and so on. When a page shows degraded performance, how do you quickly identify who is responsible?
Is this one issue or two? Where’s the problem? Who owns the resolution?
Typical troubleshooting involves examining various charts and metrics to try to identify the culprit(s). This can be tedious and time-consuming, and often requires specialized knowledge and experience.
How can we simplify the troubleshooting process, make it available to a wider audience, and make it easier to identify who owns a problem?
One step might be to break out the performance of each object on the page, essentially a time series for each unique URL of page content. But a typical page contains more than 60 objects, and some pages more than 200. That is too much information to digest.
Adding a layer of abstraction, with boundaries aligned to typical areas of functionality or responsibility, provides a very helpful first step. That is, define a category for each domain on the page, and create a time series for each Domain Category.
For example, let’s say the domains comprising a page were mapped to the following categories:
- O&O (Owned and Operated)
- CDN (3rd-party Content Delivery Networks)
- 3rdParty (all other content, including ads, etc.)
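In practice, such a mapping can be maintained as a small rule table. A minimal Python sketch, where the domain names and suffix rules are hypothetical examples rather than a real O&O or CDN list:

```python
# Hypothetical domain-to-category rules; "3rdParty" is the catch-all.
CATEGORY_RULES = {
    "O&O": ["example.com"],
    "CDN": ["cdn-provider.net"],
}

def categorize(domain):
    """Return the Domain Category for a host name."""
    for category, suffixes in CATEGORY_RULES.items():
        # Match the domain itself or any subdomain of it.
        if any(domain == s or domain.endswith("." + s) for s in suffixes):
            return category
    return "3rdParty"
```

For example, `categorize("img.cdn-provider.net")` returns `"CDN"`, while any unlisted domain falls through to `"3rdParty"`.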
Looking at the performance by Domain Category, we can begin to see more information:
Specifically, we can see there were actually two separate performance issues, one was caused by 3rdParty content, and the other by O&O content.
The next step would be to drill down within each category (e.g. initially at the domain level, then at the object level, if needed) to get a better sense of the actual root cause. All this can be done manually via Pivot Tables in Excel, but automation is highly recommended.
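The grouping itself is straightforward to automate. Here is a sketch of the per-category aggregation in Python, assuming the raw data arrives as (time bucket, object URL, load time) measurements; the record shape is an assumption, and a `categorize(domain)` function mapping domains to categories is taken as given:

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlparse

def series_by_category(records, categorize):
    """Build one mean-load-time series per Domain Category.

    records: iterable of (time_bucket, url, load_ms) tuples.
    categorize: function mapping a host name to a category name.
    Returns {category: {time_bucket: mean load_ms}}.
    """
    samples = defaultdict(lambda: defaultdict(list))
    for bucket, url, load_ms in records:
        domain = urlparse(url).netloc
        samples[categorize(domain)][bucket].append(load_ms)
    return {cat: {b: mean(v) for b, v in buckets.items()}
            for cat, buckets in samples.items()}
```

The same function supports the drill-down: swap `categorize` for a function that returns the domain itself (or the full URL) to get domain-level or object-level series.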
While this provides the basic mechanics of the technique, some details require further consideration. Note that the Domain Category chart above is labeled "Average Performance by Category": the performance results for all the objects in a category were aggregated with a simple arithmetic mean.

The mean is simple, but it can hide important information; for example, it gives no sense of how many objects a category contains. The arithmetic sum is another simple option, but it has the opposite problem: it grows with the number of objects regardless of their individual impact. Simple, but lacking.
Ideally, we’d like an aggregation method that computes the contribution each category (or domain, or object) makes to the overall page load time. In other words, for a Domain Category graph like the one above, the three lines could be summed to obtain the page-level load time shown in the top graph.
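The article does not spell out the aggregation algorithm, so purely as an illustration, here is one naive way to make the category lines sum to the page load time: split each page view's measured load time across categories in proportion to their summed object load times. A real algorithm would also need to account for parallel downloads, blocking behavior, and the like.

```python
def contributions(objects, page_load_ms):
    """Apportion a page view's load time across categories.

    objects: list of (category, load_ms) pairs for the page's objects.
    Splits page_load_ms in proportion to each category's summed
    object time, so the per-category values add up to page_load_ms.
    """
    totals = {}
    for category, load_ms in objects:
        totals[category] = totals.get(category, 0) + load_ms
    grand_total = sum(totals.values())
    return {category: page_load_ms * t / grand_total
            for category, t in totals.items()}
```

For example, objects totaling 300 ms of O&O time and 100 ms of CDN time on a 2000 ms page view yield contributions of 1500 ms and 500 ms, which sum back to the page-level figure.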
Visually, using a stacked area graph, it might look like:
Note that this example adds another category to break out advertising domains, and is for a different time period than previous example graphs.
One way to accomplish this, and what we’ve done at AOL, is to develop an algorithm for computing the load-time contribution of the objects in each category. This can be applied via post-processing of the performance data, so it is independent of the tool used to collect the data.
Whichever approach you choose, the additional information provided by performance visualization using domain categories can be a valuable troubleshooting tool.