Performance Is Always in the Details

There is a mistake that almost every engineer, analyst, and product person makes at least once when chasing a performance problem.

They look at the wrong level of detail.

Not the wrong metric. Not the wrong time window. The wrong level of granularity. They pull the summary view, the aggregated dashboard, the rolled-up stat, and they start optimising based on what they see there. And what they see there is real data. It just does not represent the problem they are trying to fix.

The summary lies to you

Not intentionally. The summary is just doing what it was built to do: compress many things into one number.

The problem is that compression destroys signal. When you average across users, you hide the outliers. When you roll up across pages, you hide the page that is actually slow. When you look at weekly latency instead of per-request latency, you smooth over the spike that is costing you conversions every Thursday afternoon.
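To make the compression concrete, here is a minimal sketch using made-up latency numbers (the values and the 1000 ms "slow" cutoff are assumptions for illustration, not measurements from any real system). It puts the aggregate next to the tail that the aggregate hides.

```python
from statistics import mean, median

# Hypothetical per-request latencies in milliseconds:
# 95 fast requests and 5 very slow ones (illustrative values only).
latencies_ms = [120] * 95 + [4000] * 5

SLOW_MS = 1000  # assumed cutoff for "this request was painful"


def summarize(samples):
    """Return the rolled-up numbers alongside the detail they hide."""
    ordered = sorted(samples)
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "max": ordered[-1],
        "slow_count": sum(1 for s in ordered if s > SLOW_MS),
    }


summary = summarize(latencies_ms)
print(summary)
```

With these numbers the mean is 314 ms and the median is 120 ms, both of which look acceptable, while five requests took four seconds each. Only the per-request view shows them.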

The number looks fine. The number is fine, in aggregate. But aggregate is not where your users live. Your users live in the specific request, on the specific device, at the specific time. If that experience is broken, the summary will absorb it and show you green.

You will tune the wrong thing. You will ship the fix. The metric will not move. You will be confused.

This is not a failure of effort. It is a failure of resolution.

What happens when you stay at the surface

The optimisation you make based on high-level stats will almost always be real. You will find something. You will improve something. The number will tick up slightly, or not at all, and you will move on.

But the actual problem, the one that caused the investigation in the first place, will still be there. Because it was never visible at the level you were looking.

This pattern is surprisingly easy to fall into because the surface-level view is fast, familiar, and clean. It does not require you to dig. It does not ask you to sit with messy, high-cardinality data and figure out what it is telling you. It gives you a number and lets you act on it immediately.

That feeling of momentum is dangerous. You are moving, but not toward the thing you need to fix.

The answer is always more granular than you think

When a performance problem is real and persistent, the answer is always in the details. Not probably. Always.

That means query-level data, not table-level averages. It means per-user session breakdowns, not cohort summaries. It means individual trace spans, not service-wide p99 latency. It means the specific row, the specific request, the specific call that is behaving differently from everything around it.

The instinct to zoom out first is understandable. The high-level view gives you orientation. You want to know roughly where the problem is before you go hunting for it specifically. That is fine. Use the summary to point you toward the right area, then immediately go deeper.

Do not stay at the summary level longer than it takes to get your bearings. The moment you start forming hypotheses about what to fix, you need to be looking at granular data. If you are not, you are guessing, and the summary is just giving your guess the appearance of evidence.

Debugging is a zoom operation

The mental model that actually helps here is thinking about debugging as a zoom operation, not a search operation.

You are not looking for the problem in the dashboard. The dashboard cannot show you the problem. You are using the dashboard to zoom to the right part of the system, and then you keep zooming until you are looking at individual events, individual requests, individual rows.

At some point in that zoom, the problem becomes obvious. It always does. The slow query that only appears when a specific filter is applied. The endpoint that degrades under a specific payload size. The component that re-renders unnecessarily for one particular user flow. These things are invisible at every zoom level above them.

The temptation is to stop zooming too early, because the intermediate level looks interesting, looks actionable, looks like something you could fix. Resist that. Keep zooming until you see something that is clearly, specifically wrong, not just slightly off.

What this looks like in practice

Pull the aggregate stat. Note where it points. Then immediately ask: what is the most granular version of this data I can access?

If you are investigating slow page loads, do not stay at the page-level average. Get the per-user breakdown. Then the per-session breakdown. Then the individual session trace. Find the sessions where load time is genuinely bad and read what happened in them specifically.
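The page-load zoom can be sketched in a few lines. The record layout, the sample values, and the 3000 ms threshold below are all hypothetical. In a real investigation the rows would come from whatever session or tracing data you have access to.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical page-load records: (user_id, session_id, load_ms).
records = [
    ("u1", "s1", 800), ("u1", "s2", 900),
    ("u2", "s3", 850), ("u2", "s4", 7200),
    ("u3", "s5", 780), ("u3", "s6", 820),
]


def group_mean(rows, key_index):
    """Average load_ms grouped by the column at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: mean(values) for key, values in groups.items()}


# Zoom level 0: one number for the whole page.
page_average = mean(load for _, _, load in records)

# Zoom level 1: per user. Zoom level 2: per session.
per_user = group_mean(records, 0)
per_session = group_mean(records, 1)

BAD_LOAD_MS = 3000  # assumed threshold for a genuinely bad load
bad_sessions = [s for s, ms in per_session.items() if ms > BAD_LOAD_MS]

print(round(page_average), per_user, bad_sessions)
```

The page average here is roughly 1.9 seconds, which suggests a page that is a bit slow for everyone. The per-session view says something different: one session is broken, the rest are fine, and that one session is the trace worth reading.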

If you are investigating a drop in conversion rate, do not stay at the funnel summary. Get the per-step breakdown by segment. Then by device. Then by entry point. Find the specific combination of attributes where the drop is concentrated.
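The same idea for conversion, again as a sketch over invented data: the event tuples, the device and entry-point names, and the counts are assumptions chosen to show the shape of the analysis, not real results.

```python
from collections import defaultdict

# Hypothetical funnel events: (device, entry_point, converted).
events = (
    [("desktop", "search", True)] * 40 + [("desktop", "search", False)] * 60
    + [("desktop", "email", True)] * 40 + [("desktop", "email", False)] * 60
    + [("mobile", "search", True)] * 40 + [("mobile", "search", False)] * 60
    + [("mobile", "email", True)] * 5 + [("mobile", "email", False)] * 95
)


def conversion_by(rows, key):
    """Conversion rate per group, where key(row) names the group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [conversions, visits]
    for row in rows:
        bucket = totals[key(row)]
        bucket[0] += row[2]  # True counts as 1, False as 0
        bucket[1] += 1
    return {k: conv / visits for k, (conv, visits) in totals.items()}


# Each step splits the previous view by one more attribute.
overall = conversion_by(events, lambda r: "all")["all"]
by_device = conversion_by(events, lambda r: r[0])
by_combo = conversion_by(events, lambda r: (r[0], r[1]))

# The combination with the lowest conversion rate.
worst = min(by_combo, key=by_combo.get)
print(overall, by_device, worst, by_combo[worst])
```

In this invented data the overall rate is about 31%, mobile alone looks weak at 22.5%, and the device and entry-point split shows that three of the four combinations convert at 40% while mobile visitors arriving from email convert at 5%. The drop is not a mobile problem in general. It sits in one combination.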

You will almost always find that the problem is not spread evenly across the aggregate. It is concentrated. It is specific. It has a shape. And that shape is only visible when you are close enough to see it.

The uncomfortable part

Doing this properly takes longer than staying at the surface. The granular data is messier, harder to read, less immediately legible. It requires more context to interpret. It often raises more questions before it answers the one you started with.

That is the work. There is no shortcut around it.

The summary view will always offer you a faster answer. But fast answers to the wrong question are how performance investigations end with a shipped fix, no improvement, and nobody sure what happened.

Go deeper before you decide what to fix. The answer is down there.

Written by

Martin Dimoski

Senior R&D Executive & AI Systems Builder