Performance in the Trenches

Because I think it’s particularly cool, I wanted to show off something I built at work last week.

As apps grow in complexity it gets harder to reason about their behavior, and therefore harder to reason about their performance. The app at work has grown to have 5 different data layers, from storage to API wire format. The abstractions that provide architectural niceties like “I don’t have to know how the data gets to me, I just have to know how to process it” lead to code that is far less linear and far trickier to intuit.

But all that great architecture and abstraction is worthless if the app is too slow to use.

I’m doing most of my work in Java these days, where Netflix’s flame graphs are something of a gold standard for performance analysis and exploration. You end up with pretty stack traces that give you visual indicators of where your time is spent in code:

Flame Graph example

This is a very simple example, and it’s something of a degenerate case: it only shows the Java stack, not the rest of the OS. But let’s be honest: for most non-horrific performance analyses, a single stack is all you need.

The problem with the Netflix approach is that you need to jump a bunch of hurdles to get the data for the flame graph. You need a special Java agent, some process and script juggling, and you need a production-like system on a Linux VM. Again, that’s all fine for production debugging, but it’s really overkill for the first pass of the 80/20 rule.

I wanted something simpler that I could do entirely within code. It didn’t need to be amazing, it just needed to be better than nothing and it needed to be simple. I ended up with something I call Banky:

Banky Trace example

Banky is the combination of a Chrome DevTools extension and a Java library intended for use in Spring stacks. Oh, and instead of a “flame graph” I’ve inverted it into a “trench graph”. Functionally you end up in pretty much the same place, but with a toolset geared toward entry-level developers instead of systems engineers.

In the UI

Flame/Trench graphs are silly easy to read once you’re used to them. Each bar is a code point, typically a method on a class in the Java world. In the example above, the orange R code point maps to the generate method in the GmsJsonLDResponseGenerator class. The colors in Banky aren’t calculated the same way as in flame graphs; instead they are used strictly for visual identification and grouping.

Each bar has a color and an identifier (orange R above), as well as an execution time in milliseconds. This time is the total for all calls at this point, meaning from this particular parent: since the same code may be called from a number of places, calls arriving from other parents are tallied separately under their own bars. Time is “wall clock” time, the end time minus the start time, so children further down overlap with the parent’s time.

The last piece is optional: the number of calls at this point. If the code was called only once, the number is omitted.

On the right side where the code points are listed there’s one extra piece: an optional line number in the class file. I’ll talk later about why this isn’t always available.

For devs looking into performance, you can check for a few things:

  • Wide, child-less areas indicate slow methods or calls out to third-party code.

  • Deep spikes represent heavily-nested code, which may be an opportunity to add in shortcuts or optimizations.

  • Large insets from parent to child indicate a parent with a large overhead.

Obviously, other patterns may be uncovered—these are just a few of the easy ones.

Across the Wire

Banky cheats a bit to get the data for this chart to the browser: it sends it as metadata in the HTTP headers:

Banky-Trace: url=/api/trace/eb4d0ad7-c498-4b8c-ae4b-c8ba15bf2736
Banky-Trace: base64lines=62; base64bytes=4664; jsoncrc32=1437723126; jsonbytes=3498
Banky-Trace: 0=eyJ2ZXJzaW9uIjoxLCJzdGFja0NvbHVtbnMiOlsicG9pbnQiLCJoaXRzIiwibXMiLCJjaGlsZHJl
Banky-Trace: 1=biJdLCJzdGFjayI6W1swLDEsNTMsW1sxLDEsMF0sWzIsMSwwXSxbMyw5LDM1LFtbNCw0LDM0LFtb
Banky-Trace: 2=MywxMiw5LFtbNCw0LDksW1szLDUsMl1dXV1dXV1dXV1dLFs1LDEsMTA1OSxbWzYsMSwxMDU4LFtb
Banky-Trace: 3=NywxLDBdLFs4LDEsNDQyLFtbMTMsMSw1MCxbWzE0LDIwLDQ5LFtbMTksODAsMl0sWzE1LDIsMTIs

In this example, you see a few things:

  • For cases where there’s too much data to send in headers, there’s a url where the data can be retrieved separately.

  • There’s some sanity-checking metadata, such as base64lines and jsonbytes.

  • A number of header lines are Base64-encoded data.
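
Decoding on the receiving side is straightforward. The real extension does this in JavaScript; here’s a sketch of the same steps in Java (class and method names are my own, not Banky’s): concatenate the numbered lines in order, Base64-decode, and verify against the jsoncrc32 metadata before parsing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.CRC32;

// Illustrative sketch only: joins the ordered Banky-Trace payload lines,
// decodes them, and checks the decoded bytes against the jsoncrc32 value.
public final class TraceHeaderDecoder {
  public static String decode(String[] base64Lines, long expectedJsonCrc32) {
    StringBuilder joined = new StringBuilder();
    for (String line : base64Lines) {
      joined.append(line);              // lines arrive ordered by their 0=, 1=, ... prefix
    }
    byte[] jsonBytes = Base64.getDecoder().decode(joined.toString());
    CRC32 crc = new CRC32();
    crc.update(jsonBytes);
    if (crc.getValue() != expectedJsonCrc32) {
      throw new IllegalStateException("Banky-Trace payload failed CRC check");
    }
    return new String(jsonBytes, StandardCharsets.UTF_8);
  }
}
```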

As you might guess, it’s JSON that’s been encoded:

{
  "version": 1,
  "stackColumns": [
    "point",
    "hits",
    "ms",
    "children"
  ],
  "stack": [
    [ 0, 1, 443, [
      [ 1, 1, 0 ],
      [ 2, 1, 0 ],
      [ 3, 9, 235, [
        [ 4, 4, 235, [
          // ...
  ],
  "classesColumns": [
    "name"
  ],
  "classes": [
    "JsonLDApiConverter",
    "JpaTripleRepositoryHolder",
    "GmsSerializationService",
    // ...
  ],
  "pointsColumns": [
    "class",
    "method",
    "line"
  ],
  "points": [
    [ 9, "userWithName", -1 ],
    [ 1, "getEntityRepository", -1 ],
    [ 10, "_convertWithType", -1 ],
// ...

The format seems a little goofy, but the gist is that there would otherwise be a ton of duplicate data, and I didn’t want to include a Zlib JS library in the UI, so instead I worked out a minimally-redundant format. In the stack array, for example, each frame starts with the index of its entry in the points array, and each points entry starts with the index of its class in classes.

The UI then does a bit of work to reassemble all of this into something closer to a real data model with objects like Point and Stack.
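As a sketch of that reassembly (the real code is the extension’s JavaScript; the names below are illustrative assumptions), resolving one stack frame through points and classes looks something like:

```java
import java.util.List;

// Hypothetical reassembly: a stack frame [pointIndex, hits, ms, children?]
// is resolved through points -> classes into a readable label.
public final class TraceReassembler {
  public static String describeFrame(List<Object> frame,
                                     List<List<Object>> points,
                                     List<String> classes) {
    int pointIndex = (Integer) frame.get(0);
    int hits = (Integer) frame.get(1);
    int ms = (Integer) frame.get(2);
    List<Object> point = points.get(pointIndex);   // [classIndex, methodName, line]
    String className = classes.get((Integer) point.get(0));
    return className + "." + point.get(1) + " " + ms + "ms x" + hits;
  }
}
```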

In the Service

The Java layer has two components: one using Spring AOP (Aspect-Oriented Programming) and one for manual tracing.

Spring AOP lets you magically wrap service methods in your app:

@Aspect
@Component
@Profile(TraceProfile.ENABLED)
public class TraceAspectAdvice extends BankyBasicAspectAdvice {
  @Around("ourCode() && (repository() || component() || requestMapped() || restController()) && notNoTrace()")
  public Object addTrace(@NonNull final ProceedingJoinPoint joinPoint) throws Throwable {
    return trace(joinPoint);
  }

  @Pointcut("within(com.example.app..*)")
  protected void ourCode() {}
}

Banky gives you a few existing pointcut definitions to get you started, like component():

@Pointcut("(within(@org.springframework.stereotype.Component *) || within(@org.springframework.stereotype.Service *)) && publicMethod()")
protected void component() {}

That’s really it for the first big chunk: define a few pointcuts and let Spring AOP do the rest for you. All of your services, repositories, REST controllers, or other Spring-managed classes can have their public endpoints magically wrapped with that addTrace method:

return trace(joinPoint);
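
Stripped of Spring, the effect of that around advice is just call-wrapping. A plain-JDK dynamic proxy shows the same idea (a sketch only, with names of my own invention; this is not how Banky actually hooks in):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface Greeter { String greet(String name); }

// Every call on the proxied interface gets wrapped in timing code,
// analogous to what the @Around advice does for matched beans.
public final class TracingProxy {
  public static final List<String> CALLS = new ArrayList<>();

  @SuppressWarnings("unchecked")
  public static <T> T wrap(Class<T> iface, T target) {
    InvocationHandler handler = (proxy, method, args) -> {
      long start = System.nanoTime();
      try {
        return method.invoke(target, args);   // the joinPoint.proceed() analogue
      } finally {
        CALLS.add(method.getName() + " took " + (System.nanoTime() - start) + "ns");
      }
    };
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[]{ iface }, handler);
  }
}
```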

That leads to the second half: manual traces.

Most of your data objects will not be Spring-managed. Sure, you’ll get them from Spring-managed services, but they’ll be POJOs. Because of this, Spring AOP won’t touch them, which means that when you start out, your trench graphs will only show service-level stack traces. Maybe if your business objects are really simple, like value-object simple, this will be enough. But eventually you’re going to want to add traces to non-managed code.

It’s not as magical or pretty, but you can add Banky traces manually:

public Whatever getWhatever() {
  return trace(() -> {
    // ...
    return whatever;
  });
}

All of these end up at the same place: a thread-local Trace instance which follows invocations of trace(..) and manages a representation of the stack. It’s lightweight enough that even heavy stacks add only a negligible overhead, and if you aren’t tracing the request then it’s almost zero.
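
A minimal sketch of that mechanism, assuming names of my own invention (Banky’s real classes differ): a thread-local pointer into a tree of nodes, where each node accumulates wall-clock time and a hit count under its particular parent.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative thread-local trace stack: trace(..) pushes a child node,
// times the body, and accumulates totals per parent, matching the
// "time is per parent" semantics described earlier.
public final class MiniTrace {
  public static final class Node {
    public final Map<String, Node> children = new LinkedHashMap<>();
    public long totalNanos;
    public int hits;
  }

  private static final ThreadLocal<Node> ROOT = ThreadLocal.withInitial(Node::new);
  private static final ThreadLocal<Node> CURRENT = ThreadLocal.withInitial(MiniTrace::root);

  public static Node root() { return ROOT.get(); }

  public static <T> T trace(String point, Supplier<T> body) {
    Node parent = CURRENT.get();
    Node node = parent.children.computeIfAbsent(point, p -> new Node());
    CURRENT.set(node);
    long start = System.nanoTime();
    try {
      return body.get();
    } finally {
      node.totalNanos += System.nanoTime() - start;
      node.hits++;
      CURRENT.set(parent);
    }
  }
}
```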

A Spring filter sets up this thread-local var, and some controller advice then adds the headers just before the response is serialized and sent.

Dev Experience

Banky isn’t amazing, but it’s enough.

After ~15min of setup in Java-land, entirely within code, you have Spring sending basic trace data in your dev environment. Whether they’re server-side devs or UI devs, everyone is savvy enough to install the Chrome extension and ~~peer into the tea leaves~~ investigate where the high-visibility performance problems are in the service code.

In practical terms at work, Banky has already taken the typical performance inquiry from a day’s worth of investigating down to ~15min. After that it’s down to the fixed cost of figuring out the data structure or algorithm optimization to make the performance problem go away. Banky won’t fix it, but it will help you find it.

I’m hoping we can open source Banky in the near future. There’s nothing proprietary about it, so it’s just a lawyer conversation. Fingers crossed!