Co-written by Adrienne* and Alexis
A friend recently told me how an Amazon team chose a north star metric that misled them and cost them millions of dollars in lost sales. Average order value is one of the dominant metrics for any e-commerce business that wants to make more money, and Amazon was no different.
The team believed their average order value was already high enough, so they stopped focusing on improving this metric. Several months later, however, they realized that the metric was very skewed. While there were a small number of customers who spent a lot on their orders, the vast majority of customers had very small order sizes. This neglect led the team to lose out on a lot of potential sales.
How to know when your top-line metric is good enough to capture all the business-critical scenarios
Several years ago, I* created a metrics dashboard to inform our team about the quality of our in-car experience. Every time I asked for feedback on the dashboard's metrics, someone would propose a way to capture yet another bias. I felt like I was on a treadmill of endless improvements. So how do you know when your dashboard is good enough to drive product decisions?
The original metric I started with as the top-line for measuring quality of experience was simple: average errors encountered per 100 trips taken.
Soon, I realized this was not the best metric. After all, trips can be different lengths. What if one customer has a 5 hour road trip and another has a 15 minute grocery trip? We need to normalize this metric by time spent rather than trips.
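To make the difference concrete, here is a minimal sketch that computes both versions of the metric. The trip data and the function names are made up for illustration and are not from our actual dashboard.

```python
# Hypothetical trips: (duration in hours, errors encountered)
trips = [
    (5.0, 4),   # a long road trip
    (0.25, 1),  # a 15-minute grocery trip
    (0.25, 0),
    (0.5, 0),
]

def errors_per_100_trips(trips):
    """Original top-line: every trip counts the same, whatever its length."""
    total_errors = sum(errors for _, errors in trips)
    return 100 * total_errors / len(trips)

def errors_per_hour(trips):
    """Time-normalized variant: errors divided by total time spent in the car."""
    total_errors = sum(errors for _, errors in trips)
    total_hours = sum(hours for hours, _ in trips)
    return total_errors / total_hours

per_100_trips = errors_per_100_trips(trips)
per_hour = errors_per_hour(trips)
print(per_100_trips, round(per_hour, 2))
```

With this made-up data, the road trip accounts for most of the errors but also most of the time, which the per-trip version cannot show.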
Another colleague mentioned:
Average errors per ride is a skewed metric. The average neglects a long tail of customers who are having an extremely bad experience. And these are the customers who will be complaining on social media.
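A small sketch of what my colleague meant, using invented numbers: one customer with a terrible experience pulls the average up, while the typical customer sees no errors at all.

```python
from statistics import mean, median

# Hypothetical errors per ride for ten customers (illustrative only)
errors_per_ride = [0, 0, 0, 0, 0, 0, 0, 1, 1, 18]

avg = mean(errors_per_ride)      # what the top-line reports
med = median(errors_per_ride)    # what the typical customer experiences
worst = max(errors_per_ride)     # the long tail the average hides

print(f"average={avg}, median={med}, worst={worst}")
```

The average alone does not reveal whether errors are spread evenly or concentrated in a few very unhappy customers.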
Every time I improved the metrics dashboard, I’d think of another way to improve it or receive another suggestion. It’s easy to get into a cycle of endlessly improving your metric to reduce bias. So when is your metric good enough?
Bad vs. Good vs. Great PMs
Here are examples of how different product managers might approach the creation of a metrics dashboard:
Bad PM: “I will eliminate all possible bias in this top-line metric.”
Feels good because you are laser-focused on a single thing.
Dashboard takes too long to complete and never gets to a place where it is useful.
Good PM: “Let’s just get the metrics to a place where they are good enough -- they don’t have to be perfect.”
Feels good because you’re done with the dashboard. Just ship it, right?
Your metric has blind spots due to bias that’s not addressed. You lose out on customer growth because their bad experiences are not reflected in your metrics.
Great PM: “Let’s say this top-line metric I’ve chosen is good enough. How many supporting graphs do I need to track all the badness?”
Feels like a lot of work because you have to carefully think through every scenario.
You have a top-line metric that is easy to understand, and you also have supporting metrics to capture scenarios missing in your top-line metric.
How to tackle ambiguous problems
If you have to solve a problem and are struggling with navigating ambiguity, attack it by first having a working solution, and second asking yourself:
“Ok, let’s assume this working solution is good enough. What are all the ways things might go wrong and how large is that bucket of things?”
An unconstructive way to approach the problem is to convince yourself that your working solution is already good enough, and then stop there.
For example, let’s say your top-line metric is: average errors per 100 trips.
If you’re worried that it might not capture all scenarios, you can add supporting graphs underneath to track ways that top-line metric might be misleading. Distributions are a nice way to capture data points your top-line might miss.
For example, let’s say you’ve identified that one blindspot you want to remove is related to the long tail of bad experience. You come up with the following metrics:
#1: Remove blindspots related to the long tail of bad experience:
Distribution of users by average errors per 100 trips
# of people who experienced an error today
The percentage of users who experience zero errors in their lifetime
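As a rough sketch, two of these long-tail metrics could be computed from per-user totals like this. The user data, the bucket boundaries, and the helper names are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical lifetime totals per user: user id -> (trips, errors)
users = {
    "a": (200, 0),
    "b": (50, 1),
    "c": (100, 2),
    "d": (10, 5),
    "e": (40, 0),
}

def errors_per_100(trips, errors):
    return 100 * errors / trips

def bucket(rate):
    """Assign a per-user error rate to a histogram bucket (arbitrary cutoffs)."""
    if rate == 0:
        return "0"
    if rate < 5:
        return "under 5"
    return "5 or more"

# Distribution of users by average errors per 100 trips
distribution = Counter(
    bucket(errors_per_100(trips, errors)) for trips, errors in users.values()
)

# Percentage of users who have experienced zero errors in their lifetime
zero_error_users = sum(1 for _, errors in users.values() if errors == 0)
pct_zero_errors = 100 * zero_error_users / len(users)

print(dict(distribution), pct_zero_errors)
```

In this invented data set, user "d" is the long tail: a single user in the "5 or more" bucket who would be invisible in the overall average.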
So, you’d add the supporting graphs below your top-line:
Then, you identify a second blindspot you want to remove, this one related to differing trip durations:
#2: Remove blindspots related to variable trip length
Average errors per 100 trips, for the bottom 25, 50, and 90 percent of trip lengths (p25, p50, p90)
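One possible way to compute this breakdown is to sort trips by duration and recompute the top-line metric over the shortest 25, 50, and 90 percent. The data and the function name below are hypothetical, and rounding the cutoff up to a whole trip is just one reasonable choice.

```python
import math

# Hypothetical trips: (duration in minutes, errors encountered)
trips = [
    (10, 0), (15, 0), (15, 1), (20, 0),
    (30, 0), (45, 1), (120, 2), (300, 4),
]

def errors_per_100_trips_bottom(trips, pct):
    """Errors per 100 trips over the shortest `pct` percent of trips."""
    ordered = sorted(trips, key=lambda trip: trip[0])
    # Round the cutoff up so the subset always contains at least one trip.
    count = max(1, math.ceil(len(ordered) * pct / 100))
    subset = ordered[:count]
    return 100 * sum(errors for _, errors in subset) / len(subset)

by_length = {pct: errors_per_100_trips_bottom(trips, pct) for pct in (25, 50, 90)}
print(by_length)
```

If the value climbs sharply between cutoffs, as it does in this made-up data, long trips are driving the top-line number.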
This dashboard has four supporting graphs. If you needed many more, you might want to consider a different top-line metric.
A dashboard needs to tell the whole story, so you’ll display your top-line metric graph, along with supporting graphs to remove blindspots. If you’re finding that you need a lot of supporting graphs, however, then you should revisit your top-line metric to ensure it captures a comprehensive perspective.
Thank you Stefan Hermanek for reading early drafts. This post has also been published on www.productschool.com communities.