Uncertainty is the foundation of every experiment. When we don’t know if a change has a positive or negative impact, we A/B test it to resolve the disagreement. It’s why controversy is so essential to the experimentation process and should be embraced.
Controversy is also a good way to prioritize ideas. When testing features, start with your most controversial ones. Do people get angry when someone mentions a possible change? Test it. The outcome could go either way.
Let’s say there’s an internal argument over how to list products on screen. Some believe bigger, fewer images will perform better. Others think smaller, more numerous thumbnails should remain in place. The smartest way to resolve this controversy is through experimentation and testing to uncover what performs best.
Lukas Vermeer used to lead Experimentation at Booking.com and now runs the program at Vista. When discussing how to start an experimentation culture, he also recommends embracing controversial debate within an organization. When you grab the most heated topic, it ensures that your program delivers a relevant and memorable result that will influence minds.
Although disagreements can sometimes create tension and put employees in uncomfortable positions, they help us uncover where we went wrong. And once we get over our perfectionist tendencies, admitting that we’re wrong is a good thing. It’s the linchpin of improvement, the force behind transformation, and the most important chapter in any hero’s journey.
When You Welcome Transformation, Better Outcomes Follow
How can you help your organization sail through controversies without ruffling too many feathers? In this three-part blog series, we argue that controversies are the foundation of experimentation:
1. They need to rely on an agreed-upon goal.
2. You need to structure disagreements on top of those agreed-upon goals.
3. Prepare for surprising results before a negative result disappoints stakeholders. Running through scenarios ahead of time is less painful than being unexpectedly confronted with being retroactively wrong.
Start With a Detailed, Agreed-Upon Goal
People rarely fight openly about business ideas. In that context, identifying disagreements requires you to be inquisitive. Sometimes, you risk rubbing colleagues the wrong way by suggesting that not everyone agrees. If there are no public debates, ask everyone what they see as the organization’s primary goal.
Derive an All-Encompassing Goal From Disagreements
If there are apparent disagreements, try to go above the fray. Ask proponents of each idea what they are trying to achieve.
Aggregate the ideas in a single, shared objective:
- Say one team wants to increase profits, while another wants to improve customer satisfaction. That can feel contradictory. Spending more to make customers happy will hurt margins.
- In reality, both care for long-term profitability. Instant profits are a part of it, but keeping current customers happy encourages them to come back and contribute to later profits.
- You can leverage the apparent disagreement on the strategy. Spell out that both share the same objective on principle. Propose to use an estimation of discounted profits as an overall goal. Even without details, with agreement on principle, you have pulled the teams together.
- However, more details are needed to support decisions. Gradually introduce the following:
1. The disagreement has helped your organization define a more comprehensive objective.
2. Both stakeholders should agree with that overarching goal. However, they might find it too complicated. Rarely will they engage without the promise of an agreement.
3. It is compelling to look at their distinct objectives as elements of a larger mission.
Either way, the starting point has to be well-defined and have an explicit agreed-upon shared objective. For example, this goal could be to increase revenue, improve application performance, or influence customer retention.
Typically, you don’t just want to use one metric. Create a timeline over the next three or six months. Or, you could use a discounted cash flow estimation if you care about long-term trends.
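As a rough illustration of a discounted cash flow estimate, the sketch below sums projected monthly profits and weights later months less. The function name, the monthly figures, and the 1 percent monthly discount rate are all hypothetical, chosen only to show the arithmetic.

```python
def discounted_profit(monthly_profits, monthly_rate):
    """Present value of a series of future monthly profits.

    Month 1 is discounted once, month 2 twice, and so on.
    """
    return sum(
        profit / (1 + monthly_rate) ** month
        for month, profit in enumerate(monthly_profits, start=1)
    )


# Hypothetical six-month projections, in the same currency unit.
quick_win = [120, 110, 100, 90, 80, 70]         # strong start, fading
happy_customers = [80, 90, 100, 110, 120, 130]  # slower start, growing

rate = 0.01  # assumed discount rate of 1 percent per month
print(round(discounted_profit(quick_win, rate), 2))
print(round(discounted_profit(happy_customers, rate), 2))
```

With a single number per scenario, the team chasing instant profits and the team investing in customer satisfaction can compare their plans on the same scale.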
Break Down High-Level Goals Into Specific Team Objectives
A long-term objective like that is an excellent North Star for a company. It’s comprehensive and has a long-term view. However, it’s not always a great goal for individual experiments.
Why? Here are four main reasons:
1. It can take too long after improvements happen.
To measure twelve-month retention, you need to wait at least twelve months. Finding a short-term proxy, like the number of complaints, helps get test results in a reasonable time.
2. That metric measures something further from your change.
That makes it a noisier signal. A noisier signal means results are less likely to be significant. If you improve your product search, users are more likely to find what they want. Whether they buy it depends on the competitive price, reasonable delivery conditions, or if the items are in stock. It should increase sales, but you are better off measuring whether users found what they were looking for.
3. It includes all activities, including those not affected by the change.
That makes it noisier still, and even less likely to be significant. If you improve how you handle complaints, users who have complained are less likely to churn. Overall churn will include users who didn’t complain and were not influenced by the change.
The first three reasons are about the “overall” or lagging metric being slow or noisy. The fourth is different.
4. There are cases when a local improvement displaces more than it improves—often called “cannibalization.”
Common Displacement Examples
Say, a supermarket sells toiletries. They offer larger packages of cleaning products and toilet paper at a discount. Customers with foresight and an eye for a rebate might order in bulk. Revenue increases for now, but they won’t be back for weeks.
There are further examples where displacement can happen the other way, like when customers postpone consumption. Imagine an airline with a frequent-flier program, where members can redeem their miles to book flights. In many cases, users don’t spend their miles as fast as expected. Eventually, the airline sees accumulated miles piling up and becomes concerned. Consider this a good opportunity to increase redemption rates.
Another Example of Shifting Goals
In order to encourage spending, airlines might introduce a higher tier of rewards: exclusive perks for millions of miles. The announcement might initially lower the redemption rate because some flyers will try to accumulate enough to reach that higher tier.
Another classic example of cannibalization is promoting a new payment option. Let’s say you’re promoting Apple Pay over credit cards because it’s more convenient. In this case, you can expect a significant increase in the number of customers using Apple Pay. However, there’s more to the story. You might mainly be converting credit card users to Apple Pay in the process, which doesn’t necessarily mean new revenue for the business.
You can’t look at the increase in sales using Apple Pay and see it as additional revenue. You want to look at the overall conversion rate to determine the impact of promoting a more convenient option.
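A minimal sketch of that accounting, assuming equally sized control and treatment groups, is shown here. The order counts and the function name are invented for illustration only.

```python
def incremental_orders(control, treatment):
    """Change in orders per payment method, plus the overall change.

    Both arguments map a payment method to its number of orders,
    measured on equally sized control and treatment groups.
    """
    methods = set(control) | set(treatment)
    change = {m: treatment.get(m, 0) - control.get(m, 0) for m in methods}
    return change, sum(change.values())


# Hypothetical order counts per 10,000 visitors in each group.
control = {"apple_pay": 200, "credit_card": 800}
treatment = {"apple_pay": 350, "credit_card": 670}

per_method, overall = incremental_orders(control, treatment)
print(per_method["apple_pay"])    # 150 more Apple Pay orders
print(per_method["credit_card"])  # -130: credit card orders displaced
print(overall)                    # 20 orders are actually incremental
```

In this made-up case, most of the Apple Pay increase comes from customers who would have paid by credit card anyway.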
Influencing Local Metrics
Most changes influence the “local” metrics, like the conversion from one step to another in a funnel. An experiment can generally measure a significant impact on local metrics. While the overall metrics might be affected, the impact is less likely to be significant.
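To see why, the sketch below applies a standard pooled two-proportion z-test to the same 80 extra conversions, once measured on the funnel step that changed and once diluted in an overall conversion rate. All counts are hypothetical, and the test itself is one common choice rather than a prescribed method.

```python
import math


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # For a standard normal, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Local metric: only the 2,000 users per group who reach the changed step.
local_p = two_proportion_p_value(400, 2_000, 480, 2_000)

# Overall metric: the same 80 extra conversions among all 10,000 users
# per group, alongside conversions the change never touched.
overall_p = two_proportion_p_value(1_500, 10_000, 1_580, 10_000)

print(local_p < 0.05)    # True: the local lift is significant
print(overall_p < 0.05)  # False: the diluted lift is not
```

The effect is identical in absolute terms. Only the amount of unrelated activity in the denominator differs.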
Let’s take another example: a video streaming platform has noticed that users who watch more shows have better retention. They want to increase the number of viewing hours. On their platform, users can bookmark shows they want to see on a watchlist.
Viewers who actively use bookmarks tend to have higher viewing hours overall. They also have better subscription retention. The platform wants to make the option to bookmark shows more prominent. They A/B test the impact.
A seasoned experimentation specialist would expect both:
1. A significant impact on the metric that you were aiming to improve directly and early—to prove that the change did as expected. In our example, that’s an increase in shows being bookmarked. Now that the feature is more prominent, is it getting more use? If that change isn’t significantly positive, the implementation is at stake. Bookmarking wasn’t made easier by the change.
2. A positive but not significant impact on the most distant, or wider objective: a lagging metric covering an activity downstream. That lagging metric is often more business relevant. Those can be purchases, retention, or in our example, hours watched and retention. If significantly more people bookmark shows, but there isn’t a significant increase in hours watched, the intent of the change is debatable. That second metric can also be wider, such as the overall conversion rate. You might not be able to reach significance on that last metric if it’s too remote or general.
Agreeing on Metrics
When reviewing results and deciding what to do, decision-makers need to appreciate the mechanism. There should be an agreement on what metrics need to be significantly affected before deciding to roll out the feature to all users.
Let’s take a final example to illustrate how to think about objectives with different horizons: a music streaming service wants to increase the number of customers who list their favorite musicians.
That might not raise the number of hours listened (that’s often constrained by users’ schedules) or the subscription rate significantly. But, you can separately model how users with favorites tend to have a better experience. They use the service more, and retain their subscriptions for a longer time.
Alternatively, you can decide that knowing users’ favorite musicians explicitly allows you to develop new services. Users can be alerted when a favorite musician releases a new album, or when tickets for that musician’s concerts are available. In that case, the local, short-term metric is sufficient because the services that would move the wider objectives haven’t been built yet.
Make Those Specific Objectives Team Goals
Assigning responsibility is a common business practice that helps teammates develop a sense of belonging and impact.
Grant Permission Through Goal Metrics
Ask one team to be responsible for improvements to one specific metric. Then, grant them free rein to find initiatives that can improve upon it. For example, let’s say a team is responsible for customer satisfaction. They should modify the experience before the sale if they think that supplier ratings, explanations, or clearer warnings will avoid costly returns.
The teams in charge of increasing conversion might have a vested interest in simplifying the funnel. They would evaluate those changes according to their objectives, and warnings tend to hurt funnels. Detailed supplier ratings mean a more complex ranking. Detailed explanations don’t fit on a small mobile screen.
However, the team focusing on conversion won’t have oversight of clients and their concerns after the sale. If those teams have disagreements, they can test the impact of changes and judge the test results using the company’s overall goal. These more considered evaluation criteria should make both teams happy.
If the gains of reducing customer complaints are worth more than the loss of restricting the conversion funnel, then release the warnings. If the loss on conversion is greater than the cost of dealing with unhappy customers, then keep a simple flow.
Build Up Results to Overall Impact
To satisfy all concerns, reconstruct the overall goal from a set of agreed-upon leading team metrics.
Construct a Tree of Goal Metrics
Consider metrics like visits per week, conversion, margin per transaction, and post-sales costs. You can multiply the first two to get overall conversion events per week. Multiply that by the third to get a margin per week. Subtract the fourth, and that’s your profit per week.
Every company would have slightly different metrics (per week or month, the number of registered or active users rather than visits, costs per conversion rather than overall). The key is to have an accounting formula towards the key goal: usually revenue, or profit. With that formula, you can compute that an increase of 10 percent in visits corresponds to the same extra profit as a reduction of 25 percent in post-transaction costs.
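Here is one way such an accounting formula could look in code. The baseline values are hypothetical, picked so that the 10 percent and 25 percent equivalence holds, and the sketch assumes post-sales costs are a weekly total that does not grow with visits.

```python
def weekly_profit(visits, conversion, margin_per_transaction, post_sales_costs):
    """Profit per week built from the four metrics of the goal tree."""
    transactions = visits * conversion
    margin = transactions * margin_per_transaction
    return margin - post_sales_costs


# Hypothetical rough values for each element of the formula.
baseline = dict(
    visits=100_000,
    conversion=0.02,
    margin_per_transaction=25.0,
    post_sales_costs=20_000.0,
)

base = weekly_profit(**baseline)
more_visits = weekly_profit(**{**baseline, "visits": 110_000})
lower_costs = weekly_profit(**{**baseline, "post_sales_costs": 15_000.0})

print(base)                # baseline profit per week
print(more_visits - base)  # extra profit from 10 percent more visits
print(lower_costs - base)  # extra profit from 25 percent lower costs
```

With other baseline values, the exchange rate between metrics changes, which is why each company needs its own rough numbers.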
Communicate that Goal Tree Widely
Making sure every team knows a simple version of that formula, with rough values for each element, is key.
Armed with that comparison, a team can think of improvements in the service that happen to be expensive, for example offering free delivery or having more stock available. They can A/B test it. If this test leads to a significant improvement (say, a 5 percent increase in conversion), then convert it to an equivalent rise in profit.
Use the Goal Tree to Resolve Arbitrages
They can then compare that effect to other costs, notably paying for that improvement. This approach allows ideas that are overall beneficial but affect other teams, to be considered fairly.
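As a hedged sketch of that comparison, the function below converts a relative conversion lift into extra weekly margin and subtracts what the improvement costs on every sale. The traffic, margin, and delivery cost figures are hypothetical, as is the function name.

```python
def net_profit_change(visits, conversion, margin_per_transaction,
                      relative_lift, extra_cost_per_transaction):
    """Weekly profit change from a conversion lift that costs money per sale."""
    base_transactions = visits * conversion
    new_transactions = base_transactions * (1 + relative_lift)
    extra_margin = (new_transactions - base_transactions) * margin_per_transaction
    # The improvement (for example free delivery) is paid on every sale,
    # not only on the additional ones.
    extra_cost = new_transactions * extra_cost_per_transaction
    return extra_margin - extra_cost


# Hypothetical: 100,000 visits, 2 percent conversion, margin of 25 per sale,
# and a measured 5 percent relative lift in conversion.
cheap_delivery = net_profit_change(100_000, 0.02, 25.0, 0.05, 1.0)
costly_delivery = net_profit_change(100_000, 0.02, 25.0, 0.05, 2.0)

print(cheap_delivery > 0)   # True: the lift pays for itself
print(costly_delivery > 0)  # False: the same lift loses money
```

The same significant test result can justify a rollout or not, depending on what the improvement costs.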
Imagine you sell products, and some occasionally go out of stock. They are generally back fast enough, but it can delay delivery by up to a week. Some customers don’t appreciate the delay. On occasion, they cancel the sale. Should you flag unavailable products, or remove them from consideration? It would lower conversion, even when you might be able to honor the sale.
The answer depends on which cost is larger. If cancellations and angry customers cost less than the conversion you would lose by flagging, keep selling those products without a flag. If long-term clients are valuable and aggravating them is more expensive, flag or remove the products. Comparing those costs precisely requires well-defined goals and accurate A/B testing.
The Impact on Metrics
This formula allows your team to draw an equivalence between impacts on each metric.
The parallels are not always exact, but they give a good sense of orders of magnitude. They are also helpful in understanding which team has the most impact on the overall business. If 90% of your margin goes to pay for after-sales costs, then that’s where most of your impact is. When that represents only 10% of your margin, then you can focus on upstream improvements.
Reward Impactful Exploration
Don’t just count successful test runs. Look at the overall impact of campaigns of tests. Compare two teams:
- One runs five tests to fix well-known edge cases. Three of those five have a significant positive impact on a small group of users. There’s a 60% success rate. They saved 10 percent of customer service costs, representing 2% of profits. That’s great.
- Another team is very ambitious and tries something completely different. Their first four iterations show no significant result. The fifth one is a significant improvement. There’s a 20% success rate. But they increased the average spend per order by 15%, which corresponds to a 24% increase in profits. That’s much better for the business.
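The comparison above can be written down as a small record per campaign. The figures come from the two bullets, while the team labels and the class name are placeholders invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    team: str
    tests_run: int
    significant_wins: int
    profit_impact: float  # relative change in profits, 0.02 means 2%

    @property
    def success_rate(self):
        return self.significant_wins / self.tests_run


campaigns = [
    Campaign("edge-case fixes", tests_run=5, significant_wins=3, profit_impact=0.02),
    Campaign("ambitious redesign", tests_run=5, significant_wins=1, profit_impact=0.24),
]

by_success_rate = max(campaigns, key=lambda c: c.success_rate)
by_impact = max(campaigns, key=lambda c: c.profit_impact)

print(by_success_rate.team)  # edge-case fixes
print(by_impact.team)        # ambitious redesign
```

Ranking by success rate and ranking by impact pick different teams, which is the point of looking at whole campaigns rather than counting wins.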
After running tests for a few quarters, you’ll get a sense of how much each team can achieve. This expected impact can guide which effort should receive the most investment, or be given the most leeway in exploring ideas. Executives might disagree on who should get more resources and pull their way. That’s a big part of the job. However, they are less likely to disagree on which team has demonstrated greater impact with A/B tests.
Sharing results allows employees to evaluate which approaches work best. They can gradually appreciate how the product can overlook specific customers. They can notice that certain changes (for example reducing ambiguity, removing blockers, and faster page loads) have more impact. They can learn which ones are a good investment of their time, attention, and effort.
Disagreeing on which team should benefit the most from company growth is typically a tense conversation. Rather than surprise collaborators, present a context where impact is well understood. Explain the mechanism for growth in the less-heated context of rolling out features in a more structured and empowering framework.
Thank you for reading Part 1 of our three-part blog series, “How to Make Controversy Good for Business.” Part 2 is coming soon to a Split blog near you.