The A/B Test: Inside the Technology That’s Changing the Rules of Business
Using A/B,
new ideas can be essentially focus-group tested in real time: Without being
told, a fraction of users are diverted to a slightly different version of a
given web page and their behavior compared against the mass of users on the
standard site. If the new version proves superior—gaining more clicks, longer
visits, more purchases—it will displace the original; if the new version is
inferior, it’s quietly phased out without most users ever seeing it. A/B allows
seemingly subjective questions of design—color, layout, image selection,
text—to become incontrovertible matters of data-driven social science.
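To make the mechanics concrete, here is a minimal sketch of the two steps described above: silently diverting a fraction of users to the new version, then comparing their behavior with everyone else's. The function names, the 5 percent treatment share, and the choice of a two-proportion z-test are assumptions made for illustration; the article does not say how any particular company implements its tests.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.05) -> str:
    """Place a user in 'A' (standard page) or 'B' (new version).

    Hashing the user ID together with a hypothetical experiment name keeps
    the assignment stable across visits without storing any state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return "B" if bucket < treatment_share else "A"


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-statistic for the difference in conversion rate between B and A.

    Positive values mean B converted better; a larger magnitude means the
    gap is less likely to be noise.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


if __name__ == "__main__":
    # Hypothetical counts, not data from the article.
    print(assign_variant("user-42", "checkout-button-color"))
    print(two_proportion_z(conv_a=1000, n_a=20000, conv_b=60, n_b=1000))
```

A common rule of thumb treats a z-statistic above about 1.96 as significant at the 5 percent level, though real testing programs typically add safeguards this sketch leaves out.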
Today, A/B
is ubiquitous, and one of the strange consequences of that ubiquity is that the
way we think about the web has become increasingly outdated. We talk
about “the” Google homepage or “the” Amazon
checkout screen, but it’s now more accurate to say that you visited “a” Google
homepage, “an” Amazon checkout screen. What percentage of Google
users are getting some kind of “experimental” page or results when they
initiate a search? Google employees I spoke with wouldn’t give a precise
answer: “decent,” chuckles Scott Huffman, who oversees testing on Google Search.
Use of a technique called multivariate testing, in which myriad A/B tests
essentially run simultaneously in as many combinations as possible, means that
the percentage of users getting some kind of tweak may well approach 100
percent, making “the Google search experience” a sort of Platonic ideal: never
encountered directly but glimpsed only through imperfect derivations and
variations.
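The arithmetic behind that claim can be sketched in a few lines. The example below enumerates every combination of a few page elements, as a multivariate test would, and then estimates the share of users who land in at least one experiment. The factor names, the 5 percent share, and above all the assumption that experiments pick their users independently are illustrative; none of them describes Google's actual setup.

```python
from itertools import product

# Hypothetical page elements under test, each with its candidate values.
FACTORS = {
    "button_color": ["blue", "green"],
    "headline": ["original", "shorter"],
    "image": ["photo", "illustration", "none"],
}


def all_combinations(factors: dict) -> list:
    """Every page version a full multivariate test would have to serve."""
    names = list(factors)
    return [dict(zip(names, values)) for values in product(*(factors[n] for n in names))]


def share_seeing_any_tweak(num_experiments: int, share_per_experiment: float) -> float:
    """Fraction of users in at least one experiment.

    Assumes each experiment selects its users independently of the others.
    """
    return 1 - (1 - share_per_experiment) ** num_experiments


if __name__ == "__main__":
    print(len(all_combinations(FACTORS)), "page versions")
    for count in (1, 10, 50, 100):
        print(count, "experiments:", round(share_seeing_any_tweak(count, 0.05), 3))
```

Under these assumptions, even modest per-experiment shares compound quickly: with enough concurrent tests, nearly every visitor sees some variation.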
A/B is
revolutionizing the way that firms develop websites and, in the process,
rewriting some of the fundamental rules of business.
Here are
some of these new principles.
Old rule: You have to make choices.
New rule: Choose
everything.
A/B
increasingly makes meetings irrelevant. Where editors at a news site, for
example, might have sat around a table for 15 minutes trying to decide on the
best phrasing for an important headline, they can simply run all the proposed
headlines and let the testing decide. Consensus, even democracy, has been
replaced by pluralism—resolved by data.
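As a rough illustration of letting the testing decide, the sketch below picks the headline with the highest click-through rate once each candidate has been shown often enough. The function name, the minimum-impressions threshold, and the sample counts are all hypothetical, and a real newsroom tool would also check whether the gap between headlines is statistically meaningful.

```python
from typing import Optional


def best_headline(results: dict, min_impressions: int = 1000) -> Optional[str]:
    """Return the headline with the highest click-through rate.

    `results` maps each headline to a (clicks, impressions) pair. Headlines
    shown fewer than `min_impressions` times are skipped as too noisy.
    """
    rates = {
        headline: clicks / impressions
        for headline, (clicks, impressions) in results.items()
        if impressions >= min_impressions
    }
    if not rates:
        return None  # not enough data yet to call a winner
    return max(rates, key=rates.get)


if __name__ == "__main__":
    # Hypothetical counts for three competing headlines.
    sample = {
        "Headline one": (120, 4000),
        "Headline two": (180, 4100),
        "Headline three": (9, 50),  # too few impressions to judge
    }
    print(best_headline(sample))
```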
Old rule: The person
at the top makes the call.
New rule: Data
makes the call.
Google
insiders, and A/B enthusiasts more generally, have a derisive term to describe
a decision-making system that fails to put data at its heart:
HiPPO, or “highest-paid person’s opinion.” As Google analytics expert Avinash
Kaushik declares, “Most websites suck because HiPPOs create them.”
Tech
circles are rife with stories of the clueless boss who almost killed a project
because of a “mere opinion.” In Amazon’s early days, developer Greg Linden came
up with the idea of giving personalized “impulse buy” recommendations to
customers as they checked out, based on what was in their shopping cart. He
made a demo for the new feature but was shot down. Linden bristled at the
thought that the idea might not even be tested. “I was told I was forbidden to
work on this any further. It should have stopped there.”
Instead
Linden worked up an A/B test. It showed that Amazon stood to gain so much
revenue from the feature that all arguments against it were instantly rendered
null by the data. “I do know that in some organizations, challenging an SVP
would be a fatal mistake, right or wrong,” Linden wrote in a blog post on the
subject. But once he’d done an objective test, putting the idea in front of
real customers, the higher-ups had to bend. Amazon’s
culture wouldn’t allow otherwise.