Sunday, March 31, 2013

The A/B Test: Inside the Technology That’s Changing the Rules of Business

BY BRIAN CHRISTIAN

Using A/B, new ideas can be essentially focus-group tested in real time: Without being told, a fraction of users are diverted to a slightly different version of a given web page and their behavior compared against the mass of users on the standard site. If the new version proves superior—gaining more clicks, longer visits, more purchases—it will displace the original; if the new version is inferior, it’s quietly phased out without most users ever seeing it. A/B allows seemingly subjective questions of design—color, layout, image selection, text—to become incontrovertible matters of data-driven social science.
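As a rough illustration of the mechanics described above, the sketch below diverts a fixed fraction of users to a variant page and then compares the two groups' conversion rates with a two-proportion z-test. The function names, the 10 percent split, and the hashing scheme are assumptions made for illustration; the article does not describe any particular company's implementation.

```python
import hashlib
import math


def assign_bucket(user_id: str, experiment: str, variant_fraction: float = 0.1) -> str:
    """Deterministically place a user in 'variant' or 'control'.

    Hashing the user id together with the experiment name keeps the
    assignment stable across visits without storing it (an assumed design).
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "variant" if position < variant_fraction else "control"


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference in conversion rate, B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err
```

An absolute z value above roughly 1.96 corresponds to the conventional 5 percent significance level for a two-sided test; a real deployment would also need to decide in advance how long to run the test and how many users to include.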

Today, A/B is ubiquitous, and one of the strange consequences of that ubiquity is that the way we think about the web has become increasingly outdated. We talk about the Google homepage or the Amazon checkout screen, but it’s now more accurate to say that you visited a Google homepage, an Amazon checkout screen. What percentage of Google users are getting some kind of “experimental” page or results when they initiate a search? Google employees I spoke with wouldn’t give a precise answer. “Decent,” chuckles Scott Huffman, who oversees testing on Google Search. Use of a technique called multivariate testing, in which myriad A/B tests essentially run simultaneously in as many combinations as possible, means that the percentage of users getting some kind of tweak may well approach 100 percent, making “the Google search experience” a sort of Platonic ideal: never encountered directly but glimpsed only through imperfect derivations and variations.
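To see why multivariate testing can leave almost no one on the "standard" page, consider a minimal sketch with three page elements, each tested in two versions. The factor names and values below are hypothetical, and hashing each factor independently is just one assumed way of assigning users to combinations.

```python
import hashlib
from itertools import product

# Hypothetical page elements under test; names and values are illustrative.
FACTORS = {
    "button_color": ["blue", "green"],
    "headline": ["short", "long"],
    "layout": ["one_column", "two_column"],
}


def all_combinations(factors):
    """Every combination of factor levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def assign_combination(user_id, factors):
    """Pick one level per factor for a user, hashing each factor independently."""
    chosen = {}
    for name, levels in factors.items():
        digest = hashlib.sha256(f"{name}:{user_id}".encode("utf-8")).digest()
        chosen[name] = levels[int.from_bytes(digest[:4], "big") % len(levels)]
    return chosen
```

Three two-way choices already yield 2 × 2 × 2 = 8 distinct pages, and only one of those eight is the all-default version. Each additional two-way factor doubles the count, so the share of visitors seeing at least one tweak climbs quickly.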

A/B is revolutionizing the way that firms develop websites and, in the process, rewriting some of the fundamental rules of business.

Here are some of these new principles.

Old rule: You have to make choices.
New rule: Choose everything.

A/B increasingly makes meetings irrelevant. Where editors at a news site, for example, might have sat around a table for 15 minutes trying to decide on the best phrasing for an important headline, they can simply run all the proposed headlines and let the testing decide. Consensus, even democracy, has been replaced by pluralism—resolved by data.
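A minimal sketch of "letting the testing decide" among several headlines might look like the following. The `best_headline` helper and the shape of its input are hypothetical; it simply picks the headline with the highest observed click-through rate.

```python
def best_headline(results):
    """Return the headline with the highest click-through rate.

    `results` maps each headline to a (clicks, impressions) pair.
    Headlines with no impressions are skipped; a ValueError is raised
    if none of the headlines has any impressions.
    """
    rates = {
        headline: clicks / impressions
        for headline, (clicks, impressions) in results.items()
        if impressions > 0
    }
    return max(rates, key=rates.get)
```

Picking the raw winner is only the last step. With small samples the leading headline may be ahead by chance, so a real newsroom tool would presumably wait for enough impressions or apply a significance check before committing.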

Old rule: The person at the top makes the call.
New rule: Data makes the call.

Google insiders, and A/B enthusiasts more generally, have a derisive term to describe a decision-making system that fails to put data at its heart: HiPPO, or “highest-paid person’s opinion.” As Google analytics expert Avinash Kaushik declares, “Most websites suck because HiPPOs create them.”

Tech circles are rife with stories of the clueless boss who almost killed a project because of a “mere opinion.” In Amazon’s early days, developer Greg Linden came up with the idea of giving personalized “impulse buy” recommendations to customers as they checked out, based on what was in their shopping cart. He made a demo for the new feature but was shot down. Linden bristled at the thought that the idea might not even be tested. “I was told I was forbidden to work on this any further. It should have stopped there.”

Instead Linden worked up an A/B test. It showed that Amazon stood to gain so much revenue from the feature that all arguments against it were instantly rendered null by the data. “I do know that in some organizations, challenging an SVP would be a fatal mistake, right or wrong,” Linden wrote in a blog post on the subject. But once he’d done an objective test, putting the idea in front of real customers, the higher-ups had to bend. Amazon’s culture wouldn’t allow otherwise.