Putting Our Pricing to the Test

 

As I’m sure many of you know by now, we recently announced that we’re running a price experiment. While we explained the basics in that post, Uber’s Team Science wanted to go into a little more detail about how this experiment is being conducted.

Here at Uber HQ, we notice what you, our users, say about us. And you really like us!

  • @TheClayFox : The line for a cab at 4th and King was 20 people long. @uber_sf picked us up in 2 minutes. Literally. I. Love. @uber.
  • @ellentupman: Customer service awesomeness from the good people @Uber_SF. Again. If you haven’t tried @Uber, you are missing out.
  • @gina_oreilly: @Uber_SF I just can’t resist you. Hands down the fastest and easiest way to get around this city.
  • @dariusmiranda: @Uber_SF Car was prompt. Immaculate. Driver was courteous. Awesome ride. I moved the Uber app to front page of my iPhone. Thanks!

We’ve argued in our Hidden Cost of Cabs post that our San Francisco pickup times are so much lower than taxis’ that our premium is worth the time you save by riding with us. Our reliability, speed, and quality user experience count for something, right? Okay, we hear you. You want lower prices too. That’s why we’re doing science. Even better, the mathematical foundation of our experiment basically comes down to beer. Seriously. More on that in a bit…

Source: Anne Holcomb

We’re always looking for ways to make Uber better. (For example, building city-specific models of estimated travel times.) So how much can we reasonably drop our price? To answer this, we turn to our trusted friend #UberData. So, where, exactly, will we get the Uber data we need? From you! Well, some of you.

For an extended period of time in San Francisco we’re running a test offering reduced rates to a small group of randomly selected users. If you’re one of these people, you already received an email from us with the details.

A few important things to note:

  • This test is only running in San Francisco.
  • Random = random. So sorry, we can’t add you to the test group. It’ll break science. Don’t break science!

You may not know this, but I’m actually a neuroscientist. And my summer with Uber is making me miss writing papers like I used to. So although a blog post isn’t peer review, I’m going to geek out a bit and write a science paper about our testing. Actually, let’s write this like a high school science experiment:

INTRODUCTION

 

Uber Calculus: Uber + MATH > taxis

QUESTION

Will people use Uber more if it’s cheaper?

VARIABLES

Independent: Percent discount. Dependent: Number of rides taken.

HYPOTHESIS

We’re conducting what’s known as a “price elasticity test”. The idea is simple: if we decrease our price by 10% and that results in a 20% increase in ridership, then overall it’s better for business if we are 10% cheaper. The goal is to find the cost/demand point that’s best for everyone. Our hypothesis is that lower price will result in more rides.
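To make the arithmetic in that example concrete, here is a minimal sketch. It assumes "better for business" means total fare revenue (price times rides) and ignores per-ride costs; the function names are ours, purely for illustration.

```python
def revenue_ratio(price_change: float, demand_change: float) -> float:
    """Revenue after the change divided by revenue before it.

    Both arguments are fractional changes: -0.10 means a 10% drop.
    Assumes revenue = price * number of rides (costs are ignored).
    """
    return (1 + price_change) * (1 + demand_change)


def elasticity(price_change: float, demand_change: float) -> float:
    """Price elasticity of demand: % change in rides per % change in price."""
    return demand_change / price_change


# The example from the text: 10% cheaper, 20% more rides.
print(round(revenue_ratio(-0.10, 0.20), 4))  # 1.08, i.e. revenue up 8%
print(round(elasticity(-0.10, 0.20), 4))     # -2.0
```

Any elasticity below -1 means a price cut grows revenue; between -1 and 0, the extra rides don’t make up for the lower fare.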

GROUPS

We’re conducting a randomized experiment. We have four total experimental groups, with each group getting a different discount rate. These groups are random, but they are also relatively homogeneous. (That means they’re pretty similar in all the metrics we’re tracking.)
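For the curious, random assignment into equal-sized groups can be sketched in a few lines. This is an illustration only, not our production code: the user IDs, the group labels, and the `assign_groups` helper are all made up for the example.

```python
import random


def assign_groups(user_ids, group_labels, per_group, seed=None):
    """Randomly pick per_group users for each label, with no user in two groups."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    # sample() draws without replacement and returns users in random order,
    # so consecutive slices of the result are themselves random groups.
    chosen = rng.sample(list(user_ids), per_group * len(group_labels))
    return {
        label: chosen[i * per_group:(i + 1) * per_group]
        for i, label in enumerate(group_labels)
    }


# Hypothetical pool of 10,000 users split into four groups of 200.
labels = ["group_1", "group_2", "group_3", "group_4"]
groups = assign_groups(range(10_000), labels, per_group=200, seed=42)
print({label: len(members) for label, members in groups.items()})
```

A homogeneity check would then compare the groups on the tracked metrics before the test starts.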

This gives us a good starting point, statistically speaking, from which to launch our experiment. Each group will have the same number of participants. We determined the number of people using what’s known in experimental research as a “power analysis”. This lets us calculate the number of data points we need in order to detect a statistical effect at a given level of certainty.

Importantly, in statistical hypothesis testing, you have two main sources of error: the false positive rate (α) and the false negative rate (β). The α is the same as the significance level, which is a number in the interval [0, 1]. An acceptable α is chosen prior to commencing the research. Usually in science this is 0.05. The β is the probability of not detecting an effect that truly exists. “Power” is equal to 1-β, so the higher the power, the better your sensitivity at seeing the effect of interest.
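The way α, power, and sample size trade off against each other can be sketched with a textbook approximation. This is not the calculation we actually ran: it assumes a two-sided comparison of two group means, a normal approximation, and a standardized effect size (Cohen’s d), and the function names are ours.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect effect_size (Cohen's d)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided false positive cutoff
    z_power = _STD_NORMAL.inv_cdf(power)          # power = 1 - beta
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def achieved_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power (1 - beta) for a given group size."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)


# Illustrative inputs only: a medium effect (d = 0.5) at the usual 0.05 / 0.80.
print(sample_size_per_group(0.5))  # 63
print(round(achieved_power(63, 0.5), 3))
```

Asking for more power, a stricter α, or a smaller detectable effect all push the required group size up.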

After taking into account the desired power, the a priori significance level, and other internal factors such as the proportion of our users that we can expect to take a ride within the given timeframe of the experiment, we settled on 200 users per group.

With this number, we have both a specificity and power greater than 0.99. This means that any changes in rider behavior have a high probability of being detected. Also important to an experiment is the control group. Because of normal company growth and other factors, we need to make sure that any changes we see in ridership behavior are due specifically to the change in cost.

While in essence anyone not getting a discount could be part of the control group, to ensure that we’re comparing like-to-like, we’re segmenting out a group of 200 users who have a ridership profile similar to that of our three discount groups. This lets us compare our experimental groups against this group of 200 lucky 0%-off users. (Hooray you!)

METHODS

First and foremost we’re observing the ridership behavior of our experimental guinea pigs (er, users). We’re taking into account a lot of factors that, in the end, will inform us about an optimal pricing strategy.

Here’s where the beer comes in. In the early 1900s, Guinness had the amazing foresight to see that math and science people are an invaluable resource for any good company. So they hired a statistician: William Gosset, the original home-brewing, mustachioed, tiny-bespectacled, industry math geek.

Gosset wanted to test the quality of the beer they were brewing. But he couldn’t test all of the beer. He could only test small batches. He had an incomplete knowledge of the system.

So he did something quite clever: he randomly sampled small amounts of beer and created a statistical formula that allowed him to quantify the probability that these small samples represented the rest of the beer. This method became the backbone of experimental research, giving researchers a simple tool to infer the behavior of a population based upon a small, random sample: the Student’s t-test.
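Here is a minimal sketch of the t statistic in its common two-sample, equal-variance form, which compares the means of two small samples. The numbers are made up for illustration and are not Uber data; turning the statistic into a p-value additionally requires the t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Made-up rides-per-user counts for two tiny groups.
discounted = [2, 4, 4, 6]
full_price = [1, 2, 3, 2]
t, df = student_t(discounted, full_price)
print(round(t, 3), df)
```

The larger the t statistic is in absolute value for a given number of degrees of freedom, the less plausible it is that the two samples come from populations with the same mean.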

So not only did beer help Uber’s Science Team write the code to conduct our testing, but it also gave birth to the very testing technique we’re using to analyze our data!

Eventually Gosset’s methods were expanded upon, but to this day the simple t-test is commonly used in experimental research. The variant we’re using is called the one-way ANOVA, which can be thought of, simplistically, as a t-test for many groups at once: a single test asks whether any of the group means differ, which reduces the multiple-comparisons penalty that comes with running a separate t-test for every pair of groups.
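As a rough sketch of what a one-way ANOVA computes, the F statistic compares how much the group means vary against how much users vary within each group. The data below is made up, and the helper name is ours; in practice a library routine such as `scipy.stats.f_oneway` also returns the p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA, plus (df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


# Made-up rides-per-user counts for three tiny groups.
f_stat, dfs = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
print(f_stat, dfs)  # 3.0 (2, 6)
```

With exactly two groups, F equals the square of the pooled two-sample t statistic, which is why ANOVA reads as a t-test generalized to many groups.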

So what’s the end result?

RESULTS

Uber Calculus Results: Uber + money = happy! Cheaper Uber for the win!

Our ultimate goal is to minimize your cost while providing our service at a price that maximizes business. The data we collect from this experiment will ultimately benefit all of our riders. If you’re sad that you didn’t get to take part in this experiment, stay strong: we’re likely going to be lowering prices for everyone. And rest assured, this is only our first, simple foray into Uber experiments. We have a lot more planned for the future…