The real lesson from wedding stats

Our interview on the BBC this week told the story of us optimising our wedding invitation list using statistical modelling.

I’m delighted that the BBC has conveyed the nature of the problem and a sense of playfulness. I’m even starting to get people contacting me to find out more about ‘guestimation’, as we’ve been calling it. I’m also glad to have this opportunity to show how good statistical thinking can help address (but not necessarily ‘solve’) a vexing problem that many of us might face.

I wrote about this project for Significance magazine last year because I thought it provided an ideal theme for an accessible introduction to some basic statistical ideas, and their use in decision making. In particular, the idea that these situations are about trying to quantify uncertainty and manage risk, rather than seeking high-precision estimates (which are basically impossible in this scenario).

I’d like to take the opportunity, though, to explain the main lesson from using our wedding statistics model. This key insight was missed by the BBC; it’s always difficult to predict which of your messages a journalist will choose to publish after they speak to you.

The BBC focused on whether the model was ‘right’, concluding that, in fact, there were ‘sizeable statistical errors’ and that the model was ‘wrong’. It ended with the moral, ‘if you can’t be right, then be lucky’, which isn’t the point, as you’ll see below.

‘True’ models

I have never seen any model that I could describe as ‘true’, in the BBC’s sense of the word. Some might say that it’s a central tenet of applied statistics, and of scientific research more generally, that our models will always be ‘wrong’.

George Box famously said that, ‘Essentially, all models are wrong, but some are useful.’ It is inevitable that our models will not be identical to reality, but that’s not what matters. Rather, it is whether they capture enough of reality to make them useful tools for understanding the world and helping us make good decisions.

In some parts of our lives we have embraced this. Weather forecasting is a great example. If the forecast for tomorrow is for rain, we can prepare by bringing a coat or umbrella as we head outside. We know that the forecast isn’t perfect and it won’t always rain, but we are grateful for the warning. The alternative is to make our own judgement by looking out of the window, which often works well, but is far less reliable.

In fact, weather forecasting is an area that has seen tremendous progress over the years (see Nate Silver’s book, The Signal and the Noise, for a great account). I remember how, when I was a child, the 4-day forecasts were always taken with a huge pinch of salt, but I can now regularly rely on them when planning my week. Nevertheless, they are still ‘wrong’.

How ‘wrong’ were we?

Last year, Tim Harford wrote about us on his blog and for the Financial Times. He conveyed a similar message to the one in the More or Less episode, but put it more strongly. He described my modelling assumptions as ‘flat-out wrong’ and ‘felicitously flawed’. While I certainly don’t claim the model was right, the evidence doesn’t bear him out.

Harford notes that I overestimated the probability of attendance for each group of guests that we invited. He quotes the fact that the group we classed as ‘likely’ to attend, to which we assigned an attendance probability of 80%, in fact had zero attendance. But he fails to mention that this group included only 2 invitees. Getting 0 out of 2 is certainly not strong evidence against the true probability being 80%, as any first-year statistics student can appreciate. You can hardly tell anything from a sample size of 2.

If you read my article, you’ll see that for each group of guests except one, the probability we assumed was within the 95% confidence interval calculated from the actual attendance. In other words, you can’t claim with confidence that our assumptions were very different from reality.

A minor exception was for the ‘definitely’ group, where we assumed 100% attendance. This was bound to be an overestimate, but it was a deliberate one we made for pragmatic reasons and were upfront about. Thus, it is not deserving of the ‘flawed’ label. (For the record, the attendance for the ‘definitely’ group turned out to be 96 out of 100 guests.)
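To make this concrete, here is a minimal sketch of the confidence-interval check, using the two group counts quoted above. The use of Python and of the Clopper–Pearson exact interval are my own choices here, not necessarily what was done in the original article:

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial
    proportion, given k successes observed out of n trials."""
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lower, upper

# 'Likely' group: assumed 80% attendance, observed 0 out of 2.
print(clopper_pearson(0, 2))     # ~(0.000, 0.842) -- contains 0.8
# 'Definitely' group: assumed 100% attendance, observed 96 out of 100.
print(clopper_pearson(96, 100))  # ~(0.901, 0.989) -- excludes 1.0
```

With only 2 invitees, the interval is so wide that almost any assumed probability is consistent with the data; with 100 invitees, it is tight enough to rule out the deliberate 100% assumption, exactly as described above.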

How useful was our model?

Estimating wedding attendance is difficult. We hadn’t done it before, there are no reliable guides to doing it, and we had no previous data to work with. But we had to invite some guests and we certainly didn’t intend to do it blindly.

We drew on our own intuitions and life experiences to help us get a handle on how many people would come. This is the same information that any other couple would draw upon for their wedding. The only difference is that we formalised our intuitions into a concrete mathematical model.

As I described in my article, we were making a calculated stab in the dark. Our model could be described as extreme Bayesianism: all assumptions and no data. Hardly up to scientific research standards. Don’t do this at home, kids!

A simple approach is to say something like, ‘95% of local invitees will come, and 20% of the overseas ones’, and then proceed to calculate a single number as your estimate. For example, if you invite 100 locals and 40 people from overseas, you expect on average \(100 \times 0.95 + 40 \times 0.20 = 103\) guests. This is somewhat useful, but raises a few questions. How close to 103 are we expecting to get? How likely are we to exceed a certain number (e.g. the capacity of the venue)?

We took this idea further and, with a few quite reasonable assumptions, calculated a prediction interval for our wedding, rather than just a single point estimate. This gave us a much better assessment of the true uncertainty.

We didn’t expect our assumptions to be perfect. However, the formulation as a model allowed us to more easily work out how any set of assumptions translated into an actual range of attendance.

This was particularly crucial for us because we didn’t care so much about the expected attendance as about not exceeding the capacity of the venue. This required selecting an invitation list where the expected number of guests was lower than the upper limit. But how much lower? There is no way to gauge this from a point estimate alone.
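As an illustration of what this looks like in practice, here is a minimal Monte Carlo sketch in Python. The group sizes and probabilities are the hypothetical ones from the example above, and the venue capacity of 110 is invented for illustration; this is not the actual model from our wedding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical invitation list: 100 locals at 95%, 40 overseas at 20%.
probs = np.array([0.95] * 100 + [0.20] * 40)
capacity = 110  # hypothetical venue capacity

# Simulate each invitee's attendance as an independent Bernoulli trial.
sims = rng.random((100_000, probs.size)) < probs
totals = sims.sum(axis=1)

print("expected attendance:", totals.mean())               # ~103
print("95% prediction interval:",
      np.percentile(totals, [2.5, 97.5]))                  # roughly [96, 110]
print("P(exceed capacity):", (totals > capacity).mean())   # ~0.01
```

The point estimate of 103 looks comfortably below capacity, but the simulation shows there is still a small chance of exceeding it. That residual risk is exactly what a point estimate hides.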

What’s the alternative?

We wanted to send our invitations in a single round and focused on calculating the optimal number to send.

Our modelling approach was more sophisticated than most couples would attempt. They might do cruder calculations, or none at all, and send out their invitations blindly, hoping for the best.

Is this any more ‘wrong’ than our approach? Does our increased sophistication lead people to (falsely) expect magic-bullet results?

That’s possible, and understandable. In our case, we understood the limitations of our assumptions and expected them to be somewhat fallible. Those not familiar with using models in this manner might find it more difficult. One of my goals was to demystify this process using a familiar scenario. Alas, this did not filter through to the BBC coverage.

Are there better solutions?

Harford admits that he doesn’t have a better suggestion. He personally prefers multiple rounds of invitations, and selective ‘disinvitations’. He believes that it generally leads to less embarrassment overall (although he himself wasn’t so lucky when he tried to organise a party in this way). It’s certainly a valid strategy, although not one we were comfortable with for our wedding.

A friend of ours reduced his uncertainty by calling up each of his guests before sending out the official invitations. That’s a lot of work, but the payoff is much less risk, which he thought was worth the effort.

Errors cancel out

There is a passing reference in the BBC coverage to the idea that ‘errors cancel out’.

This is actually a fundamental idea in probability theory (see the law of large numbers and the central limit theorem) which plays a key role in the success of applied statistics. It is what allows us to make reliable and accurate inferences from relatively small samples.
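One way to make this precise (a sketch of the outcome-level version only): for \(n\) independent invitees, where invitee \(i\) attends with probability \(p_i\), the total attendance has mean \(\sum_i p_i\) but standard deviation only \(\sqrt{\sum_i p_i(1-p_i)}\). The mean grows like \(n\) while the spread grows like \(\sqrt{n}\), so the relative error of the prediction shrinks as the guest list gets longer: individual over- and under-estimates tend to cancel in the sum.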

Unfortunately, we didn’t have time in the interview to go into these ideas, and the fact that our attendance prediction turned out to be spot on was inaccurately put down to luck. However, I’m glad they included it anyway because it is such an important idea.

Engaging and educating

Overall, I thought the BBC stories about us were fun and engaging. I hope that the coverage helps to popularise statistics. It’s a tough job, combining the teaching of basic statistical concepts with news/entertainment. The BBC’s More or Less radio program generally does a good job, and I hope that there will be more opportunities to get involved in future.

On the BBC

My wife Joan and I were featured on the BBC today. Twice!

For our wedding, I did some statistical modelling to optimise our invitation list. I wrote about it last year for the Young Writers Competition run by Significance, a popular statistics magazine. It was selected as the winning entry and published in the Aug 2013 issue.

This caught the eye of Tim Harford, the host of More or Less on BBC Radio 4. He interviewed both of us for today’s episode.

Ruth Alexander wrote an accompanying article for BBC News Magazine, also published today.

(Note: if you are streaming the episode from the website, our interview begins at 23:44. If you are listening to the podcast version, the interview begins at 22:53.)

Update: see my follow-up post for some more discussion.

Odds

Sometimes I hear people ask, ‘what are the odds?’ and without missing a beat they start mentioning probabilities rather than odds.

Unless you have been taught otherwise, I guess it’s natural to think of these as vaguely the same sort of thing (not to mention similar concepts such as chance and likelihood). However, within statistics they have precise and different meanings.

You can think of probability and odds as being like Celsius and Fahrenheit. They measure the same thing but on different scales. You need to convert from one to the other before you can compare them.

A probability is a number between 0 and 1 that we use to represent how likely something is to happen. I think most people know this and use it correctly. The letter \(p\) is often used to denote a probability.

Odds are another way to give a number to an event to represent its chance of occurring, but this time the scale goes between 0 and infinity. A probability of \(p\) corresponds to an odds of \(p/(1-p)\). For example, suppose there are only two possibilities for the weather tomorrow: rainy or sunny. If the probability of rain is 0.2, then the corresponding odds of rain is 0.2/0.8 = 0.25. The probability of being sunny is therefore 0.8, and the odds is 0.8/0.2 = 4.

This version of odds is also called odds in favour. An alternative is odds against, which is simply the reciprocal, \( (1-p)/p\). In the weather example, the odds against rain is 4, and the odds against sun is 0.25.
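The conversions are simple enough to capture in a couple of lines. Here is a small Python sketch (the function names are my own):

```python
def prob_to_odds(p):
    """Convert a probability p to odds in favour."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Convert odds in favour back to a probability."""
    return odds / (1 + odds)

print(prob_to_odds(0.2))      # 0.25: odds in favour of rain
print(prob_to_odds(0.8))      # 4.0:  odds in favour of sun
print(odds_to_prob(4.0))      # 0.8
print(1 / prob_to_odds(0.2))  # 4.0:  odds against rain (the reciprocal)
```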

Gambling odds

The place where most people encounter odds is at the horse races (or other sports where gambling is popular). In this case, the ‘odds’ are a way for bookmakers to show how much money they would pay if you were to select the winning bet. The favourite horse in a race will pay less than any of the others, because it is deemed most likely to win.

It turns out many different types of odds are used. (I don’t know why there are so many; it only makes things difficult!) They typically vary by country. I’ll describe three of them.

In the UK, the standard is fractional odds. It is the ratio of the winnings to the bet amount, expressed as a fraction. For example, if you were offered odds of 3/2 for the horse Prancing Diva and placed a $10 bet, you would be paid 3/2 \(\times\) $10 = $15 if it won, plus your original $10 bet, for a total payout of $25. If the horse loses, you lose your $10 bet.

In Australia, the convention is to use decimal odds. This is the ratio of the full payout (winnings plus original bet) to the bet amount, expressed as a decimal. For Prancing Diva, the equivalent in decimal odds is 2.5 (which is $25 divided by $10). It is easy to calculate decimal odds from fractional odds: simply convert to a decimal and add 1 (for the original bet).

In the USA, the system of choice is moneyline odds. It is represented as a whole number. When positive, it is the winnings for a bet of 100. When negative, it is the bet required to win 100. The odds for Prancing Diva in this case would be 150. To get moneyline odds from fractional odds, simply multiply by 100 if greater than 1, and multiply the reciprocal by -100 if less than 1. For example, 3/1 becomes 300 and 1/3 becomes -300.
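These conversions are mechanical, so here is a minimal Python sketch (function names are my own) that reproduces the Prancing Diva example:

```python
from fractions import Fraction

def decimal_odds(frac):
    """Decimal odds: full payout (winnings + stake) per unit staked."""
    return float(frac) + 1

def moneyline_odds(frac):
    """Moneyline odds: winnings on a 100 stake if frac >= 1,
    otherwise the (negative) stake required to win 100."""
    return 100 * float(frac) if frac >= 1 else -100 / float(frac)

diva = Fraction(3, 2)                  # the 3/2 fractional odds on Prancing Diva
print(decimal_odds(diva))              # 2.5
print(moneyline_odds(diva))            # 150.0
print(moneyline_odds(Fraction(1, 3)))  # -300.0
```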

The table below compares these three types of odds.

Gambling odds vs true odds

Strictly speaking, gambling odds are different to the odds I described earlier, which I’ll refer to as true odds. Whereas true odds are precise statements about how likely an event is, gambling odds describe possible financial transactions on offer. You can think of them as showing the ‘cost’ of various bets.

To understand the relationship between the two, suppose there will be a race between Prancing Diva, Gallop-a-lot, Canterberry and Trotskyite, with the probability of each horse winning being 0.5, 0.2, 0.2 and 0.1 respectively. The corresponding true odds against are 1, 4, 4 and 9.

Honest Joe the bookmaker could offer fractional odds of 1/1, 4/1, 4/1 and 9/1 on this race (i.e. equal to the true odds against). If he were to do this, his average profit would be exactly zero. This is not a good way to run a business. Instead, he offers the less favourable odds of 2/3, 3/1, 3/1 and 7/1. This reduces his required payouts and increases his profit.

If we pretend these are the true odds against and convert them back to probabilities, we get 0.6, 0.25, 0.25 and 0.125 respectively. We call these the implied probabilities.

Probabilities have the nice property that if you add them up over all the possible outcomes, you always get 1. In Joe’s case, the total is 1.225. The difference is due to him building in a profit margin. The bigger the difference, the greater the profit.
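A quick, self-contained check of this arithmetic:

```python
from fractions import Fraction

def implied_probability(frac):
    """Implied probability if the fractional odds were true odds against."""
    return 1 / (float(frac) + 1)

# Honest Joe's offered fractional odds on the four horses.
offered = [Fraction(2, 3), Fraction(3, 1), Fraction(3, 1), Fraction(7, 1)]
implied = [implied_probability(f) for f in offered]

print(implied)       # [0.6, 0.25, 0.25, 0.125]
print(sum(implied))  # 1.225 -- the excess over 1 is Joe's profit margin
```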

An intuitive explanation is that Joe is ‘pretending’ each horse is more likely to win than it actually is, so that he can pay you less on your bets. This requires him to add in extra probability, pushing the total over the true total probability of 1.

While gambling odds need to reflect the underlying knowledge of how likely each possible outcome is (otherwise punters would have a sure bet), the fact that they don’t respect the sum-to-one property of probabilities means they are not true odds.

Everyday odds

When your friend asks you, ‘what are the odds?’ and starts quoting numbers, which type of odds are they (if any at all)? Most likely it’s unclear. To avoid confusion, I like to stick with probabilities. What do you prefer?

The first three columns are gambling odds; the last two are the corresponding comparison values.

| Fractional | Moneyline | Decimal | True odds against | Implied probability |
|-----------:|----------:|--------:|------------------:|--------------------:|
| 10/1 | 1000 | 11.00 | 10.00 | 0.091 |
| 9/1 | 900 | 10.00 | 9.00 | 0.100 |
| 5/1 | 500 | 6.00 | 5.00 | 0.167 |
| 4/1 | 400 | 5.00 | 4.00 | 0.200 |
| 3/1 | 300 | 4.00 | 3.00 | 0.250 |
| 2/1 | 200 | 3.00 | 2.00 | 0.333 |
| 3/2 | 150 | 2.50 | 1.50 | 0.400 |
| 1/1 | +/-100 | 2.00 | 1.00 | 0.500 |
| 2/3 | -150 | 1.67 | 0.67 | 0.600 |
| 1/2 | -200 | 1.50 | 0.50 | 0.667 |
| 1/3 | -300 | 1.33 | 0.33 | 0.750 |
| 1/4 | -400 | 1.25 | 0.25 | 0.800 |
| 1/5 | -500 | 1.20 | 0.20 | 0.833 |
| 1/9 | -900 | 1.11 | 0.11 | 0.900 |
| 1/10 | -1000 | 1.10 | 0.10 | 0.909 |