The real lesson from wedding stats

Our interview on the BBC this week told the story of us optimising our wedding invitation list using statistical modelling.

I’m delighted that the BBC has conveyed the nature of the problem and a sense of playfulness. I’m even starting to get people contacting me to find out more about ‘guestimation’, as we’ve been calling it. I’m also glad to have this opportunity to show how good statistical thinking can help address (but not necessarily ‘solve’) a vexing problem that many of us might face.

I wrote about this project for Significance magazine last year because I thought it provided an ideal theme for an accessible introduction to some basic statistical ideas, and to their use in aiding decision making. In particular, it illustrates the idea that these situations are about trying to quantify uncertainty and manage risk, rather than seeking high-precision estimates (which are basically impossible in this scenario).

I’d like to take this opportunity, though, to explain the main lesson from using our wedding statistics model. This key insight was missed by the BBC; it’s always difficult to guess which messages a journalist will choose to publish after speaking to you.

The BBC focused on whether the model was ‘right’, concluding that in fact there were ‘sizeable statistical errors’ and that the model was ‘wrong’. It ended with the moral, ‘if you can’t be right, then be lucky’, which isn’t the point, as you’ll see below.

‘True’ models

I have never seen any model that I could describe as ‘true’, in the BBC’s sense of the word. Some might say that it’s a central tenet of applied statistics, and scientific research more generally, that our models will always be ‘wrong’.

George Box famously said that, ‘Essentially, all models are wrong, but some are useful.’ It is inevitable that our models will not be identical to reality, but that’s not what matters. Rather, it is whether they capture enough of reality to make them useful tools for understanding the world and helping us make good decisions.

In some parts of our lives we have embraced this. Weather forecasting is a great example. If the forecast for tomorrow is for rain, we can prepare by bringing a coat or umbrella as we head outside. We know that the forecast isn’t perfect and it won’t always rain, but we are grateful for the warning. The alternative is to make our own judgement by looking out of the window, which often works well, but is far less reliable.

In fact, weather forecasting is an area that has seen tremendous progress over the years (see Nate Silver’s book, The Signal and the Noise, for a great account). I remember when I was a child how the 4-day forecasts were always taken with a huge pinch of salt, but I can now regularly rely on them when planning my week. Nevertheless, they are still ‘wrong’.

How ‘wrong’ were we?

Last year, Tim Harford wrote about us on his blog and for the Financial Times. He conveyed a similar message to the one in the More or Less episode, but put it more strongly. He described my modelling assumptions as ‘flat-out wrong’ and ‘felicitously flawed’. While I certainly don’t claim the model was right, the evidence doesn’t bear him out.

Harford notes that I overestimated the probability of attendance for each group of guests that we invited. He quotes the fact that the group we called ‘likely’ to attend, to which we assigned a probability of attendance of 80%, in fact had zero attendance. But he fails to mention that this group only included 2 invitees. Getting 0 out of 2 is certainly not strong evidence against the true probability being 80%, as any first-year statistics student can appreciate. You can hardly tell anything from a sample size of 2.

If you read my article, you’ll see that for each group of guests except one, the probability we assumed was within the 95% confidence interval calculated from the actual attendance. In other words, you can’t claim with confidence that our assumptions were very different from reality.

A minor exception was for the ‘definitely’ group, where we assumed 100% attendance. This was bound to be an overestimate, but it was a deliberate one we made for pragmatic reasons and were upfront about. Thus, it is not deserving of the ‘flawed’ label. (For the record, the attendance for the ‘definitely’ group turned out to be 96 out of 100 guests.)
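To make the comparison concrete, here is a minimal sketch of how such a check could be done. It assumes an exact (Clopper-Pearson) binomial interval, which may not be the method used in the original article, and the function names are purely illustrative. The only data used are the two figures quoted above: 0 of 2 for the ‘likely’ group and 96 of 100 for the ‘definitely’ group.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion.

    Assumption: this is one common interval method, chosen for
    illustration; the article may have used a different one.
    """

    def bisect(p_is_too_small):
        # Bisection on [0, 1]; the tail probabilities are monotone in p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if p_is_too_small(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p) > alpha / 2
    )
    return lower, upper


# 'Likely' group: 0 of 2 attended, assumed probability 0.80.
likely = clopper_pearson(0, 2)
# 'Definitely' group: 96 of 100 attended, assumed probability 1.00.
definitely = clopper_pearson(96, 100)

print("likely:", likely, "contains 0.80:", likely[0] <= 0.80 <= likely[1])
print("definitely:", definitely, "contains 1.00:", definitely[1] >= 1.0)
```

Under this choice of interval, the upper limit for 0 out of 2 is about 0.84, so an assumed 80% is not ruled out, whereas the upper limit for 96 out of 100 falls below 1, which matches the single exception described above.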

How useful was our model?

Estimating wedding attendance is difficult. We hadn’t done it before, there are no reliable guides to doing it, and we had no previous data to work with. But we had to invite some guests and we certainly didn’t intend to do it blindly.

We drew on our own intuitions and life experiences to help us get a handle on how many people would come. This is the same information that any other couple would draw upon for their wedding. The only difference is that we formalised our intuitions into a concrete mathematical model.

As I described in my article, we were making a calculated stab in the dark. Our model could be described as extreme Bayesian: all assumptions and no data. Hardly up to scientific research standards. Don’t do this at home, kids!

A simple approach is to say something like, ‘95% of local invitees will come, and 20% of the overseas ones’, and then proceed to calculate a single number as your estimate. For example, if you invite 100 locals and 40 people from overseas, you expect on average \(100 \times 0.95 + 40 \times 0.20 = 103\) guests. This is somewhat useful, but raises a few questions. How close to 103 are we expecting to get? How likely are we to exceed a certain number (e.g. the capacity of the venue)?

We took this idea further and, with a few quite reasonable assumptions, calculated a prediction interval for our wedding, rather than just a single point estimate. This gave us a much better assessment of the true uncertainty.

We didn’t expect our assumptions to be perfect. However, the formulation as a model allowed us to more easily work out how any set of assumptions translated into an actual range of attendance.

This was particularly crucial for us because we cared less about the expected attendance than about not exceeding the capacity of the venue. This required selecting an invitation list where the expected number of guests is lower than the upper limit. But how much lower? There is no way to gauge this from a point estimate alone.
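As a rough illustration of the difference between a point estimate and a full distribution, the sketch below uses the toy numbers from the example above (100 locals at 95%, 40 overseas guests at 20%). It assumes every invitee decides independently, which is a simplification and not necessarily the assumption in our actual model, and the venue capacity of 110 is made up for the example. The function names are illustrative only.

```python
from math import comb


def binomial_pmf(n, p):
    """Probabilities of 0..n attendees from n invitees, each attending with probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def attendance_pmf(groups):
    """Total attendance distribution for a list of (invitees, probability) groups.

    Assumption: all invitees respond independently of one another.
    """
    pmf = [1.0]
    for n, p in groups:
        pmf = convolve(pmf, binomial_pmf(n, p))
    return pmf


def prediction_interval(pmf, level=0.95):
    """Central interval with at most (1 - level) / 2 probability cut from each tail."""
    tail = (1 - level) / 2
    cum, lower = 0.0, 0
    for k, pr in enumerate(pmf):
        cum += pr
        if cum >= tail:
            lower = k
            break
    cum, upper = 0.0, len(pmf) - 1
    for k in range(len(pmf) - 1, -1, -1):
        cum += pmf[k]
        if cum >= tail:
            upper = k
            break
    return lower, upper


def prob_exceeding(pmf, capacity):
    """Probability that total attendance is strictly greater than capacity."""
    return sum(pmf[capacity + 1:])


groups = [(100, 0.95), (40, 0.20)]  # toy example from the text
pmf = attendance_pmf(groups)
expected = sum(k * pr for k, pr in enumerate(pmf))
print("expected guests:", round(expected, 2))
print("95% prediction interval:", prediction_interval(pmf))
print("P(more than 110 guests):", prob_exceeding(pmf, 110))  # 110 is a hypothetical capacity
```

Trying different invitation lists in `groups` and watching how the probability of exceeding capacity changes is the kind of question a point estimate cannot answer.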

What’s the alternative?

We wanted to send our invitations in a single round and focused on calculating the optimal number to send.

Our modelling approach was more sophisticated than most couples would attempt. They might do cruder calculations, or none at all, and send out their invitations blindly, hoping for the best.

Is this any more ‘wrong’ than our approach? Does our increased sophistication lead people to (falsely) expect magic-bullet results?

That’s possible, and understandable. In our case, we understood the limitations of our assumptions and expected them to be somewhat fallible. Those not familiar with using models in this manner might find it more difficult. One of my goals was to demystify this process using a familiar scenario. Alas, this did not filter through to the BBC coverage.

Are there better solutions?

Harford admits that he doesn’t have a better suggestion. He personally prefers multiple rounds of invitations, and selective ‘disinvitations’. He believes that it generally leads to less embarrassment overall (although he himself wasn’t so lucky when he tried to organise a party in this way). It’s certainly a valid strategy, although not one we were comfortable with for our wedding.

A friend of ours reduced his uncertainty by calling up each of his guests before sending out the official invitations. That’s a lot of work, but the payoff is much less risk, which he thought was worth the effort.

Errors cancel out

There is a passing reference in the BBC coverage to the idea that ‘errors cancel out’.

This is actually a fundamental idea in probability theory (see the law of large numbers and the central limit theorem) which plays a key role in the success of applied statistics. It is what allows us to make reliable and accurate inferences from relatively small samples.
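One simple way to see this effect, sketched here under the assumption that invitees decide independently and all share the same attendance probability (the 80% figure is just an example): the spread of the total, relative to its expected value, shrinks as the number of invitees grows.

```python
from math import sqrt


def relative_spread(n, p):
    """Standard deviation of total attendance divided by its mean,
    for n independent invitees who each attend with probability p."""
    return sqrt(n * p * (1 - p)) / (n * p)


for n in (10, 100, 1000):
    print(n, round(relative_spread(n, 0.8), 3))
# prints:
# 10 0.158
# 100 0.05
# 1000 0.016
```

Individual guests are unpredictable, but their separate surprises partly offset one another, so the total is proportionally much more predictable than any one reply.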

Unfortunately, we didn’t have time in the interview to go into these ideas, and the fact that we had perfect wedding attendance was inaccurately put down to luck. However, I’m glad they included it anyway because it is such an important idea.

Engaging and educating

Overall, I thought the BBC stories about us were fun and engaging. I hope that the coverage helps to popularise statistics. It’s a tough job, combining the teaching of basic statistical concepts with news/entertainment. The BBC’s More or Less radio program generally does a good job and I hope that there will be more opportunities to get involved in future.

Damjan Vukcevic