Statistics capstone

On Tuesday, SSA Vic hosted a panel discussion on statistics education in the age of Big Data. One of the panellists was Julie Simpson, who I work with at ViCBiostat. She decided to poll the ViCBiostat postdocs beforehand to get our thoughts and channel them into the discussion.

I thought back to how I would change my undergraduate learning and came up with two suggestions:

  1. End-to-end exposure to working with real problems. That means everything from planning an experiment or study, dealing with the acquisition and cleaning of the data, through to delivering a final report or presentation (or interactive web app…).

  2. A mental map of statistical methods. That is, a broad understanding of all of the different areas of statistics (and machine learning, data mining, etc.), how they relate to each other, and what types of problems each of them is useful for. I think this is more useful than learning to be highly proficient in a few methods while remaining ignorant of what else is out there (which accurately describes my state after undergrad, although it was even worse because I was too ignorant to appreciate how ignorant I was!).

Ideally, both of these would be slowly developed over the whole degree, but they can also be explicitly taught as part of a ‘capstone’ subject in the final year. A quick web search for ‘statistics capstone’ reveals that some universities (mostly in the USA) indeed seem to run subjects of this sort, especially focusing on the ‘end-to-end’ aspect. I don’t know if they also provide a mental map. If not, I think that would be a valuable addition.

Barriers to reforming statistics education

Last week I gave a talk, Factors for success in big data science, at the University of Melbourne. This was to the Big Data Reading Group, a recently formed informal group within the Department of Mathematics and Statistics.

I had three aims for my talk: to give a brief overview of some ‘big data’ projects I have been involved in; to describe what I think made them successful (especially factors that are transferable across projects); and finally to suggest ways we can reform statistics education at university to foster such success.

In a nutshell, I advocated for a more practical focus in our education, with explicit teaching of data management and programming skills, more emphasis on using real (and messy!) data, and more time spent doing projects, including as part of a group. See my slides for more details.

I’m certainly not the first to suggest such changes. In fact, this seems to be one of those perennial discussions that gets rehashed regularly, with university inertia preventing too much rocking of the boat. However, given the recent surge of interest in ‘big data’ and ‘data science’, and the calls from leaders of our profession (such as Terry Speed and Bin Yu) to reform it, I thought this was a perfect opportunity to have this conversation.

The barriers

About a dozen people came to my talk, including four senior academic staff. We had an extensive discussion that ran well overtime, which made it clear that we were all passionate about this topic. We agreed that reform would be an excellent idea. The hard part was how to do it. These were the main barriers put forward by members of the audience:

  • Lack of resources. This refers to funding cuts, lack of qualified teaching staff and university rules that prevent running subjects with too few students. Ultimately, it all boils down to a limited (and shrinking) pot of money.

  • Student resistance to change. Apparently, current students are more interested in the mathematical side of statistics and do not like open-ended assignments. As Rafael Irizarry reports, teaching the messy parts of applied statistics ‘requires exploration and failure which can be frustrating for new students.’ Many students also dislike group work, partly because of the additional effort of working with others and partly because they believe the assessment allows some students to free-ride on the efforts of more diligent ones.

  • Students are ill-prepared by high school. Much of the early undergraduate teaching is spent on getting students ‘up to speed’ due to weak teaching at high school, leaving less time to learn new things.

  • Not enough time for the ‘basics’. There was a view that the current syllabus does not even cover the basic material properly, let alone have any room to add new things.

Overcoming the barriers?

These are real concerns and it is clear they have occupied many people’s minds.

Lack of resources is a fundamental challenge. I do not doubt that our mathematics and statistics departments are under-funded and that more money would make a measurable difference. Nevertheless, there is still a question about how best to spend the existing money.

I believe we don’t yet have the balance right. If learning to manipulate real data is not a ‘basic’ statistical skill, then what is?

We can try looking across campus for help to adapt our teaching methods to more closely reflect real world scenarios. Engineering departments have students regularly work in groups and engage in realistic projects. What can we learn from them? Perhaps we need to look at some good practices for assessing and communicating group work?

We can also look for ways of getting more money. Since income depends strongly on student numbers, can we attract more students? With the surge of interest in big data and data science, surely there is now a strong market for a practically focused statistics course?

Other universities are responding to this demand by innovating and developing new courses. Some courses are even available online, such as the Data Science Specialisation on Coursera, run by three prominent biostatisticians at Johns Hopkins University.

I see this as a challenge for the future of the statistics profession. By no means do I think any of this is easy to implement, nor do I claim any personal expertise in tertiary education. I look to leadership from statistics departments because I worry that students interested in data analysis will look elsewhere and will miss out on learning key statistical principles.

The academic staff from the department said that three new statistics subjects are planned for next year. I hope they feature a decent dose of data analysis.

Data science is inclusive

I’ve often heard data science described as a combination of three things: mathematics & statistics, computer science (sometimes simply called ‘hacking skills’) and domain knowledge. Drew Conway showed this using a now-ubiquitous Venn diagram:

Drew Conway's data science Venn diagram

This accurately describes the set of skills that an employer is after when they seek to hire a single data scientist.

However, such people are rare. They have been compared to unicorns. To depict data science as an intersection of these skills presents a misleading picture of our ‘profession’. In reality, the term ‘data science’ covers work that is done by many existing professions.

To do data science on a decent scale, we need to engage a multidisciplinary team of data scientists who collectively have the required expertise. None of them will be unicorns, but together they can fill out the Venn diagram. That means data science is more accurately viewed as the union of these skills:

Data Science Venn Diagram v2.0

Evan Stubbs emphasised these points last week in his talk, Big Data, Big Mistake. According to him, the relentless search by employers for ‘unicorn’ data scientists has led to disappointment and disillusionment, and we need to communicate to them the idea that data science is done by groups of people.

With ‘data science’ now a mainstream term, we have a fantastic opportunity to unite our professions under a common banner and combine our skills to solve problems we cannot solve alone. This is not only good for all of us as practitioners. It is also what society seeks from us.

Let us embrace data science as an inclusive discipline.


Drew Conway’s Venn diagram is licensed under a Creative Commons Attribution-NonCommercial Licence and is reproduced here in its original form. The Data Science Venn Diagram v2.0 is an adaptation of Drew Conway’s diagram by Steven Geringer and is reproduced here by permission. The images of both diagrams link back to the original source.

Adam Bandt discusses evidence-based policy

Two weeks ago the Federal Member for Melbourne, Adam Bandt, gave a public lecture on the role of evidence in public policy in Australia. I helped to organise this talk as one of the monthly events for SSA Vic. Our goal was to hear how evidence is used (or not) by decision makers, in this case politicians.

Adam covered many topics and fielded a large number of questions from the audience. You can listen to the recording to hear it all (approx. 1 hour). Here, I summarise the points that stood out for me.

Lessons learnt from climate change policy

Climate change featured prominently in both Adam’s talk and the audience’s questions. Reflecting on his role in the previous government, Adam was frank in describing both its successes and its failures. Two of these stuck with me.

Early on, the government put together a committee to develop a set of policies to tackle climate change. It consisted of parliamentarians from multiple parties, and an equal number of experts from a variety of fields. Adam said the presence of the experts changed the dynamic of discussion considerably:

‘When you are sitting across the table from an expert…your ability to prosecute crap arguments diminishes drastically. You’ll be held to account very, very quickly by someone who’ll just tell you that’s simply not right.’

Seems like a great idea to me. Getting politicians and experts talking together: surely it’s a no-brainer? Shouldn’t this happen more often?

At the other end, one of their major mistakes came after they had developed their policy and passed the legislation: they presumed there was no longer any need to talk about the problem. The public information campaign that followed concentrated on details of the carbon price and the compensation package, with little mention of global warming or the fact that this legislation was tackling a big social problem.

‘The failing to talk about the problem, and just presuming because you have a good technocratic fix to it then that’s enough, is part of the problem,’ according to Adam. This allowed the Opposition to shift the debate away from the underlying problem and towards the Government’s credibility, without any reference to climate change.

Adam’s 3-step plan

Often it’s easy to point out problems but much harder to come up with solutions. Adam offered us three.

1. Entrench facts into government decision making, by law

Adam suggested two ways of doing this. Firstly, by setting up a sustainability commissioner in various government departments, whose role is to provide independent scientific advice (for example, about the impacts on biodiversity or energy use). The key point is that the relevant minister would be required, by law, to take that advice into account. Of course, they could choose to ‘ignore’ any advice but they would need to make a statement to this effect. Adam believes this would change the dynamic of many decisions and make evidence harder to ignore.

Secondly, by making greater use of randomised controlled trials (RCTs) as part of policy development. However, Adam was a bit reserved on this point, wanting to see more evidence that these are indeed effective. He mentioned that a large review was underway in the UK to assess the ability of RCTs to measure the effects of social policy.

2. Increase the scientific literacy of the population through public education

Those who wish to attack evidence-based positions can resort to a variety of underhanded tactics. One is to manufacture doubt. Another is to falsely undermine the evidence by blurring the distinction between evidence and moral values.

Adam believes that increasing scientific literacy can help to blunt both of these attacks, and also lead to greater acceptance of a larger role for evidence in decisions. He would do this by investing more in science and mathematics education in primary and secondary schools.

A byproduct of such an education would be a greater ability of the public to distinguish between the use of evidence and the use of values to guide decisions. Hopefully, this will lead us to a situation where politicians would be allowed (in fact, compelled) to change their policies in response to new evidence without being falsely accused of ‘flip-flopping’.

3. Get scientists & researchers to be more political

Adam’s final message was directed squarely at us, the scientists and researchers in the audience. According to Adam, unless we fight for our slice of the political pie, it will instead be taken by those (and there are many) who are motivated by self-interest and not necessarily by the evidence.

One way to get political is to (like Adam) leave our jobs and stand for election. It would be great to have a few more scientists in Parliament, but that won’t be enough nor is it a realistic prospect for most of us.

Instead, Adam urged us to get organised and pool our efforts. Some of us will need to go out in public and advocate on behalf of scientists. We will also need an effective campaigning organisation. (Adam mentioned the Australian Academy of Science but noted that it acts more as an advisory body than as a campaigning organisation.) Comparing our plight with that of the mining industry, which collectively ran a multi-million dollar advertising campaign against the mining tax, Adam asked, ‘Where is the alternative, equivalent organisation…[who will] run a TV advertising campaign for science & research?’

The question of money arose. Adam admitted that this is indeed a challenge, but a surmountable one. He said we need to find ‘allies’ out there who have an interest in Australia being a well-resourced research & science community. There are many of them around, just waiting to be pulled together.

To explain or predict?

Inspired by a recent blog post from Rob Hyndman, last week I read Galit Shmueli’s paper, To explain or to predict?.

I cannot recommend this paper enough. It should be essential reading for anyone involved in data analysis.

Shmueli distinguishes two different aims when analysing data: prediction and explanation. She describes in detail how the modelling and analysis process should differ depending on whether you are doing one or the other. She even shows a concrete example where the model that works best for prediction is different to the model that works best for explanation. This was a key insight for me. Previously I had assumed the intuitively appealing idea that the best model for one will also be the best for the other. I’m glad to have this corrected. I see this idea advanced all the time, and now I know for sure that it’s false.

Another key message from Shmueli is that even though our primary aim will be either prediction or explanation, we should, if possible, assess our models on both criteria. We would expect good models to perform reasonably well in either setting, and it will usually be insightful to assess both.

Bin Yu gave a talk earlier this week on ‘mind-reading’, showcasing her group’s work on reconstructing movies from brain signal measurements. In one step of their modelling process, they make a trade-off between ‘explainability’ and ‘predictability’. Specifically, they chose a model that was easier to interpret at the expense of a bit of predictive performance. This is the first time I’ve seen anyone do this explicitly. It reminds me of the bias-variance trade-off and speaks directly to the ideas in Shmueli’s paper.

Car share cost comparison

When I moved to the UK to study many years ago, one of the big changes for me was living much closer to my workplace. Having grown up in the Melbourne suburbs, I found this a revelation. Suddenly, I didn’t need to spend hours every day commuting. I was an instant convert. It also allowed me to avoid buying a car, which was very handy on a student budget.

Upon returning to Melbourne, I was keen to continue a minimal-commute, car-free existence. I now live and work close to the CBD. Public transport is very easy when you are so central: there is plenty of choice and the service is frequent. I’m pleasantly surprised how little I actually need a car.

Nonetheless, sometimes only a car will do the job. What are the best options out there? The familiar ones are to take a taxi or rent a car. Over the last few years, a new option has entered the mix: car sharing. This is similar to renting, but you can book a car for shorter periods of time (for example, 2 hours for a big shopping trip) and with less hassle (simply reserve a car online, and then pick it up and drop it off without signing any forms).

Three car share businesses have established a presence in Melbourne: GoGet, Flexicar and GreenShareCar. I wanted to join one but it wasn’t clear which one was the best deal for me. Frustratingly, they each had a different pricing structure. So I whipped out the trusty spreadsheet and did some calculations, which made the choice much clearer.

Some people have asked me if I could share this around, so I’ve polished it up a bit and hopefully made it easy to use. You can grab a copy from here:

Australian car share comparison (Google Docs spreadsheet)

The instructions are on the first sheet. The easiest way to use it is to make a copy of it within Google Drive (File > Make a copy...).

The spreadsheet makes a number of assumptions, such as averaging out your trips equally across all months and not accounting for any uncertainty in the number of trips, but that’s probably fine for a rough estimate. Use it as a guide only, and try different scenarios to see how much difference they make.
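To show the kind of calculation involved, here is a minimal sketch in Python, assuming a deliberately simplified pricing structure: a fixed monthly fee, an hourly rate and a per-km rate. The plan names and all the rates below are hypothetical and are not the actual prices of GoGet, Flexicar or GreenShareCar, and real plans may include extras (joining fees, daily caps, insurance excess) that this sketch ignores. Like the spreadsheet, it spreads trips evenly across the 12 months and ignores uncertainty in the number of trips.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A simplified car share pricing structure (hypothetical, for illustration only)."""
    name: str
    monthly_fee: float   # fixed membership fee per month ($)
    hourly_rate: float   # charge per hour of booking ($)
    per_km_rate: float   # charge per km driven ($); 0 if km are included


def monthly_cost(plan: Plan, trips_per_year: float,
                 hours_per_trip: float, km_per_trip: float) -> float:
    """Expected monthly cost, assuming trips are spread evenly across 12 months."""
    trips_per_month = trips_per_year / 12
    cost_per_trip = plan.hourly_rate * hours_per_trip + plan.per_km_rate * km_per_trip
    return plan.monthly_fee + trips_per_month * cost_per_trip


def cheapest(plans, trips_per_year, hours_per_trip, km_per_trip):
    """Return the name of the cheapest plan and a dict of each plan's monthly cost."""
    costs = {p.name: monthly_cost(p, trips_per_year, hours_per_trip, km_per_trip)
             for p in plans}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    # Hypothetical plans: pay-as-you-go versus a monthly fee with cheaper usage.
    plans = [
        Plan("Plan A", monthly_fee=0.0, hourly_rate=12.0, per_km_rate=0.40),
        Plan("Plan B", monthly_fee=30.0, hourly_rate=8.0, per_km_rate=0.0),
    ]
    for trips in (12, 36):
        best, costs = cheapest(plans, trips, hours_per_trip=2, km_per_trip=30)
        print(f"{trips} trips/year: cheapest is {best}")
        for name, cost in costs.items():
            print(f"  {name}: ${cost:.2f} per month")
```

With these made-up numbers, Plan A works out cheaper for 12 two-hour, 30 km trips a year ($36 versus $46 per month), but Plan B wins at 36 trips a year ($78 versus $108). That crossover is exactly why it is worth trying several scenarios rather than relying on a single estimate.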

Terry Speed sounds the alarm

Two weeks ago, the Victorian Branch of the Statistical Society of Australia celebrated its 50th anniversary. Many people turned out for the event, including a few members who were there right from the beginning. You can read a short historical account of the Branch in this review from Ian Gordon and me.

The clear highlight of the day was a lecture from Terry Speed, warning us that statisticians are at risk of being left out of the Big Data revolution. He certainly raised many eyebrows! Find out more in my review of his lecture.

Simple vivid advocacy

I heard many talks at the recent Science meets Parliament event (see my previous post for a summary). The most memorable for me was ‘How to talk like a policy maker’ by Professor Hugh White from the ANU.

What stood out most clearly were his three tips for communicating:

  • Simplify, without distortion.
  • Be vivid, without being needlessly provocative.
  • Advocate, don’t polemicise (that is, only use arguments backed by evidence).

Prof. White noted that being simple and vivid is more important than being concise. This was a revelation to me and immediately rang true. I had always conflated the two concepts, but I see now how they relate. The overall goal is to communicate an idea. Doing so in a simple and vivid manner is likely to be successful. Being concise is one strategy for this, and although it can often work it isn’t necessarily the only way (and can backfire, if it leads to oversimplification).

Another piece of advice he gave, related to his third point above, is how to deal with criticism. Rather than respond to the critics, you should respond to their arguments. In other words, focus on the evidence, reasoning and ideas. (This is well known advice, of course, but it’s good to be reminded of it.)

Prof. White had many other things to say, and was immediately followed by a talk from Will Steffen. See Nick Falkner’s detailed summary of their talks if you are interested.

Science meets Parliament 2014

‘No other nation does it quite like this.’

Catriona Jackson, CEO of Science & Technology Australia, opened the proceedings at Science meets Parliament 2014 by telling us that more than half of our elected representatives will be personally involved.

Last week, almost 200 scientists from around Australia gathered to meet ‘face to face with the decision makers in Canberra’. Now in its 14th year, this two-day event aims to teach scientists how to communicate more effectively with politicians, policymakers and the media, and also to give them the opportunity to actually meet parliamentarians themselves and put this into action.

We were addressed by many speakers over those two days. These included some of our country’s leaders, each with their own call to action. Ian Macfarlane, Minister for Industry, appealed for closer collaboration with industry and greater commercialisation. Bill Shorten, Leader of the Opposition and Shadow Minister for Science, urged us to make science a national political issue. Ian Chubb, Chief Scientist of Australia, advocated a long-term strategy for science, which should include areas such as education and community engagement, as well as research.

However, the majority of speakers were there to teach us about effective communication. I found this very informative and wish to share with you what I have learnt.

Below, I summarise the main ideas from most of the talks. Many speakers touched on similar topics and advice, so I’ve tried to combine them into a single cohesive guide. If you would prefer more of a ‘blow-by-blow’ account of each speaker, check out Nick Falkner’s comprehensive notes on his blog.

A note about terminology: I use the terms politician and policymaker throughout. Just to be clear, I use the former to refer to our elected parliamentary representatives and the latter to refer more generally to anyone involved in formulating and influencing public policy (this includes, for example, public servants).

Communication: general tips

  • Tell your story. People can engage with and readily understand stories. Craft a coherent story from your findings. This should highlight the key pieces of evidence, and should include some relevant anecdotes (consistent with your evidence).

  • From complicated to meaningful. When explaining complicated concepts, talk only about the parts that are meaningful to the audience. Know when to stop: most audiences don’t want the full complexity! Test out your message on some non-expert friends who can give you frank advice.

  • Alternative outcomes, rather than bare uncertainty. Uncertainty is a key part of scientific research but is not part of most people’s everyday language or experience. A good way to frame uncertainty is to present alternative outcomes and the risks associated with each.

  • Build relationships. Communication requires trust. There is a (warranted) widespread perception, especially amongst policymakers, that ‘evidence’ can be created to support any desired viewpoint. Hence, they will only believe facts and advice given to them by people and organisations they trust. This is why building a relationship is crucial, and it will usually need to be done over a period of time.

  • Get expert help. It’s okay not to be a communications expert. Not everyone will have the aptitude or interest in this activity. It’s fine to ask others to do it for you. (But we can’t all pass the buck; some of us will need to be communications experts!)

Communicating with politicians

  • Understand the politician’s goals and drivers. Your advice needs to help the politician meet their commitments. For example, if you are talking with someone from the government, what did they promise in the last election? Of course, you also have your own goals. Aim to create win-win solutions.

  • Solutions, not entitlements. Don’t simply make requests. Politicians are bombarded by such claims all the time and will most likely ignore you. Instead, talk about solutions, and specifically for the problems that matter to them.

  • Craft your message. A successful one will have:

    • a narrative,
    • evidence (must be consistent with and supporting the narrative),
    • some ‘breakthrough’ examples (everyone loves scientific ‘breakthroughs’),
    • cost/benefit estimates.

    The last of these is important. The cost of any policy will be heavily scrutinised before it even gets close to the implementation stage. There are at least two benefits to discussing the costs yourself. Firstly, it shows that you can ‘speak the language’ of policymaking, by engaging in this key step in the decision making process. Secondly, it gives you the opportunity to make a compelling case for the benefits, otherwise it will be left to someone with less knowledge and enthusiasm.

  • Unite. For large groups it is very helpful to talk with a single voice. Bill Shorten gave the example of the NDIS. Providing assistance to people with disabilities was always a moral imperative, but it wasn’t until the very many support and lobby groups came together as part of the Every Australian Counts campaign and presented a single message that it gained significant political traction. According to Mr Shorten, a challenge for us when advocating for science is to find our unifying message.

  • Plan ahead. At the conclusion of the meeting you will want to have some next steps. Perhaps it might be the opportunity to present some more detailed findings, or a referral to a more senior politician. Think about your desired next steps as you plan your meeting.

Communicating with policymakers

  • Learn the ‘logic’ of policymaking. Science and policymaking have different goals. Science is about finding the truth; policy is about making decisions. This gives each a different dynamic. Science has a special status within policymaking due to its role in interrogating and elucidating true facts of the world. Nevertheless, the ultimate goal is to make decisions. Anything you say as a scientist should be to assist with that process.

  • Answer the question. It is vital to answer the exact question of interest to policymakers, with reasonable caveats. Don’t answer a tangential or related question simply because you know more about it (hence the importance of the ‘reasonable caveats’).

  • Understand the policy cycle. There are multiple stages to the development of policy: understanding and formulating the questions, exploring potential solutions, costing and comparing the various options, implementing the selected solution, and finally evaluating the outcomes. The stages aren’t necessarily linear: a policy can go back and forth many times as more is learnt about the problem and the policy is refined.

    If you wish to get involved, find out what stage of the development cycle the current policy is at when giving advice. For example, if a policy has already been implemented and is at the evaluation stage, it’s not helpful to give suggestions on how it should have been formulated differently.

    Think about at what stage your knowledge would be useful. Target, and time, your advice appropriately.

Communicating with journalists

  • Tell your story. This was already mentioned above, but is particularly important here. Journalists write stories. They need to turn your news into a story. You can help them by doing this for them. Otherwise, they will have to do it for you and may unwittingly distort the facts in the process.

  • What makes your story newsworthy? There are many factors that make stories ‘newsworthy’, including its timing & location, whether it is inherently interesting, involves people, is controversial, etc. You don’t need conflict to generate media interest. Conflict generally only enters the picture once the issue makes the transition from being only about science to also being about political action.

  • Simplify, just enough. Journalists need to dumb things down to make their stories accessible. Help them out by dumbing it down for them, in a way that doesn’t distort the facts. Avoid jargon and unnecessary detail. Focus on key findings and messages.

  • Reduce ambiguity and uncertainty. Scientific research and journalism often have opposing aims. Journalists generally don’t like shades of grey and long timelines, because they complicate stories. Formulate a story that doesn’t require too much of either. Otherwise, they might do this for you in a way that you don’t like.

  • Keep searching for media time. There is plenty of space and time available in the media, just not necessarily in ‘prime time’. You can get exposure by going to local radio stations or newspapers, or to more specialist or niche shows and publishers.

  • Practise. You can develop your media skills by writing regularly. Two ways to do this are to write a blog or for The Conversation. Some more resources are available from Inspiring Australia.

Press coverage

If you wish to read more about Science meets Parliament, I recommend Ara Sarafian’s great summary on The Conversation.

Acknowledgements

I am grateful to the Victorian Branch of the Statistical Society of Australia, the Victorian Centre for Biostatistics and the Murdoch Childrens Research Institute for supporting my attendance at Science meets Parliament 2014.

I also wish to thank Science & Technology Australia for organising the event, all of the guest speakers and the very many Senators and Members of Parliament who made the time for private meetings with us.

Advertising for statisticians (in Australia)

Want to hire a statistician?

For advertising statistical job openings in Australia, I highly recommend:

These are both free and will reach a large number (perhaps even the majority?) of statisticians in Australia and New Zealand.

A few other avenues worth considering:

You may also want to advertise internationally. There are probably many avenues available. A few that I am familiar with, which tend to be popular for academic jobs, include: