Last week I gave a talk, Factors for success in big data science, at the University of Melbourne. This was to the Big Data Reading Group, a recently formed informal group within the Department of Mathematics and Statistics.
I had three aims for my talk: to give a brief overview of some ‘big data’ projects I have been involved in; to describe what I think made them successful (especially factors that are transferable across projects); and finally to suggest ways we can reform statistics education at university to foster such success.
In a nutshell, I advocated for a more practical focus in our education, with explicit teaching of data management and programming skills, more emphasis on using real (and messy!) data, and more time spent doing projects, including as part of a group. See my slides for more details.
I’m certainly not the first to suggest such changes. In fact, this seems to be one of those perennial discussions that gets rehashed regularly, with university inertia preventing too much rocking of the boat. However, given the recent surge of interest in ‘big data’ and ‘data science’, and the calls for reform from leaders of our profession (such as Terry Speed and Bin Yu), I thought this was a perfect opportunity to have this conversation.
The barriers
About a dozen people came to my talk, including four senior academic staff. We had an extensive discussion that ran well over time, which made it clear that we were all passionate about this topic. We agreed that reform would be an excellent idea. The hard part was how to do it. These were the main barriers put forward by members of the audience:

Lack of resources. This refers to funding cuts, lack of qualified teaching staff and university rules that prevent running subjects with too few students. Ultimately, it all boils down to a limited (and shrinking) pot of money.

Student resistance to change. Apparently, current students are more interested in the mathematical side of statistics and do not like open-ended assignments. As Rafael Irizarry reports, teaching the messy parts of applied statistics ‘requires exploration and failure which can be frustrating for new students.’ Many students also dislike group work, partly because of the additional effort of working with others and partly because they believe the assessment allows some students to free-ride off the efforts of more diligent ones.

Students are ill-prepared by high school. Much of the early undergraduate teaching is spent on getting students ‘up to speed’ due to weak teaching at high school, leaving less time to learn new things.

Not enough time for the ‘basics’. There was a view that the current syllabus does not even cover the basic material properly, let alone have any room to add new things.
Overcoming the barriers?
These are real concerns and it is clear they have occupied many people’s minds.
Lack of resources is a fundamental challenge. I do not doubt that our mathematics and statistics departments are underfunded and that more money would make a measurable difference. Nevertheless, there is still a question about how best to spend the existing money.
I believe we don’t yet have the balance right. If learning to manipulate real data is not a ‘basic’ statistical skill, then what is?
We can try looking across campus for help to adapt our teaching methods to more closely reflect real world scenarios. Engineering departments have students regularly work in groups and engage in realistic projects. What can we learn from them? Perhaps we need to look at some good practices for assessing and communicating group work?
We can also look for ways of getting more money. Since income depends strongly on student numbers, can we attract more students? With the surge of interest in big data and data science, surely there is now a strong market for a practically focused statistics course?
Other universities are responding to this demand by innovating and developing new courses. Some courses are even available online, such as the Data Science Specialisation on Coursera, run by three prominent biostatisticians at Johns Hopkins University.
I see this as a challenge for the future of the statistics profession. By no means do I think any of this is easy to implement, nor do I claim any personal expertise in tertiary education. I look to leadership from statistics departments because I worry that students interested in data analysis will look elsewhere and will miss out on learning key statistical principles.
The academic staff from the department said that three new statistics subjects are planned for next year. I hope they feature a decent dose of data analysis.
I notice that we don’t need to go beyond campus to find examples of courses and subjects that aim to teach data analysis in a practical context:
The new Master of Business Analytics at the Melbourne Business School features many of the things I suggested, including team projects, programming and communication skills, and I expect it will be firmly focused on realistic business problems.
The Master of Biostatistics, run by the Melbourne School of Population and Global Health, has an explicit subject on data management and statistical computing.
Some computer science subjects, such as Statistical and Evolutionary Learning, feature ‘hands-on practical experience’ on ‘real-world problems’.