11 May 2014
The highlight of the SSA Vic 50th anniversary celebrations was a special public lecture, ‘Big Data: where are the statisticians?’, presented by Prof. Terry Speed. The lecture was provocatively subtitled ‘Will we be celebrating 100 years of the Victorian Branch of the Statistical Society of Australia in 2064?’, and Terry was upfront that his goal was to make us feel uncomfortable.
Over his long career, Terry has seen many fads in data analysis wax and wane. In his opinion, most did not contribute anything substantially new. For a while he thought ‘Big Data’ was just the latest iteration and would eventually exit the stage similarly, but he has been having second thoughts.
Interest in Big Data has grown rapidly since 2011. That it is over-hyped is beyond doubt. Last year, the advisory firm Gartner reported that Big Data had reached the ‘peak of inflated expectations’ according to its Hype Cycle analysis. The hype has prompted many companies and individuals to jump aboard and offer their own Big Data products and services, with quality varying widely. Terry reviewed a few books on the subject and generally found them lacking in substance or accuracy.
Despite the lack of rigour, or even a clear consensus on a definition, Terry told us that Big Data is real.
Firstly, this is true in its most literal sense: many new sources of data now exist which are large in volume, complexity, or some other aspect. Companies such as Google and Amazon analyse such data as a key part of their business. However, this was not the focus of his talk.
Terry instead homed in on the political side of Big Data. In saying it was ‘real’, he meant that it has gained so much traction that it cannot be ignored. Many substantial initiatives are being planned or are already under way, with generous funding, after being pitched under the Big Data banner. He gave a number of examples, including some major conferences and research grants.
Terry’s big concern is that these initiatives are proceeding with little or, in some cases, no involvement from statisticians. As he put it, ‘the absence of statisticians in Big Data activities is striking (to a statistician)’.
The Big Data movement has caught many of us by surprise. Big Data centres and schools are springing up around the world, and we usually do not hear a word about them until after the fact. The speed at which this is happening is causing significant alarm, especially amongst statisticians in the USA.
Luckily, here in Australia the situation is not nearly as dire. At least, not yet. But Terry warned us not to be complacent. Can you imagine your local university announcing a Big Data institute? Terry said it will happen. Sooner than we think. And we will be the last to know.
What has led to our systematic exclusion?
This is a big question and Terry didn’t pretend to know the answers, but did offer a number of suggestions:
- Perhaps many problems in Big Data are (currently) poorly defined, and we tend to shy away from them?
- Perhaps our profession is not well understood by society at large, and is therefore consistently excluded, either deliberately or through ignorance? (Terry wondered if the ‘lies, damned lies and statistics’ jibe has taken root and damned us.)
- Perhaps many of us lack the relevant skills or experience to get involved, whether these be in computation, marketing or working in large teams?
- Perhaps we are reluctant to work on anything too highly specific? (Terry quoted from Applied Statistics by Cox & Snell (1981), where ‘statistical analysis’ is said to only deal with methods that are ‘not highly specific to particular fields of study’.)
- Perhaps we just happen to be particularly disconnected from the emerging ‘data science’ community, who are almost synonymous with Big Data in the eyes of the media and policymakers?
Terry noted that many projects being presented at Big Data conferences do not actually feature a ‘big’ dataset. It was just ‘small’ data showcased in a new forum. Correspondingly, some of us (statisticians) are already involved in analysing ‘big’ datasets, but without necessarily adopting the marketing gloss. Are we just not getting the word out there?
Furthermore, Big Data isn’t new to statistics. Starting in the 1990s, various groups have tackled problems in computational statistics and the analysis of ‘huge’ or ‘massive data sets’. Terry showed us papers and conferences from that era. Unfortunately, these efforts never gained traction or entered mainstream practice. Perhaps they were ahead of their time?
What should we do?
This is the hardest question of all.
Again, Terry did not offer solutions. Instead, he listed what he saw as important skills required for being involved in Big Data and Data Science:
- interpersonal, leadership and communication skills
- computational skills
- knowledge of relevant theory
- solid understanding of the subject matter
- critical thinking and common sense, when looking at data (which we often think of as being the “statisticians’ advantage”).
Terry asked whether we would all be willing to learn and promote these skills. Are we willing to change how we train and identify ourselves ‘in order to play a much larger role in the revolution going on around us, not currently in our name?’
Would we support changing our name from Statistics to Data Science? (Terry was clearly trying to push us out of our comfort zone!)
Some of the questions following the lecture were about what this entails for how we educate statisticians. The advice from Terry was clear: we should expose students to real statistical problems, which require creativity and insight, not just blind application of standard routines. Furthermore, computational skills should be a core part of the curriculum. These insights are not necessarily new. Nevertheless, they present us with a challenge of increasing importance. Can we reform our teaching practices before it is too late?
The timing of this talk couldn’t have been more appropriate. We can look back 50 years and see that the aims of the Society are still as relevant as ever. In his first talk to the Victorian Branch, Evan Williams emphasised the importance of practical experience. How does statistical practice look today? Many things have certainly changed and we need to ensure we keep up.
This was a Big Lecture from Terry.
Many in the audience later remarked that the lecture had opened their eyes and that the issue is one that cannot be ignored.
Terry never defined Big Data. He didn’t offer us easy answers or magic solutions. But he achieved something important. He broke the silence. He gave us permission to face up to this Big issue.
A version of this article was published in the June 2014 issue of the SSAI Newsletter.