Thomas Lumley introduced a great new phrase at his SSA Vic talk on Tue. He said we should aim for **‘robustness of meaning’** in our analyses. By this he meant that we should ensure that the quantities we are estimating are indeed answering real questions. This is particularly important with complex data, such as arise from networks and graph models, where it is difficult to formulate appropriate models and methods.

One example he gave relates to the popularity of fitting power laws to many datasets (web page views, ‘long tail’ sales data, earthquake sizes,…) and estimating the exponent. It turns out that a log-normal distribution is usually a much better fit, which means that the exponent is not actually a meaningful quantity to consider for these data.

Another example, which actually doesn’t involve very complex data at all, is the Wilcoxon rank-sum test: it is *non-transitive*. If the test shows that X > Y and that Y > Z, it doesn’t let you conclude very much about the relationship between X and Z. Thomas elaborated this in much more detail in today’s ViCBiostat seminar, explaining that it’s a major flaw of the test (a ‘bug’, as he called it) and in fact reflects a fundamental difficulty with analysing ordinal data. Interestingly, these facts are closely connected to Arrow’s impossibility theorem, which basically says you can’t have a perfect voting system. He explains all of this clearly on his blog.

Robustness of meaning can be quite elusive!