
Arthur Charpentier, ENSAE, PhD in Mathematics (KU Leuven), Fellow of the French Institute of Actuaries, professor of Actuarial Science at UQàM. Former assistant professor at ENSAE ParisTech, associate professor at École Polytechnique and assistant professor in economics at Université de Rennes 1. Arthur is a DZone MVB, is not an employee of DZone, and has posted 160 posts at DZone.

In Statistics, Language Matters


In statistics, it can be difficult to know what a symbol stands for. For instance, $\widehat{\theta}$ can be a real value, i.e. the value taken by a statistic on a given sample. But it can also be a random variable, if we now view the sample as a collection of i.i.d. random variables. We can usually distinguish the $x_i$'s (values from a given sample) from the $X_i$'s (the underlying random variables, i.e. $x_i=X_i(\omega)$ for some $\omega\in\Omega$). But notation can be confusing, and it is hard to distinguish random variables from the values they take (their realizations). Usually, though, the context tells us what a symbol stands for.
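To make the distinction concrete, here is a minimal Python sketch. It assumes, purely for illustration, that $\widehat{\theta}$ is the sample mean and that the $X_i$'s are i.i.d. exponential with rate 1; the function names `theta_hat` and `draw_sample` are hypothetical. Applied to one observed sample, $\widehat{\theta}$ is a single number; viewed as a function of random $X_i$'s, each new sample (a new $\omega$) gives a different realization.

```python
import random
import statistics


def theta_hat(sample):
    """The estimator: a rule mapping a sample to a number (here, the sample mean)."""
    return statistics.mean(sample)


def draw_sample(n, rng):
    """One realization (x_1, ..., x_n) of i.i.d. X_i ~ Exponential(rate=1)."""
    return [rng.expovariate(1.0) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# On one observed sample, theta_hat returns a single real value: an estimate.
observed = draw_sample(50, rng)
estimate = theta_hat(observed)
print(f"estimate from the observed sample: {estimate:.3f}")

# As a random variable, theta_hat(X_1, ..., X_n) has a distribution:
# redrawing the sample many times gives many different realizations.
realizations = [theta_hat(draw_sample(50, rng)) for _ in range(1000)]
print(f"mean of 1000 realizations: {statistics.mean(realizations):.3f}")
print(f"std. dev. of realizations: {statistics.stdev(realizations):.3f}")
```

The same symbol, `theta_hat`, is used in both places: only the context (one fixed sample versus repeated sampling) tells us whether we are talking about a value or a random variable.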

But sometimes it is difficult to get a proper definition, not for symbols but for words, and most of the time for common words. Recently, I wrote a short paper arguing that it was difficult to model the number of bodily injuries related to car accidents, since it is difficult to define death. The definition of death actually changed a few years ago (as weird as that might sound), which caused a break in some series.

I recently had a similar experience, discussing with a pharmacist in Montréal who told me “you French are known to be the world’s champion in terms of drug consumption”; see e.g.

  • “The French are Europe’s champion medicine-takers” in…, mentioning a “heavy drug-consumption culture”
  • “The data show that drug consumption in France remains one of the largest in Europe” in…
  • “France has one of the largest drug markets in the world and the drug consumption per capita…” (among so many articles)

I do not think I am a drug addict. Like most of my colleagues, I might be a coffee addict, but as Paul Erdős (or more probably Alfréd Rényi) once said, “a mathematician is a device for turning coffee into theorems”. The main problem here is the notion of “consumption”. The economic interpretation is simply that someone buys a product or a service (see…). There is also the food-related interpretation, where consuming means ingesting, i.e. eating or drinking (see…).

So pill and drug “consumption” is ambiguous: is it the number of pills purchased, ingested (actually consumed), or prescribed? The first thing to remember is that the Social Security in France reimburses (almost) all medications prescribed by a doctor. So it is uncommon to leave a doctor’s office without a prescription, if only for aspirin: in France, a visit to the doctor is usually an opportunity to stock up on over-the-counter drugs. The second thing is that there is a major difference between France and North America at the pharmacy. In Montréal, for instance, if I have a prescription for 12 pills, the pharmacist gives me exactly 12 pills (from a big pot). In France, pills are sold in prepackaged boxes, so if a box contains 10 pills, I will get 2 boxes, just to be sure I get my 12 pills. From a medical point of view, I will consume 12 pills, but from an economic perspective, I will consume 20. So comparing statistics is extremely difficult, not because of the maths, but because it is difficult to define (even simple) concepts.
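The arithmetic behind the 12-versus-20 gap can be written out in a few lines. This is a minimal sketch assuming a single prescription of 12 pills, boxes of 10, and a patient who takes the full course; the function names are hypothetical. With whole boxes only, the number of pills handed over is the prescribed count rounded up to the next multiple of the box size.

```python
import math


def pills_dispensed_in_boxes(prescribed, box_size):
    """Pills handed over when only whole prepackaged boxes can be sold (France)."""
    boxes = math.ceil(prescribed / box_size)  # round up to whole boxes
    return boxes * box_size


def pills_dispensed_exact(prescribed):
    """Pills handed over when counted out of a bulk container (Montréal)."""
    return prescribed


prescribed = 12
box_size = 10

# Four possible meanings of "consumption" for the same prescription.
measures = {
    "prescribed": prescribed,
    "ingested (medical view)": prescribed,  # assumes the full course is taken
    "purchased, Montréal (economic view)": pills_dispensed_exact(prescribed),
    "purchased, France (economic view)": pills_dispensed_in_boxes(prescribed, box_size),
}
for name, count in measures.items():
    print(f"{name}: {count}")
```

Only the last measure differs (20 instead of 12), so two countries with identical medical use can report very different “consumption” depending on which definition the statistic relies on.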

Published at DZone with permission of Arthur Charpentier, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)