Statistics has a bad name, mainly because people are unfamiliar with how statistics work and how to understand and interpret them. It is easy to dismiss statistics in general, and there are many jokes such as “78% of statistics are made up on the spot”, which has a nice self-referential ring to it. Unfortunately, this dismissal is often stretched to include scientific discourse in general. This is too bad, both for statistics and for science. It is also unnecessary. What follows is an attempt to clarify a few points about interpreting statistics.
Inferential vs. Descriptive Statistics
First, though, we need to understand the difference between inferential and descriptive statistics. Descriptive statistics describe; inferential statistics infer. That is, descriptive statistics summarize the data actually collected, while inferential statistics use data from a smaller sample to infer characteristics of a larger population. For example, 46% of the members of the US Congress are millionaires (a descriptive statistic, as it includes data from every member of Congress). If we inferred from that the percentage of the US population who are millionaires, we would be incorrect (it is around 0.16%, a percentage roughly 300 times smaller). This means that in terms of personal wealth, the US Congress is unrepresentative of the US as a whole.
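The arithmetic behind the "around 300 times" comparison can be checked with a few lines of Python. This is only a sketch: the two shares are the figures quoted above, taken as given, and the function name `overstatement_factor` is made up for illustration.

```python
# Shares quoted in the text above, taken as given rather than verified here.
CONGRESS_MILLIONAIRE_SHARE = 0.46      # descriptive: every member is counted
POPULATION_MILLIONAIRE_SHARE = 0.0016  # figure quoted for the US as a whole


def overstatement_factor(sample_share: float, population_share: float) -> float:
    """Return how many times larger the sample share is than the population share."""
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return sample_share / population_share


factor = overstatement_factor(CONGRESS_MILLIONAIRE_SHARE, POPULATION_MILLIONAIRE_SHARE)
print(f"Congress overstates the millionaire share roughly {factor:.0f}-fold")
```

The descriptive figure is accurate for Congress itself. The error only appears when an unrepresentative group is used to infer something about the whole population.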
Sampling Size and Method
One popular way that introductory statistics courses try to teach students to think about statistics is to have them focus on the sample size and sampling method. If the sample is unrepresentative, then one cannot generalize the conclusions of the study to the larger population, such as people in general. This is a good first step, but it tends to make students think that any study can be dismissed out of hand if it lacks a perfect sampling method and a large sample size. Instead, they should be taught that any discussion of the generalizability and interpretability of the findings needs to address problems of sampling, and that the statistics used to interpret the data should account for them. A small sample size does not necessarily doom a study. For example, a small sample with a very large effect size or a low p-value (the probability of seeing a result at least this extreme by chance alone if there were no real effect) is still quite important and noteworthy.
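To make the point about small samples concrete, here is a minimal sketch of an exact permutation test on two groups of five. The data are invented for illustration, and `permutation_p_value` is a hypothetical helper rather than a library function. It counts how often a random relabeling of the ten values produces a gap in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    extreme = 0
    arrangements = 0
    # Try every way of labeling n_a of the pooled values as "group A".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        gap = abs(sum_a / n_a - (total - sum_a) / n_b)
        if gap >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        arrangements += 1
    return extreme / arrangements


# Invented data: every treated value exceeds every control value.
treated = [12, 14, 15, 16, 18]
control = [5, 6, 7, 8, 9]
p = permutation_p_value(treated, control)
print(f"p-value with only 5 per group: {p:.4f}")
```

With only ten observations there are 252 possible relabelings, and only two of them (the observed split and its mirror image) show a gap this large. The small sample is therefore still informative here, precisely because the effect is so large.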
The use of humans in scientific inquiry severely limits certain aspects of sampling. For example, it is not ethically possible (at least in most countries) to demand that people participate in studies or to tightly constrain their freedom of movement. This means that individuals have to want to participate in the study, and that many confounding factors come with their participation; in general, there is a lack of controls. These are the conditions under which human subject research is conducted, and dismissing a study's conclusions solely because of self-selection or a lack of other controls is misguided.
Findings vs. Evidence
When reviewing studies, it is important to be careful in interpreting those that do not find any significant difference. For example, if a new drug is administered and no significant difference is found between that drug and a placebo, that does not mean there is evidence against the effectiveness of the drug. This is important: the study may simply have been too small or too noisy to detect a real effect. One cannot prove the non-existence of something; one can only indicate a lack of evidence. Scientific inquiry always requires an open mind regarding evidence and the possibility of finding it in the future.
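One way to see why a non-significant result is weak evidence against an effect is to ask how often a study would detect an effect that really exists. The sketch below uses a normal approximation for a two-group comparison with a standardized effect size. The function name `approximate_power` and the numbers are assumptions chosen for illustration, not taken from any particular study.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate chance that a two-sided, two-group z-test reaches significance.

    effect_size is the true difference in means divided by the common
    standard deviation (assumed known, which is a simplification).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability the test statistic falls beyond either critical value.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


for n in (20, 200):
    power = approximate_power(effect_size=0.3, n_per_group=n)
    print(f"n per group = {n:3d}: chance of detecting the real effect is about {power:.2f}")
```

Under these assumptions, a study with 20 people per group would miss a real but modest effect most of the time, so its "no significant difference" says little about whether the drug works.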
Significance tests (the particular statistics used), and inferential statistics in general, rest on the idea of a sampling distribution, most often the normal distribution. In other words, all studies are based on the notion that there is a chance their sample is not representative of the larger population. And so the evidence has to be fairly strong, akin to something like “beyond a reasonable doubt” in a legal setting.
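As a rough illustration of how the normal distribution enters, the sketch below converts a sample mean into a two-sided p-value using a simple z-test. It assumes the population standard deviation is known, which is rarely true in practice. The helper name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(sample_mean: float, null_mean: float,
                      population_sd: float, n: int) -> float:
    """P-value for a z-test: how often chance alone gives a mean this far from null_mean."""
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: 100 people average 103 on a scale with mean 100, SD 15.
p = two_sided_p_value(sample_mean=103.0, null_mean=100.0, population_sd=15.0, n=100)
print(f"two-sided p-value: {p:.4f}")
```

The p-value is the chance that an unrepresentative sample alone would produce a mean this far from the null value. It is not the chance that the finding is wrong.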
Significance, Research Design, and Statistical Tests
The significance found (or the level of significance), the design of the research, and the statistical tests used all contribute to the strength of a finding and the interpretability of a result. Evaluating these things is difficult for anyone without quite a few statistics classes under their belt. Unfortunately, this is also the case at many journals that print scientific articles: expertise is lacking, and even published work is not always carefully reviewed.
Effect Size and Beneficence
The effect size is even more important to review. That is, if what is studied shows a dramatic improvement or change, or if the difference from expectation is surprising, this is important. Likewise, especially in the case of medical research, if the beneficence (the good that results) is high, then pay attention to the conclusions.
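One common way to put a number on effect size is Cohen's d, the difference between two group means divided by their pooled standard deviation. It is one measure among several, and the text above does not prescribe it. The sketch below uses invented scores, and `cohens_d` is written here for illustration rather than imported from a library.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Invented outcome scores for a treated group and a control group.
treated = [14, 15, 16, 17, 18]
control = [10, 11, 12, 13, 14]
d = cohens_d(treated, control)
print(f"Cohen's d: {d:.2f}")
```

By the usual rough convention, values near 0.2 are called small, 0.5 medium, and 0.8 or more large. Those labels are only a guide, and what counts as a meaningful effect depends on what is being measured.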
Science is hard to do, it takes time, and in general it is done badly. However, that should not doom all of science or be a reason to reject any given finding or study out of hand. In all cases, we should examine the research more closely, realize that science is a slow, lengthy work in progress, and temper what we find (and do not find) with cultural wisdom and, of course, that uncommon virtue, common sense.