Some say that a shift from hypothesis testing to confidence intervals and estimation will lead to fewer statistical misinterpretations. Personally, I am not so sure. But I agree with the sentiment that we should stop reducing statistical analysis to binary decision-making. The problem with CIs is that they are as unintuitive and as misunderstood as *p*-values and null hypothesis significance testing. Moreover, CIs are often used to perform hypothesis tests and are therefore prone to the same misuses as *p*-values.
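Why can a CI double as a hypothesis test? Here is a minimal Python sketch, not taken from the visualization itself, that assumes normally distributed data with a *known* standard deviation so that a simple z-test applies. Under that assumption a 95 % CI excludes the null value exactly when the two-sided *p*-value is below 0.05. The function name `z_ci_and_p` and the simulation settings (true mean 0.3, σ = 1, n = 20) are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def z_ci_and_p(sample, sigma, mu0=0.0, level=0.95):
    """Hypothetical helper: two-sided z-test of H0: mu == mu0 plus the matching CI.

    Assumes the population standard deviation `sigma` is known.
    """
    n = len(sample)
    xbar = fmean(sample)
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95 % CI
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    z = (xbar - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return ci, p


# Simulate many experiments and check that "CI excludes 0" and "p < 0.05"
# always lead to the same decision.
random.seed(1)
trials = 1000
agree = 0
for _ in range(trials):
    sample = [random.gauss(0.3, 1.0) for _ in range(20)]
    (lo, hi), p = z_ci_and_p(sample, sigma=1.0)
    excludes_null = not (lo <= 0.0 <= hi)
    if excludes_null == (p < 0.05):
        agree += 1

print(agree, "of", trials, "experiments gave the same decision")
```

Because both rules reduce to comparing |z| with the same critical value, asking whether a CI "excludes zero" is a significance test in disguise, and it inherits the same dichotomous thinking.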

Some points to consider:

- 95 % confidence means that, in the long run, 95 % of the CIs produced by the procedure will include the population mean. It is confidence in the algorithm, not a statement about any single CI.
- In frequentist terms, a given CI either contains the population mean or it does not. It is just one draw from the “dance of the CIs”, to cite Cumming.
- There is no relationship between a sample’s variance and its mean (for normally distributed data the two are independent). Therefore, we cannot infer that a single narrow CI is more accurate. In this context, “accuracy” refers to the long-run coverage of the population mean. Look at the visualization above and note how much the widths of the CIs vary: a CI can be narrow and still lie far from the true mean.
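The points above can be checked with a small simulation. This is a minimal sketch using only the Python standard library; the population (mean 100, SD 15), the sample size n = 10 and the number of replications are hypothetical values, and the critical value 2.2622 is the 0.975 quantile of Student's t with 9 degrees of freedom, hard-coded to avoid a SciPy dependency. It computes many t-based 95 % CIs, reports their long-run coverage, and then splits them at the median width to compare narrow and wide intervals.

```python
import math
import random
from statistics import fmean, median, stdev

TRUE_MEAN = 100.0  # hypothetical population mean
TRUE_SD = 15.0     # hypothetical population SD
N = 10             # observations per simulated experiment
T_CRIT = 2.2622    # 0.975 quantile of Student's t with N - 1 = 9 df


def simulate_cis(reps=10_000, seed=2024):
    """Return `reps` t-based 95 % CIs, each from a fresh normal sample."""
    rng = random.Random(seed)
    cis = []
    for _ in range(reps):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
        xbar = fmean(sample)
        half_width = T_CRIT * stdev(sample) / math.sqrt(N)
        cis.append((xbar - half_width, xbar + half_width))
    return cis


def coverage(cis):
    """Fraction of intervals that contain the true population mean."""
    return sum(lo <= TRUE_MEAN <= hi for lo, hi in cis) / len(cis)


cis = simulate_cis()
widths = [hi - lo for lo, hi in cis]
cut = median(widths)
narrow = [ci for ci, w in zip(cis, widths) if w <= cut]
wide = [ci for ci, w in zip(cis, widths) if w > cut]

print(f"overall coverage:     {coverage(cis):.3f}")
print(f"narrow-half coverage: {coverage(narrow):.3f}")
print(f"wide-half coverage:   {coverage(wide):.3f}")
```

The overall coverage should land close to 95 %, which is the only guarantee the procedure makes. Splitting by width shows that narrowness is not accuracy: because the sample mean does not "know" that the sample SD came out small, the narrower intervals in this setup actually miss the true mean more often than the wider ones.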

The take-home message is that we must accept that our data are noisy and that our results are uncertain. A single “significant” CI or *p*-value might provide comfort and make for easy conclusions. I hope this visualization shows that, instead of drawing conclusions from a single experiment, we should spend our time replicating results, honing scientific arguments, polishing theories and forming narratives that, taken together, provide evidentiary value for our hypotheses. That way, we can in the end make substantive claims about the real world.

Have any suggestions? Send them to me; my contact info can be found here.