Today’s announcement at CERN of the latest research on the Higgs boson
was truly extraordinary. Not only was the scientific achievement
remarkable, but the media’s reporting of 5-sigma as a measure of “certainty”
was also truly remarkable. For instance, the science editor at the
Swedish newspaper Dagens Nyheter reported that a sigma of 4.9 equals a
certainty of 99.99994 %, which obviously isn’t true, simply because *p*(
*D* | *H0* ) is not the same as *p*( *H0* | *D* ). In plain English this
means that a *p*-value represents the conditional probability of getting
data at least as extreme as those observed, given that the null hypothesis
is true. Nothing more, and it surely does not give the probability of the
alternative hypothesis being true, i.e. the "certainty" that something has
been found that is not a random fluctuation.
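To see why the two conditional probabilities differ, here is a minimal sketch using Bayes' theorem. All the numbers except the 5-sigma tail probability are made up for illustration: `p_D_given_H1` and the three values in `prior_H0` are assumptions, not anything reported by CERN.

```r
# Hypothetical numbers, chosen only to illustrate the asymmetry.
p_D_given_H0 <- 0.0000003            # chance of data this extreme if there is only noise
p_D_given_H1 <- 0.5                  # ASSUMED chance of such data if the effect is real
prior_H0 <- c(0.5, 0.99, 0.999999)   # three ASSUMED prior beliefs in the null hypothesis

# Bayes' theorem: p(H0 | D) = p(D | H0) * p(H0) / p(D)
posterior_H0 <- (p_D_given_H0 * prior_H0) /
  (p_D_given_H0 * prior_H0 + p_D_given_H1 * (1 - prior_H0))

print(data.frame(prior_H0, posterior_H0))
```

With these assumed inputs, *p*( *H0* | *D* ) ranges from well below one in a million to roughly 0.37, even though *p*( *D* | *H0* ) is the same in every row. The *p*-value alone cannot tell you which row you are in.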

So what do physicists mean when they report 5-sigma? Well, it’s just
another convention of reporting alpha values. Sigma refers to the
population standard deviation, and 5-sigma means that they accept events
as statistically significant if they fall more than 5 standard deviations
away from the mean, given that the null hypothesis is true. And here the
null hypothesis is that the event is simply due to random noise or
fluctuations. You can get the *p*-value for 5-sigma by first taking the area
under the normal curve that’s to the left of +5 sigma:

    > pnorm(5)
    [1] 0.9999997

And then take 1 - 0.9999997 to get the *p*-value, which is 0.0000003, since
the CERN researchers performed a one-tailed test. I imagine physicists
say 5-sigma because saying "point zero zero zero zero zero zero three"
might become quite tiresome, so it's quite ironic that journalists all
over the world seem to be converting sigma back to percent.
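If you want to skip the subtraction and the rounding in the printed output, `pnorm()` can return the upper tail directly, and `qnorm()` goes the other way. This is just a small sketch; the sigma levels 3 and 4 are my own additions for comparison.

```r
# Sigma levels to convert; 4.9 and 5 are the values mentioned in the text,
# 3 and 4 are added only for comparison.
sigma <- c(3, 4, 4.9, 5)

# One-tailed p-value: area under the standard normal curve to the right of sigma.
p_one_tailed <- pnorm(sigma, lower.tail = FALSE)

# Going back from a p-value to sigma uses the quantile function.
sigma_back <- qnorm(p_one_tailed, lower.tail = FALSE)

print(data.frame(sigma, p_one_tailed, sigma_back))
```

For 5 sigma this gives a one-tailed *p*-value of about 2.87e-07, which rounds to the 0.0000003 above.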

If we want, we can also use R and ggplot2 to illustrate 5-sigma by plotting the normal distribution and superimposing a line at 5 sigma:

    library(ggplot2)

    x <- seq(-6, 6, length = 200)             # sigmas
    y <- dnorm(x)                             # curve
    df <- data.frame("sigma" = x, "y" = y)    # create data frame

    # text shown next to the curve
    text_block <- paste0(
      "A confidence level = 5-sigma represents \n",
      "the probability of getting a result from your \n",
      "experiment, simply from random fluctuations \n",
      "alone, equal to the area under the curve \n",
      "that’s to the right of the dotted line. That’s an \n",
      "exceptionally rare event. However, the area to \n",
      "the left of 5-sigma does not represent the \n",
      "probability or certainty that the Higgs boson \n",
      "has been found."
    )

    # plot
    ggplot(df, aes(sigma, y)) +
      geom_line(size = 1) +
      annotate("text", x = 1.7, y = 0.2, label = text_block, size = 4, hjust = 0) +
      annotate("segment", x = 5, xend = 5, y = 0, yend = 0.05, linetype = "dashed") +
      annotate("text", x = 5, y = 0.05, label = "5-sigma", vjust = -0.5)

*Note: The actual plot below has been fine-tuned in Illustrator*.

The area under the curve that's to the right of the dotted line
represents the *p*-value for 5-sigma. We see that observations in that
area are highly unlikely to occur *if we assume that the null hypothesis
is true.*