The Higgs boson: 5-sigma and the concept of p-values

Today’s announcement at CERN of the latest research on the Higgs boson was truly extraordinary. Not only was the scientific achievement remarkable, but the media’s reporting of 5-sigma as a measure of “certainty” was also remarkable. For instance, the science editor at the Swedish newspaper Dagens Nyheter reported that a sigma of 4.9 equals a certainty of 99.99994 %, which isn’t true, simply because p( D | H0 ) is not the same as p( H0 | D ). In plain English, a p-value is the conditional probability of getting data at least as extreme as what was observed, given that the null hypothesis is true. Nothing more. It certainly doesn’t give the probability that the alternative hypothesis is true, i.e. the “certainty” that something has been found that’s not a random fluctuation.
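To see why p( D | H0 ) and p( H0 | D ) can differ, here is a minimal sketch of Bayes’ rule in R. The function name posterior_h0 and every number in it (the probability of the data under each hypothesis, and the priors) are hypothetical values chosen for illustration. They are not CERN’s figures, and treating the data as a single event with a known probability under each hypothesis is a simplification.

```r
# Bayes' rule for two competing hypotheses:
# P(H0 | D) = P(D | H0) P(H0) / [P(D | H0) P(H0) + P(D | H1) P(H1)]
# posterior_h0 is a hypothetical helper written for this illustration.
posterior_h0 <- function(p_d_given_h0, p_d_given_h1, prior_h0) {
  numerator <- p_d_given_h0 * prior_h0
  numerator / (numerator + p_d_given_h1 * (1 - prior_h0))
}

# Assumed values, for illustration only (not taken from any experiment)
p_d_given_h0 <- 0.05  # probability of the data if the null is true
p_d_given_h1 <- 0.50  # probability of the data if the alternative is true

# The same P(D | H0) combined with three different assumed priors for H0
priors <- c(0.1, 0.5, 0.9)
posteriors <- posterior_h0(p_d_given_h0, p_d_given_h1, priors)

print(data.frame(prior_h0 = priors, posterior_h0 = round(posteriors, 3)))
```

With p( D | H0 ) held fixed, the resulting p( H0 | D ) changes with the prior, so the first cannot simply be read as the second.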

So what do physicists mean when they report 5-sigma? Well, it’s just another convention for reporting alpha levels. Sigma refers to the population standard deviation, and 5-sigma means that they accept events as statistically significant only if they fall more than 5 standard deviations away from the mean, given that the null hypothesis is true. Here the null hypothesis is that the event is simply due to random noise or fluctuations. You can get the p-value for 5-sigma by first taking the area under the standard normal curve that’s to the left of +5 sigma.

> pnorm(5)
[1] 0.9999997

Then take 1 – 0.9999997 to get the p-value, which is 0.0000003, since the CERN researchers performed a one-tailed test. I imagine physicists say 5-sigma because saying “point zero zero zero zero zero zero three” might become quite tiresome, so it’s quite ironic that journalists all over the world seem to be converting sigma back to percent.
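The value 0.9999997 printed above is rounded, so subtracting it from 1 by hand only gives the p-value to one significant digit. As a small sketch, pnorm() can return the upper tail directly through its lower.tail argument, and qnorm() converts a p-value back to sigma. The variable names below are my own, and the two-tailed value is shown only for comparison.

```r
# One-tailed p-value for 5 sigma, taken directly from the upper tail
p_one_tailed <- pnorm(5, lower.tail = FALSE)

# For comparison only: a two-tailed test would double the tail area
p_two_tailed <- 2 * p_one_tailed

# Convert the one-tailed p-value back to a sigma threshold
sigma_back <- qnorm(p_one_tailed, lower.tail = FALSE)

print(p_one_tailed)
print(p_two_tailed)
print(sigma_back)
```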

If we want, we can also use R and ggplot2 to illustrate 5-sigma by plotting the normal distribution and superimposing a line at 5 sigma:

library(ggplot2)
x <- seq(-6,6,length=200) # sigmas
y <- dnorm(x) # curve

df <- data.frame("sigma" = x,"y" = y) # create data frame

# plot
text_block <- "A confidence level = 5-sigma represents \nthe probability of getting a result from your \nexperiment, simply from random fluctuations \nalone, equal to the area under the curve \nthat’s to the right of the dotted line. That’s an \nexceptionally rare event. However, the area to \nthe left of 5-sigma does not represent the \nprobability or certainty that the Higgs boson \nhas been found."
ggplot(df, aes(sigma, y)) +
  geom_line(size = 1) +
  annotate("text", x = 1.7, y = 0.2, label = text_block, size = 4, hjust = 0) +
  annotate("segment", x = 5, xend = 5, y = 0, yend = 0.05, linetype = "dashed") +
  annotate("text", x = 5, y = 0.05, label = "5-sigma", vjust = -0.5)

Note: The actual plot below has been fine-tuned in Illustrator.
[Figure: Higgs boson, CERN: 5-sigma and p-values]

The area under the curve that's to the right of the dotted line represents the p-value for 5-sigma. We see that observations in that area are highly unlikely to occur if we assume that the null hypothesis is true.
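As a quick check that this tail area really is the p-value, the density can be integrated numerically from 5 sigma upward and compared with the value from pnorm(). This is only a sketch using base R’s integrate(); the variable names are my own, and the two numbers should agree up to the integrator’s tolerance.

```r
# Area under the standard normal density to the right of 5 sigma,
# computed by numerical integration
tail_area <- integrate(dnorm, lower = 5, upper = Inf)$value

# The same quantity from the normal distribution function
p_value <- pnorm(5, lower.tail = FALSE)

print(c(tail_area = tail_area, p_value = p_value))
```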

Kristoffer Magnusson

I'm a PhD student and a clinical psychologist from Sweden with a passion for research and statistics. This is my personal blog about psychological research and statistical programming with R.

Comments

  1. Really good article. However, can you please tell me how you created this plot? I am specifically asking how you superimposed that 9-line quotation on the right side, which is not in your ggplot code.

    Thanks,


    • Thank you! I probably should’ve mentioned that the text was added in Adobe Illustrator. However, something similar can be done like this:

      text_block <- "A confidence level = 5-sigma represents \nthe probability of getting a result from your \nexperiment, simply from random fluctuations \nalone, equal to the area under the curve \nthat’s to the right of the dotted line. That’s an \nexceptionally rare event. However, the area to \nthe left of 5-sigma does not represent the \nprobability or certainty that the Higgs boson \nhas been found."
      ggplot(df, aes(sigma,y)) +
      geom_line(size=1) +
      annotate("text", x=1.7, y=0.2, label=text_block, size=4, hjust=0) +
      annotate("segment", x = 5, xend=5, y = 0, yend = 0.05, linetype="dashed") +
      annotate("text", x=5, y=0.05, label="5-sigma", vjust=-0.5)
