Interpretation of and alternatives to p-values in biomedical sciences


Abstract

Difficulties in interpreting the results of statistical analysis have repeatedly been cited as one of the factors behind the poor reproducibility of research findings in the biomedical sciences. This has prompted a series of publications proposing ways to improve the situation, including abandoning p-values and significance testing altogether. In this paper we briefly outline the scope of the problem and describe the Fisher and Neyman–Pearson approaches to hypothesis testing. We then present confidence intervals and effect size calculation as alternatives to dichotomizing results as significant or not significant at a fixed cut-off level. In addition, we summarize the pros and cons of the proposal to lower the cut-off value from the traditional 0.05 to 0.005. We also list the most common misunderstandings of p-values discussed in the international statistical literature.

We conclude the paper with brief recommendations on careful interpretation of the results of statistical analysis to prevent misinterpretation and misuse of p-values in biomedical studies.
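
As an illustration of reporting an estimate with a confidence interval and an effect size instead of only a significant/not significant verdict, here is a minimal Python sketch. It is not taken from the paper: the function name `summarize_difference`, the sample values, and the use of a large-sample normal approximation (rather than a t-distribution) are all assumptions made for this example.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize_difference(a, b, level=0.95):
    """Summarize the difference between two independent samples.

    Returns the mean difference, a normal-approximation confidence
    interval, Cohen's d (pooled SD) and a two-sided p-value.
    The normal approximation is an assumption of this sketch; for small
    samples a t-based interval would be wider.
    """
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2

    # Standard error of the difference in means (unequal variances allowed).
    se = math.sqrt(var_a / na + var_b / nb)

    # Confidence interval: estimate +/- critical value * standard error.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Cohen's d: mean difference expressed in pooled standard deviations.
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    cohen_d = diff / pooled_sd

    # Two-sided p-value under the same normal approximation.
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return {"diff": diff, "ci": ci, "cohen_d": cohen_d, "p": p}


# Hypothetical measurements, invented for illustration only.
treated = [5.1, 4.9, 5.6, 5.8, 5.3, 5.0, 5.7, 5.4]
control = [4.8, 4.7, 5.2, 5.1, 4.9, 4.6, 5.0, 5.3]

result = summarize_difference(treated, control)
print(result)
```

The same p-value can fall on different sides of a 0.05 and a 0.005 cut-off, whereas the interval and the standardized effect size describe the magnitude and precision of the estimate regardless of which threshold is chosen.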

About the authors

Andrej M. Grjibovski

Northern State Medical University; Al-Farabi Kazakh National University; West Kazakhstan Marat Ospanov Medical University; North-Eastern Federal University

Author for correspondence.
Email: andrej.grjibovski@gmail.com
ORCID iD: 0000-0002-5464-0498
SPIN-code: 5118-0081

MD, MPhil, PhD

Russian Federation, Arkhangelsk; Almaty, Kazakhstan; Aktobe, Kazakhstan; Yakutsk

Anton N. Gvozdeckii

Mechnikov North-Western State Medical University

Email: gvozdetskiy_an@outlook.com
ORCID iD: 0000-0001-8045-1220
SPIN-code: 4430-6841

MD, Cand. Sci. (Med.)

Russian Federation, St. Petersburg


Supplementary files

Fig. 1. Graphical comparison of approaches to statistical hypothesis testing: a, Fisher significance testing; b, Neyman–Pearson acceptance testing. Abbreviations: sig., significance level; d, effect size; α, probability of Type I error; β, probability of Type II error.

Fig. 2. Demonstration of the confidence interval concept.

Fig. 3. Change in the width of the confidence interval depending on the number of observations.

Fig. 4. Demonstration of the effect size concept.

Copyright (c) 2022 Grjibovski A.M., Gvozdeckii A.N.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
 

