Interpretation of and alternatives to p-values in biomedical sciences

Andrej M. Grjibovski; Anton N. Gvozdeckii

doi:10.17816/humeco97249

Authors: Grjibovski A.M.¹,²,³,⁴, Gvozdeckii A.N.⁵
Affiliations:
1. Northern State Medical University
2. Al-Farabi Kazakh National University
3. West Kazakhstan Marat Ospanov Medical University
4. North-Eastern Federal University
5. Mechnikov North-Western State Medical University
Issue: Vol 29, No 3 (2022)
Pages: 209-218
Section: Articles
URL: https://journals.rcsi.science/1728-0869/article/view/97249
DOI: https://doi.org/10.17816/humeco97249
ID: 97249

Abstract

Difficulties in interpreting the results of statistical analysis have repeatedly been cited as one of the factors behind the poor reproducibility of research findings in the biomedical sciences. This has prompted a series of publications proposing remedies, including abandoning p-values and significance testing altogether. In this paper we briefly outline the scope of the problem and describe the Fisher and Neyman–Pearson approaches to hypothesis testing. We then present confidence intervals and effect size calculation as alternatives to dichotomizing results as significant or not significant at a fixed cut-off level. In addition, we summarize the pros and cons of the proposal to lower the cut-off value from the traditional 0.05 to 0.005, and we list the most common misunderstandings of p-values discussed in the international statistical literature.

We conclude the paper with brief recommendations on careful interpretation of the results of statistical analysis to prevent misinterpretation and misuse of p-values in biomedical studies.
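As an illustration of the alternatives named in the abstract, the following minimal sketch reports a mean difference with its confidence interval and Cohen's d alongside the p-value, and then shows how the verdict can change between the 0.05 and 0.005 cut-offs. It is not taken from the paper: the function name `summarize_two_groups`, the simulated blood-pressure-like data, and the large-sample normal approximation (used in place of a t-based procedure) are all assumptions made for this example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize_two_groups(a, b, conf_level=0.95):
    """Mean difference with CI, Cohen's d, and a two-sided p-value.

    Assumption: a large-sample normal (z) approximation is used; for small
    samples a t-based procedure would be more appropriate.
    """
    n_a, n_b = len(a), len(b)
    diff = mean(a) - mean(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2

    # Standard error of the difference (unequal variances allowed).
    se = sqrt(var_a / n_a + var_b / n_b)

    # Confidence interval for the mean difference.
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Cohen's d: mean difference scaled by the pooled standard deviation.
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    d = diff / pooled_sd

    # Two-sided p-value under the null hypothesis of no difference.
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"diff": diff, "ci": ci, "cohens_d": d, "p": p}


# Hypothetical, simulated data (not from any study).
rng = random.Random(0)
control = [rng.gauss(120.0, 15.0) for _ in range(200)]
treated = [rng.gauss(116.0, 15.0) for _ in range(200)]

res = summarize_two_groups(treated, control)
print(f"difference = {res['diff']:.2f}, "
      f"95% CI = ({res['ci'][0]:.2f}, {res['ci'][1]:.2f}), "
      f"Cohen's d = {res['cohens_d']:.2f}, p = {res['p']:.4f}")

# The same p-value can lead to different verdicts at different cut-offs.
for alpha in (0.05, 0.005):
    verdict = "significant" if res["p"] < alpha else "not significant"
    print(f"alpha = {alpha}: {verdict}")
```

Reporting the interval and the standardized effect next to the p-value lets a reader judge the magnitude and precision of the estimate rather than only which side of a threshold it falls on.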

Keywords

p-value, significance level, effect size, confidence interval, biomedical research, statistical analysis

About the authors

Andrej Grjibovski

Northern State Medical University; Al-Farabi Kazakh National University; West Kazakhstan Marat Ospanov Medical University; North-Eastern Federal University

Author for correspondence.
Email: andrej.grjibovski@gmail.com
ORCID iD: 0000-0002-5464-0498
SPIN-code: 5118-0081

MD, MPhil, PhD

Russian Federation, Arkhangelsk; Almaty, Kazakhstan; Aktobe, Kazakhstan; Yakutsk

Anton Gvozdeckii

Mechnikov North-Western State Medical University

Email: gvozdetskiy_an@outlook.com
ORCID iD: 0000-0001-8045-1220
SPIN-code: 4430-6841

MD, Cand. Sci. (Med.)

Russian Federation, St. Petersburg

Supplementary files

Fig. 1. Graphical comparison of approaches to statistical hypothesis testing: (a) Fisher significance testing; (b) Neyman–Pearson acceptance testing. Abbreviations: sig. = significance level, d = effect size, α = probability of Type I error, β = probability of Type II error.