Sample size calculation for cross-sectional studies

Nikita A. Mitkin; Митькин Никита Андреевич; Sergei N. Drachev; Драчев Сергей Николаевич; Ekaterina A. Krieger; Кригер Екатерина Анатольевна; Vitaly A. Postoev; Постоев Виталий Александрович; Andrej M. Grjibovski; Гржибовский Андрей Мечиславович

doi:10.17816/humeco569406

Authors: Mitkin N.A.¹, Drachev S.N.¹, Krieger E.A.¹, Postoev V.A.¹, Grjibovski A.M.¹,²
Affiliations:
1. Northern State Medical University
2. M.V. Lomonosov Northern (Arctic) Federal University
Issue: Vol 30, No 7 (2023)
Pages: 509-522
Section: Scientific review
URL: https://journals.rcsi.science/1728-0869/article/view/232041
DOI: https://doi.org/10.17816/humeco569406
ID: 232041

Abstract

Cross-sectional studies are widely used in Russian medical research. However, many of them omit sample size calculation at the planning stage, and their analysis often relies solely on basic bivariate statistics. This compromises the validity of the findings and increases the risk of drawing inaccurate conclusions.

The scientific rigor of a study depends on the quality of planning, a clear problem statement, and precise formulation of statistical hypotheses, which are then tested using the most appropriate analytical methods. Determining the appropriate sample size lies at the core of this process. The primary objective of this article is to provide a step-by-step guide to sample size calculation. By following it, researchers can give their cross-sectional studies sufficient statistical power to produce meaningful results. Because sample size calculations must be tailored to the objectives and data characteristics of each study, the approach is designed to be flexible and adaptable to diverse research questions.

Several software options are available for sample size calculation; we use G*Power for all the examples presented in this paper. The guide aims to provide a practical understanding of the topic: each step is accompanied by illustrative examples and detailed screenshots, and every dialog box and screenshot is interpreted so that readers can work comfortably with the software. We hope that this paper will serve as a valuable guide at the planning stage of a study, helping researchers to address a wider range of issues and to reliably estimate the associations between selected exposures and the outcomes of interest with sufficient statistical power.
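To give a feel for the kind of calculation involved, the sketch below approximates the total sample size needed to detect a given odds ratio for a binary exposure in a logistic regression. It is not the procedure G*Power implements, so its output should not be expected to match the G*Power dialogs exactly. It assumes the standard normal-approximation formula for comparing two proportions in groups of unequal size, inflated by 1/(1 − R²) to allow for correlation between the exposure and other covariates. The function name `logistic_sample_size` and all parameter names are hypothetical, chosen for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def logistic_sample_size(p0, odds_ratio, exposed_share, r2_other=0.0,
                         alpha=0.05, power=0.80):
    """Approximate total n to detect `odds_ratio` for a binary exposure.

    p0            -- assumed outcome probability among the unexposed
    odds_ratio    -- smallest odds ratio worth detecting (must differ from 1)
    exposed_share -- assumed proportion of the sample that is exposed
    r2_other      -- assumed R^2 of the exposure regressed on other covariates
    alpha, power  -- two-sided type I error level and target power
    """
    if not (0 < p0 < 1 and 0 < exposed_share < 1 and 0 <= r2_other < 1):
        raise ValueError("probabilities must lie in (0, 1) and r2_other in [0, 1)")
    if odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("odds_ratio must be positive and different from 1")

    # Outcome probability among the exposed implied by the odds ratio.
    odds1 = odds_ratio * p0 / (1 - p0)
    p1 = odds1 / (1 + odds1)

    b = exposed_share
    p_bar = (1 - b) * p0 + b * p1          # overall outcome probability

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Two-proportion normal approximation with unequal group sizes.
    numerator = (z_alpha * sqrt(p_bar * (1 - p_bar) / b)
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1) * (1 - b) / b)) ** 2
    n_single = numerator / ((p0 - p1) ** 2 * (1 - b))

    # Inflate for correlation with the other covariates in the model.
    return ceil(n_single / (1 - r2_other))


if __name__ == "__main__":
    for target_or in (1.5, 2.0):
        n = logistic_sample_size(p0=0.5, odds_ratio=target_or,
                                 exposed_share=0.6, r2_other=0.2)
        print(f"OR = {target_or}: about {n} participants")
```

The input values in the usage lines are illustrative assumptions, not results from the paper. The general pattern holds regardless of the exact method: smaller odds ratios, higher target power, and stronger correlation among covariates all increase the required sample size.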

Keywords

cross-sectional studies, sample size, regression analysis, G*Power

About the authors

Nikita Mitkin

Northern State Medical University

Author for correspondence.
Email: n.a.mitkin@gmail.com
ORCID iD: 0000-0002-0027-8155
Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk

Sergei Drachev

Northern State Medical University

Email: drachevsn@mail.ru
ORCID iD: 0000-0002-1548-690X

MD, Cand. Sci. (Med.), MPH, PhD, Associate Professor

Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk

Ekaterina Krieger

Northern State Medical University

Email: kate-krieger@mail.ru
ORCID iD: 0000-0001-5179-5737

MD, Cand. Sci. (Med.), MPH, Associate Professor

Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk

Vitaly Postoev

Northern State Medical University

Email: ispha@nsmu.ru
ORCID iD: 0000-0003-4982-4169

MD, Cand. Sci. (Med.), MPH, PhD, Associate Professor

Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk

Andrej Grjibovski

Northern State Medical University; M.V. Lomonosov Northern (Arctic) Federal University

Email: a.grjibovski@yandex.ru
ORCID iD: 0000-0002-5464-0498

MD, MPhil, PhD

Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk; Arhangel'sk

Supplementary files

Fig. 1. G*Power main dialog box.

Fig. 2. G*Power dialog box with entered parameters and calculated sample size for logistic regression model with one independent numeric variable.

Fig. 3. G*Power dialog box with entered parameters and calculated sample size for logistic regression model with one independent binary variable.

Fig. 4. G*Power dialog box with entered parameters and calculated sample size for logistic regression model with several independent variables.

Fig. 5. Relationship between sample size and statistical power for odds ratios of 1.5 and 2.0, a two-tailed test, an outcome prevalence of 50%, an α-error level of 0.05, a risk factor prevalence of 60%, and a multivariate model determination coefficient of 0.2.

Fig. 6. G*Power dialog box for conducting post hoc analysis aimed at assessing statistical power for a logistic regression.

Fig. 7. G*Power dialog box evaluating statistical power of a study with the following input parameters: two-tailed test; odds ratios of 1.5 or greater; outcome prevalence of 25%; α-error level of 0.05; sample size of 1011; determination coefficient of 0.2; and risk factor prevalence of 15%.

Fig. 8. G*Power dialog box for calculating the "Effect size f²" in a linear regression analysis.

Fig. 9. G*Power dialog box presenting entered calculation parameters and an output for a simple linear regression model.

Fig. 10. G*Power dialog box displaying entered calculation parameters and results for a multiple linear regression model with several independent variables.

Fig. 11. G*Power dialog box displaying entered calculation parameters for squared multiple correlation coefficient for a multiple linear regression model with several independent variables.

Fig. 12. G*Power dialog box presenting entered calculation parameters and an outcome for a multiple linear regression model.