Sample size calculation for cross-sectional studies

Abstract

The cross-sectional study design is widely used in Russian medical literature. However, many of these studies do not calculate the sample size during the planning phase, and their analysis often relies solely on basic bivariate statistics. This compromises the validity of the findings and increases the risk of drawing inaccurate conclusions.

The scientific rigor of a study depends on the quality of planning, a clear problem statement, and precise formulation of statistical hypotheses, which are then tested using the most appropriate analytical methods. At the core of this process lies the determination of the appropriate sample size. The primary objective of this article is to provide a comprehensive, step-by-step guide to sample size calculation. By following these guidelines, researchers can ensure that their cross-sectional studies have sufficient statistical power to generate meaningful results. Because sample size calculations must be tailored to the specific objectives and data characteristics of each study, our approach is designed to be flexible and adaptable to the requirements of diverse research projects.

Several software options are available for sample size calculation; we use G*Power for all the examples presented in this paper. The guide is designed to provide a practical understanding of the topic, with each step accompanied by illustrative examples and detailed screenshots, so that the material is both understandable and applicable in real-world scenarios. We also interpret every dialog box and screenshot to help readers become comfortable with the software. We hope that this paper will serve as a valuable guide in the planning stage of a study, helping researchers to address a wider range of issues and to reliably estimate the associations between selected exposures and the outcomes of interest with sufficient statistical power.

About the authors

Nikita A. Mitkin

Northern State Medical University

Author for correspondence.
Email: n.a.mitkin@gmail.com
ORCID iD: 0000-0002-0027-8155
Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk

Sergei N. Drachev

Northern State Medical University

Email: drachevsn@mail.ru
ORCID iD: 0000-0002-1548-690X

MD, Cand. Sci. (Med.), MPH, PhD, Associate Professor

Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk

Ekaterina A. Krieger

Northern State Medical University

Email: kate-krieger@mail.ru
ORCID iD: 0000-0001-5179-5737

MD, Cand. Sci. (Med.), MPH, Associate Professor

Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk

Vitaly A. Postoev

Northern State Medical University

Email: ispha@nsmu.ru
ORCID iD: 0000-0003-4982-4169

MD, Cand. Sci. (Med.), MPH, PhD, Associate Professor

Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk

Andrej M. Grjibovski

Northern State Medical University; M.V. Lomonosov Northern (Arctic) Federal University

Email: a.grjibovski@yandex.ru
ORCID iD: 0000-0002-5464-0498

MD, MPhil, PhD

Russian Federation, 51 Troickij avenue, 163061 Arhangel'sk; Arhangel'sk


Supplementary files

Fig. 1. G*Power main dialog box.
Fig. 2. G*Power dialog box with entered parameters and calculated sample size for a logistic regression model with one independent numeric variable.
Fig. 3. G*Power dialog box with entered parameters and calculated sample size for a logistic regression model with one independent binary variable.
Fig. 4. G*Power dialog box with entered parameters and calculated sample size for a logistic regression model with several independent variables.
Fig. 5. Relationship between sample size and statistical power for odds ratios of 1.5 and 2.0, a two-tailed test, an outcome prevalence of 50%, an α-error level of 0.05, a risk factor prevalence of 60%, and a multivariate model determination coefficient of 0.2.
Fig. 6. G*Power dialog box for conducting post hoc analysis aimed at assessing statistical power for a logistic regression.
Fig. 7. G*Power dialog box evaluating statistical power of a study with the following input parameters: two-tailed test; odds ratios of 1.5 or greater; outcome prevalence of 25%; α-error level of 0.05; sample size of 1011; determination coefficient of 0.2; and risk factor prevalence of 15%.
Fig. 8. G*Power dialog box for calculating the "Effect size f²" in a linear regression analysis.
Fig. 9. G*Power dialog box presenting entered calculation parameters and an output for a simple linear regression model.
Fig. 10. G*Power dialog box displaying entered calculation parameters and results for a multiple linear regression model with several independent variables.
Fig. 11. G*Power dialog box displaying entered calculation parameters for the squared multiple correlation coefficient for a multiple linear regression model with several independent variables.
Fig. 12. G*Power dialog box presenting entered calculation parameters and an outcome for a multiple linear regression model.
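The relationship between sample size and statistical power described in the Fig. 5 caption can be roughly illustrated outside G*Power. The sketch below is not the procedure used in the paper: it applies the large-sample approximation of Hsieh et al. (1998) for a logistic regression with one binary exposure, inflated by 1/(1 − R²) for additional covariates. The function names are hypothetical, and two mappings are assumptions: the "outcome prevalence" is treated as the outcome probability among the unexposed, and the "determination coefficient" as the R² of the exposure regressed on the other covariates. G*Power may use a different calculation procedure, so its numbers can differ.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def _components(p1, odds_ratio, b):
    """Return (p2, a, c) used by both functions below.

    p1: outcome probability among the unexposed (assumed mapping)
    b:  proportion of the sample that is exposed
    """
    odds2 = odds_ratio * p1 / (1 - p1)
    p2 = odds2 / (1 + odds2)            # outcome probability among the exposed
    p_bar = (1 - b) * p1 + b * p2       # overall outcome probability
    a = sqrt(p_bar * (1 - p_bar) / b)
    c = sqrt(p1 * (1 - p1) + p2 * (1 - p2) * (1 - b) / b)
    return p2, a, c


def sample_size_binary_exposure(p1, odds_ratio, b, alpha=0.05, power=0.80, r2=0.0):
    """Approximate total sample size (two-tailed test), Hsieh et al. (1998)."""
    p2, a, c = _components(p1, odds_ratio, b)
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_beta = _norm.inv_cdf(power)
    n = (z_alpha * a + z_beta * c) ** 2 / ((p1 - p2) ** 2 * (1 - b))
    return ceil(n / (1 - r2))           # inflate for correlated covariates


def power_binary_exposure(n, p1, odds_ratio, b, alpha=0.05, r2=0.0):
    """Approximate power for a given total sample size (inverse of the above)."""
    p2, a, c = _components(p1, odds_ratio, b)
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    n_eff = n * (1 - r2)                # effective size after covariate adjustment
    z_beta = (sqrt(n_eff * (1 - b)) * abs(p1 - p2) - z_alpha * a) / c
    return _norm.cdf(z_beta)


# Parameters taken from the Fig. 5 caption; sample sizes are illustrative.
for odds_ratio in (1.5, 2.0):
    for n in (200, 400, 800, 1200):
        pw = power_binary_exposure(n, p1=0.5, odds_ratio=odds_ratio, b=0.6, r2=0.2)
        print(f"OR={odds_ratio}  n={n}  power={pw:.2f}")
```

Under these assumptions, power rises with sample size and a larger odds ratio reaches a given power with fewer participants, which is the qualitative pattern the Fig. 5 caption describes.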

Copyright (c) 2023 Eco-Vector

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.