Reference medical datasets (MosMedData) for independent external evaluation of algorithms based on artificial intelligence in diagnostics

Abstract

The article describes a novel approach to creating annotated medical datasets for testing artificial intelligence-based diagnostic solutions. Four stages of dataset formation are described: planning, selection of initial data, annotation and verification, and documentation. Examples of datasets created with these methods are also provided. The technique is scalable and versatile and can be applied to other areas of medicine and healthcare that are being automated and developed using artificial intelligence and big data technologies.

About the authors

Nikolay A. Pavlov

Moscow Center for Diagnostics and Telemedicine

Author for correspondence.
Email: n.pavlov@npcmr.ru
ORCID iD: 0000-0002-4309-1868
SPIN-code: 9960-4160
https://pavlov.rocks
Russian Federation, 28-1, Srednyaya Kalitnikovskaya street, 109029, Moscow

Anna E. Andreychenko

Moscow Center for Diagnostics and Telemedicine

Email: a.andreychenko@npcmr.ru
ORCID iD: 0000-0001-6359-0763
SPIN-code: 6625-4186

PhD

Russian Federation, 28-1, Srednyaya Kalitnikovskaya street, 109029, Moscow

Anton V. Vladzymyrskyy

Moscow Center for Diagnostics and Telemedicine

Email: a.vladzimirsky@npcmr.ru
ORCID iD: 0000-0002-2990-7736
SPIN-code: 3602-7120

MD, Dr. Sci. (Med.)

Russian Federation, 28-1, Srednyaya Kalitnikovskaya street, 109029, Moscow

Anush A. Revazyan

Moscow Center for Diagnostics and Telemedicine

Email: anushrevazyan@gmail.com
ORCID iD: 0000-0003-1589-2382
Russian Federation, 28-1, Srednyaya Kalitnikovskaya street, 109029, Moscow

Yury S. Kirpichev

Moscow Center for Diagnostics and Telemedicine

Email: y.kirpichev@npcmr.ru
ORCID iD: 0000-0002-9583-5187
SPIN-code: 3362-3428
Russian Federation, 28-1, Srednyaya Kalitnikovskaya street, 109029, Moscow

Sergey P. Morozov

Moscow Center for Diagnostics and Telemedicine

Email: morozov@npcmr.ru
ORCID iD: 0000-0001-6545-6170
SPIN-code: 8542-1720

MD, Dr. Sci. (Med.), Professor

Russian Federation, 28-1, Srednyaya Kalitnikovskaya street, 109029, Moscow


Supplementary files

Fig. 1. Stages of forming a medical dataset.
Fig. 2. Relationships among the clinical task, dataset, and success in the implementation of a solution based on artificial intelligence (AI) in routine clinical practice.
Fig. 3. Datasets of the Moscow experiment on the use of innovative technologies in the field of computer vision for the analysis of medical images and further use in the healthcare system of Moscow, prepared according to this method.
Fig. 4. Classification of markup by labor costs and degree of verification.
Fig. 5. Basic structure of the README file.

Copyright (c) 2021 Pavlov N.A., Andreychenko A.E., Vladzymyrskyy A.V., Revazyan A.A., Kirpichev Y.S., Morozov S.P.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
