ONTOLOGIES AS A FOUNDATION FOR FORMALIZATION OF SCIENTIFIC INFORMATION AND EXTRACTING NEW KNOWLEDGE

A. S. Bubnov, N. I. Gallini, I. Yu. Grishin, I. M. Kobozeva, N. V. Lukashevich, M. B. Panich, E. N. Raevsky, F. A. Sadkovsky, R. R. Timirgaleeva

doi:10.31857/S2686954324060122

Authors: Bubnov A.S.¹, Gallini N.I.², Grishin I.Y.³, Kobozeva I.M.⁴, Lukashevich N.V.⁵, Panich M.B.³, Raevsky E.N.⁶, Sadkovsky F.A.³, Timirgaleeva R.R.³
Affiliations:
1. Knowledge Engineering Laboratory, Institute for Mathematical Research of Complex Systems, Lomonosov Moscow State University
2. Vernadsky Crimean Federal University
3. Branch of Lomonosov Moscow State University in the city of Sevastopol
4. Faculty of Philology, Lomonosov Moscow State University
5. Research Computing Center, Lomonosov Moscow State University
6. Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University
Issue: Vol 520, No 1 (2024)
Pages: 82-89
Section: COMPUTER SCIENCE
URL: https://journals.rcsi.science/2686-9543/article/view/280135
DOI: https://doi.org/10.31857/S2686954324060122
EDN: https://elibrary.ru/KKGRGT
ID: 280135

Abstract

“Ark of Knowledge” is a digital project developed by Lomonosov Moscow State University. It provides access to fundamental knowledge in Russian and is intended to play a key role in preserving and disseminating Russia’s cultural and scientific heritage. “Ark of Knowledge” is an ontological information system. The article discusses current views on ontologies and the stages of their creation, the ontological features of BDT and Wikidata, the design of the information system, and the use of language models for training. An initial working prototype of the system is briefly described. The system is being developed by researchers and programmers from the Knowledge Engineering Laboratory of the Institute for Mathematical Research of Complex Systems of Moscow State University, together with scientists from the Faculty of Philology, the Faculty of Mechanics and Mathematics, the Faculty of Computational Mathematics and Cybernetics, and the Branch of Moscow State University in Sevastopol.

Keywords

ontology, information system, fundamental knowledge, ontology design, information system “Ark of Knowledge”, Great Russian Encyclopedia

