Developing a computer system for student learning based on vision-language models

Eugeny Yu. Shchetinin; Anastasia G. Glushkova; Anastasia V. Demidova

doi:10.22363/2658-4670-2024-32-2-234-241

Authors: Shchetinin E.Y.¹, Glushkova A.G.², Demidova A.V.³
Affiliations:
1. Financial University under the Government of the Russian Federation
2. Endeavor
3. RUDN University
Issue: Vol 32, No 2 (2024)
Pages: 234-241
Section: Articles
URL: https://journals.rcsi.science/2658-4670/article/view/315422
DOI: https://doi.org/10.22363/2658-4670-2024-32-2-234-241
EDN: https://elibrary.ru/CUCXTY
ID: 315422

Abstract

In recent years, artificial intelligence methods have been applied in many fields, education in particular. Developing computer systems for student learning is an important task that can significantly improve student learning. Deep learning methods have become widely used in the educational process. The most successful of them are models that take into account the multimodal nature of information, in particular the combination of text, sound, images, and video. The difficulty in processing such data is that combining multimodal inputs by channel concatenation ignores the heterogeneity of the modalities and is therefore inefficient. To solve this problem, the paper proposes an inter-channel attention module. The paper presents a vision-language computer system for student learning that fuses multimodal input data using the inter-channel attention module. It is shown that effective and flexible learning systems and technologies built on such models make it possible to adapt the educational process to the individual needs of students and to increase its efficiency.
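The abstract contrasts plain channel concatenation with attention-weighted fusion but does not specify the module itself. The following is a minimal sketch of the general idea only, assuming a squeeze-and-excitation style gate as a stand-in; the function name `channel_attention_fusion`, the `weights` matrix, and the toy inputs are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fusion(text_channels, image_channels, weights):
    """Concatenate two modalities along the channel axis, then gate each channel.

    text_channels, image_channels: lists of channels, each a list of floats.
    weights: square matrix (list of rows) mixing channel descriptors; a
             hypothetical stand-in for parameters that training would learn.
    """
    # Plain concatenation along the channel axis (the baseline in the abstract).
    channels = text_channels + image_channels
    # Squeeze: summarize every channel by its mean activation.
    descriptors = [sum(c) / len(c) for c in channels]
    # Each gate may depend on the descriptors of all channels of both modalities.
    gates = [
        sigmoid(sum(w * d for w, d in zip(row, descriptors)))
        for row in weights
    ]
    # Excite: rescale each channel by its gate in (0, 1).
    fused = [[g * v for v in c] for g, c in zip(gates, channels)]
    return fused, gates


# Toy inputs (assumed): two text channels and one image channel.
text = [[1.0, 1.0], [0.0, 0.0]]
image = [[2.0, 4.0]]
# Identity mixing keeps the demo easy to follow; real weights would be learned.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fused, gates = channel_attention_fusion(text, image, identity)
```

With identity mixing, each gate depends only on its own channel; off-diagonal entries in `weights` are what would let a text channel raise or lower the weight of an image channel, which is the behavior plain concatenation cannot express.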

Keywords

deep learning, vision-language learning model, transformer neural networks, inter-channel attention module

About the authors

Eugeny Yu. Shchetinin

Financial University under the Government of the Russian Federation

Author for correspondence.
Email: riviera-molto@mail.ru
ORCID iD: 0000-0003-3651-7629
Scopus Author ID: 16408533100
ResearcherId: O-8287-2017

Doctor of Physical and Mathematical Sciences, Lecturer, Department of Mathematics

49 Leningradsky Ave, Moscow, 125993, Russian Federation

Anastasia G. Glushkova

Endeavor

Email: aglushkova@endeavorco.com
ORCID iD: 0000-0002-8285-0847
Scopus Author ID: 57485591900

Researcher

Endeavor, London

Anastasia V. Demidova

RUDN University

Email: demidova-av@rudn.ru
ORCID iD: 0000-0003-1000-9650

Candidate of Physical and Mathematical Sciences, Assistant Professor, Department of Probability Theory and Cyber Security

6 Miklukho-Maklaya St, Moscow, 117198, Russian Federation
