Approaches and tools for Russian text linguistic profiling
- Authors: Solnyshkina M.I. (1), Solovyev V.D. (1), Ebzeeva Y.N. (2)
- Affiliations:
  1. Kazan (Volga Region) Federal University
  2. RUDN University
- Issue: Vol 22, No 4 (2024): LINGUISTIC PROFILES OF RUSSIAN TEXTS: GOING FROM FORM TO MEANING
- Pages: 501-517
- Section: Editorial Note
- URL: https://journals.rcsi.science/2618-8163/article/view/324744
- DOI: https://doi.org/10.22363/2618-8163-2024-22-4-501-517
- EDN: https://elibrary.ru/AMYSNF
- ID: 324744
Abstract
Approaches and tools for assessing the linguistic and cognitive complexity of educational texts are in demand in both research and teaching. Predicting difficulties in perception and comprehension and ranking texts by grade, i.e. the number of years of schooling, or by level of language proficiency (A1-C2), is of particular importance for education. The study aims to demonstrate modern methodologies, algorithms, and tools for analyzing Russian texts, as implemented in the text profiler and automatic analyzer RuLingva, and to present the articles of the thematic issue on the comprehensive analysis of Russian language textbooks for Russian and Belarusian schools. The research demonstrates that the modern paradigm of discourse complexology is based on the methods of stylistic statistics, which identifies functional characteristics of language units and verifies them using big data. The RuLingva services are designed for teachers and researchers; they automatically analyze educational texts and predict their target audience based on readability, lexical diversity, abstractness, frequency, and terminological density. In “Russian as a Foreign Language” mode, RuLingva generates lists of the words in a text for each level of language proficiency and estimates their proportion. This provides material for pre- and post-text work. The RuLingva algorithm is based on a typology of educational texts and is to be supplemented with tools for assessing a person’s verbal intelligence and reading literacy. The nearest prospects for RuLingva are widening the range of complexity predictors and adding an automatic subject-area discriminator. Both directions are planned to be implemented using neural networks, classification models, and “typological passports” of educational texts of differing complexity and thematic orientation.
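The two simplest measures mentioned in the abstract, lexical diversity and the proportion of words per proficiency level, can be illustrated with a minimal Python sketch. It is not RuLingva's implementation: the word list `LEVEL_LEXICON`, the function names, and the use of a plain type-token ratio are assumptions made for illustration only, and the sketch skips lemmatization, which a real analyzer of Russian would need.

```python
import re
from collections import Counter

# Hypothetical mini word list (word -> proficiency level).
# Illustrative only; these are not RuLingva's level lists.
LEVEL_LEXICON = {
    "я": "A1", "читать": "A1", "книга": "A1",
    "учебник": "A2", "сложность": "B1", "предиктор": "C1",
}

def tokenize(text):
    """Lowercase the text and return runs of Cyrillic letters as tokens."""
    return re.findall(r"[а-яё]+", text.lower())

def type_token_ratio(tokens):
    """Lexical diversity as distinct word forms divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def level_proportions(tokens, lexicon):
    """Share of tokens at each level; words not in the list count as 'unlisted'."""
    if not tokens:
        return {}
    counts = Counter(lexicon.get(tok, "unlisted") for tok in tokens)
    return {level: n / len(tokens) for level, n in counts.items()}

if __name__ == "__main__":
    sample = tokenize("Книга, учебник; книга, сложность!")
    print(type_token_ratio(sample))
    print(level_proportions(sample, LEVEL_LEXICON))
```

Because the sketch matches surface forms, inflected words such as "книги" would fall into the "unlisted" group; mapping tokens to dictionary forms first would be the natural next step.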
About the authors
Marina Solnyshkina
Kazan (Volga Region) Federal University
Corresponding author.
Email: mesoln@yandex.ru
ORCID iD: 0000-0003-1885-3039
SPIN-code: 6480-1830
Scopus Author ID: 56429529500
ResearcherId: E-3863-2015
Doctor Habil. of Philology, Professor of the Department of Theory and Practice of Teaching Foreign Languages, Head of “Multidisciplinary Text Investigation” Research Lab, Institute of Philology and Intercultural Communication
18 Kremlevskaya St, Kazan, 420008, Russian Federation
Valery Solovyev
Kazan (Volga Region) Federal University
Email: maki.solovyev@mail.ru
ORCID iD: 0000-0003-4692-2564
SPIN-code: 5791-3820
Scopus Author ID: 26665013000
ResearcherId: C-8023-2015
Doctor Habil. of Physical and Mathematical Sciences, Professor, member of the Presidium of the Multidisciplinary Association for Cognitive Research, author of four monographs and over 70 publications on text complexity, Chief Researcher of the “Multidisciplinary Text Investigation” Research Lab, Institute of Philology and Intercultural Communication
18 Kremlevskaya St, Kazan, 420008, Russian Federation
Yulia Ebzeeva
RUDN University
Email: ebzeeva-jn@rudn.ru
ORCID iD: 0000-0002-0043-7590
SPIN-code: 3316-4356
Doctor of Social Sciences, PhD in Philology, First Vice-Rector - Vice Rector for Education and Head of Foreign Language Department
6 Miklukho-Maklaya St, Moscow, 117198, Russian Federation
