Lexical enrichment of philological textbooks: corpus and statistical approaches
- Authors: Galimova K.N. (1), Martynova E.V. (1), Moskvitcheva S.A. (2)
- Affiliations:
  - (1) Kazan (Volga Region) Federal University
  - (2) RUDN University
- Issue: Vol. 22, No. 4 (2024): LINGUISTIC PROFILES OF RUSSIAN TEXTS: GOING FROM FORM TO MEANING
- Pages: 579-597
- Section: Key Issues of Russian Language Research
- URL: https://journals.rcsi.science/2618-8163/article/view/324749
- DOI: https://doi.org/10.22363/2618-8163-2024-22-4-579-597
- EDN: https://elibrary.ru/AYFCCE
- ID: 324749
Abstract
The study is motivated by the need for objective data on vocabulary frequency in Russian language textbooks and on vocabulary acquisition when Russian is taught as a native language at school. The article describes the creation of a frequency dictionary of philological textbooks based on a linguistic corpus of Russian language and literature textbooks for grades 5-7. Philological textbooks present an average model of the Russian language and literature, reflecting topics relevant to the student and gradually increasing in lexical complexity. The aim of the article is to assess lexical enrichment in philological textbooks for grades 5-7 and to improve the methodology for compiling frequency lists. The material is a corpus of 66 textbooks on the Russian language and literature with a total size of 1,553,224 tokens. Corpus and computational linguistics methods, comparative-contrastive analysis, and statistical methods (the IKSWEB program, the Google Colab environment, and the Pandas, NLTK and Pymorphy libraries) revealed that the frequency list for the 5th grade comprises 8984 lemmas; for the 6th grade, 7572 lemmas; and for the 7th grade, 7321 lemmas. Vocabulary “enrichment” amounts to 258 lexemes in the 6th grade and 150 lexemes in the 7th grade. The lexical core of the three frequency lists consists of words from the thematic groups “Philological terms”, “Verbs denoting educational actions”, “Nature”, “Family and friendly relations”, “Art”, and “Time”. The 6th grade vocabulary “enrichment” includes archaisms and historicisms, terms denoting forms of the national language, and word-formation terms. The 7th grade “enrichment” comprises linguistic terms on the theme “Names of verb forms”, vocabulary on the theme “Religion”, and socio-political vocabulary. The frequency lists confirmed the hypothesis that texts in modern textbooks on the Russian language and literature are thematically balanced and that linguistic terminology forms their core. A prospect for further study is a similar analysis of educational texts in philology and other subjects from senior school textbooks in order to define intra- and meta-subject links.
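The per-grade frequency lists and the “enrichment” counts described in the abstract can be illustrated with a minimal sketch. The abstract does not state how enrichment was computed, so the sketch assumes it means lemmas that appear in a grade's list but in none of the earlier grades' lists. The function names `tokenize`, `frequency_list`, and `enrichment`, the regex tokenizer, and the toy texts are all hypothetical; the default lemmatizer is an identity stand-in where the study used the Pymorphy library.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplified stand-in tokenizer (assumption): lowercase runs of letters.
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_list(text, lemmatize=lambda token: token):
    # Count lemma occurrences; `lemmatize` is an identity placeholder here,
    # a real morphological analyzer would be passed in instead.
    return Counter(lemmatize(token) for token in tokenize(text))


def enrichment(current, previous_lists):
    # Assumed definition: lemmas in `current` that are absent
    # from every earlier grade's frequency list.
    seen = set()
    for previous in previous_lists:
        seen.update(previous)
    return {lemma: count for lemma, count in current.items() if lemma not in seen}


# Toy texts standing in for grade-level subcorpora (not data from the study).
grade5 = frequency_list("слово текст слово глагол")
grade6 = frequency_list("слово диалект диалект текст")
new_in_grade6 = enrichment(grade6, [grade5])
```

With the toy input, `new_in_grade6` holds only the lemma that the hypothetical grade 5 text lacks, together with its frequency in the grade 6 text.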
About the authors
Khalida Galimova
Kazan (Volga Region) Federal University
Corresponding author.
Email: galikha@mail.ru
ORCID iD: 0000-0003-1817-5004
SPIN code: 7931-3389
PhD in Philology, Senior Researcher at the Multidisciplinary Text Investigation Research Institute of Philology and Intercultural Communication
18 Kremlevskaya St, Kazan, 420008, Russian Federation
Ekaterina Martynova
Kazan (Volga Region) Federal University
Email: katerinamarty@yandex.ru
ORCID iD: 0000-0001-5883-0718
SPIN code: 9431-7981
Senior Lecturer at the Department of Theory and Practice of Teaching Foreign Languages, Junior Researcher at the Multidisciplinary Text Investigation Research Institute of Philology and Intercultural Communication
18 Kremlevskaya St, Kazan, 420008, Russian Federation
Svetlana Moskvitcheva
RUDN University
Email: moskvitcheva-sa@rudn.ru
ORCID iD: 0000-0002-8047-7030
SPIN code: 9596-7692
PhD in Philology, Associate Professor of the General and Russian Linguistics Department, Faculty of Philology
6 Miklukho-Maklaya St, Moscow, 117198, Russian Federation
