Lexico-semantic analysis of the literary text and its translation with Python tools
- Authors: Safina Z.M., Lukahina D.N.
- Issue: No 12 (2025)
- Pages: 275-285
- Section: Articles
- URL: https://journals.rcsi.science/2454-0749/article/view/372124
- DOI: https://doi.org/10.7256/2454-0749.2025.12.76918
- EDN: https://elibrary.ru/VBLFUI
- ID: 372124
Abstract
The article examines the potential of automated lexico-semantic analysis of literary texts and their translations using the NLTK library for the Python programming language. Programming languages significantly accelerate linguistic research and allow collected data to be organized systematically. The material of the study is the story “Grayling; or, ‘Murder Will Out’” by the 19th-century American author William Gilmore Simms and its Russian translation by M.L. Pavlycheva. Particular attention is paid to differences in lexical structure, word and bigram frequency, part-of-speech distribution, and lexical diversity. The research aims to detect translation transformations that affect the semantic and stylistic organization of the text and to evaluate how well automated analysis supports comparison of an original with its translation. The Natural Language Processing (NLP) methods applied include text normalization, part-of-speech tagging, frequency analysis, bigram modeling, and calculation of a lexical diversity index. The findings show that automated lexico-semantic analysis makes it possible to identify key differences between the original and its translation objectively: the Russian version exhibits higher lexical diversity, attributable to the inflectional nature of the language and the active use of translation transformations; the frequency of cohesive elements increases; and thematically marked bigrams are replaced by more neutral constructions. The study also reveals significant limitations of standard NLP tools in processing Russian-language texts, which underscores the need to adapt computational methods to the specific features of Russian. The results confirm the need for an integrated approach to analyzing a literary text and its translation, one that combines computational methods with linguistic interpretation.
Future research directions include the application of advanced morphological analyzers, expansion of the text corpus, and integration of machine learning techniques for in-depth comparative analysis of original literary texts and their translations.
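The frequency, bigram, and lexical diversity steps named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code. It uses only the Python standard library in place of NLTK, whose `FreqDist` and `bigrams` utilities cover the same operations. The two sample sentences are invented for the example and are not taken from the story or its translation. The type-token ratio is an assumed stand-in for the lexical diversity index, since the abstract does not say which index was used. The helper names are hypothetical. Part-of-speech tagging is left out because it requires trained tagger models.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and keep only runs of letters (Latin or Cyrillic)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def word_frequencies(tokens):
    """Count how often each word form occurs."""
    return Counter(tokens)


def bigram_frequencies(tokens):
    """Count how often each pair of adjacent word forms occurs."""
    return Counter(zip(tokens, tokens[1:]))


def type_token_ratio(tokens):
    """Share of distinct word forms among all tokens (assumed diversity index)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented sample sentences, used only to exercise the functions.
english = "The old man saw the ghost, and the ghost saw the old man."
russian = "Старик увидел призрака, и призрак увидел старика."

en_tokens = normalize(english)
ru_tokens = normalize(russian)

for label, tokens in (("EN", en_tokens), ("RU", ru_tokens)):
    print(label, "most frequent words:", word_frequencies(tokens).most_common(3))
    print(label, "most frequent bigrams:", bigram_frequencies(tokens).most_common(2))
    print(label, "type-token ratio:", round(type_token_ratio(tokens), 3))
```

In this toy pair the Russian sentence scores a higher ratio partly because case forms such as "призрак" and "призрака" count as separate types. That is the same inflection effect the abstract cites, and it is one reason a morphological analyzer that lemmatizes Russian would change the picture. The ratio also depends on text length, so comparing texts of very different sizes calls for a length-corrected measure.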
About the authors
Zarema Miniaminovna Safina
Email: safinazarem@yandex.ru
ORCID iD: 0009-0009-3486-7757
Dar'ya Nikolaevna Lukahina
Email: dlukaxina@mail.ru