Basic Algorithm for Automatic Spelling Correction of Russian Texts: Development, Evaluation and Prospects

E. V. Isaeva; Исаева Е. В.; B. Z. Safarbekov; Сафарбеков Б. З.

doi:10.17072/1993-0550-2025-1-91-108

Authors: Isaeva E.V.¹, Safarbekov B.Z.²
Affiliations:
1. Perm State University
2. National University of Science and Technology "MISIS"
Issue: No. 1 (68) (2025)
Pages: 91-108
Section: Computer Science
URL: https://journals.rcsi.science/1993-0550/article/view/326435
DOI: https://doi.org/10.17072/1993-0550-2025-1-91-108
EDN: https://elibrary.ru/ccjape
ID: 326435

Abstract

Automatic spelling checking and correction of Russian texts is a pressing task in natural language processing. Our research aims to develop, evaluate and describe a computer programme that corrects spelling errors with high accuracy. The proposed method processes text line by line, applying rules for correct spelling and capitalisation together with a probabilistic model that proposes candidate words for correcting each error. The algorithm operates at the level of individual words, which limits its ability to take context into account. Model quality is measured with Precision, Recall and F1 Score. For ease of use and further refinement of the program, we integrated automated error analysis and detailed report generation, which help identify the strengths and weaknesses of the algorithm. The detailed description of the development makes the algorithm reproducible and is in line with the open-source ideology. The results show that the algorithm achieves Precision = 1.00, i.e., every correction it makes matches the reference text, so it introduces no false corrections. However, Recall = 0.84 (84% of the errors marked in the reference text are corrected) shows the need for further refinement, including the handling of context-dependent errors and of set expressions. The F1 Score = 0.91, the harmonic mean of these two values, confirms the balanced performance of the algorithm and justifies its use as a baseline model for correcting Russian text. The conclusions of the study emphasise the potential of the algorithm for automatic correction of Russian-language text and suggest directions for improving the source code, such as the use of n-grams and language models. This work lays the foundation for further research on automatic correction of Russian-language texts.
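The abstract combines two technical ingredients: word-by-word correction driven by a probabilistic candidate model, and evaluation against a reference text with Precision, Recall and F1. The Python sketch below illustrates both under explicit assumptions; it is not the authors' code. Candidates are generated as all strings one edit away (deletion, transposition, substitution, insertion) and ranked by word frequency alone, a common simplification of the noisy-channel approach. The frequency list `WORD_FREQ`, the case-restoring rule, the helper names and the token-aligned scoring in `evaluate` are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy frequency list; a real corrector would count words in a large Russian corpus.
WORD_FREQ = Counter({"в": 100, "привет": 50, "мир": 40, "молоко": 30, "москва": 20})
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
WORD_RE = re.compile(r"[А-Яа-яЁё]+")  # Cyrillic words only; punctuation is left as-is


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word):
    """Return the most frequent known candidate, keeping the original letter case."""
    lower = word.lower()
    if lower in WORD_FREQ:
        return word  # known word: left untouched (no context is consulted)
    candidates = [w for w in edits1(lower) if w in WORD_FREQ]
    if not candidates:
        return word  # nothing within one edit: left untouched
    # Highest frequency wins; ties are broken alphabetically for determinism.
    best = max(candidates, key=lambda w: (WORD_FREQ[w], w))
    # Hypothetical capitalisation rule: copy the case pattern of the input word.
    if word.isupper():
        return best.upper()
    if word[0].isupper():
        return best[0].upper() + best[1:]
    return best


def correct_line(line):
    """Process one line of text word by word."""
    return WORD_RE.sub(lambda m: correct_word(m.group(0)), line)


def f1_score(precision, recall):
    """Harmonic mean of Precision and Recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(src_lines, pred_lines, ref_lines):
    """Word-level Precision, Recall and F1 against a reference text.

    Assumes corrections never change the number of whitespace-separated tokens.
    """
    proposed = gold = tp = 0
    for src, pred, ref in zip(src_lines, pred_lines, ref_lines):
        s, p, r = src.split(), pred.split(), ref.split()
        if not len(s) == len(p) == len(r):
            raise ValueError("lines must keep the same number of tokens")
        for sw, pw, rw in zip(s, p, r):
            proposed += pw != sw         # the corrector changed this word
            gold += rw != sw             # the reference marks an error here
            tp += pw != sw and pw == rw  # the change matches the reference
    # Convention chosen here: with nothing proposed (or nothing to fix) the score is 1.0.
    precision = tp / proposed if proposed else 1.0
    recall = tp / gold if gold else 1.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    src = ["Превет мир", "малоко в Москва", "превед мир"]
    ref = ["Привет мир", "молоко в Москва", "привет мир"]
    pred = [correct_line(line) for line in src]
    print(pred)  # "превед" is two edits from "привет", so it stays uncorrected
    print(tuple(round(x, 2) for x in evaluate(src, pred, ref)))  # (1.0, 0.67, 0.8)
```

With this scoring, Precision = 1.00 means that no change contradicts the reference, and the reported F1 follows from the other two figures: 2 · 1.00 · 0.84 / (1.00 + 0.84) ≈ 0.91. The sketch also shows why a context-free, word-level design caps Recall: a misspelling that happens to be a valid word form is found in the dictionary and left untouched.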

Keywords

spelling errors, grammatical errors, Russian language, automatic text correction, natural language processing, precision, recall, F1 Score

About the authors

E. Isaeva

Perm State University

Email: ekaterinaisae@psu.ru
Scopus Author ID: 57204498718
ResearcherId: O-6777-2015
Perm

B. Safarbekov

National University of Science and Technology "MISIS"

Email: behruzsafarbekov3@gmail.com
Moscow

