The Ethical Aspect of Using Artificial Intelligence Technologies in the Field of Indigenous Languages Preservation
- Authors: Kolomatsky D.I., Korovina E.V.
- Affiliation: Institute of Linguistics of the Russian Academy of Sciences
- Issue: No 4(123) (2025)
- Pages: 46-55
- Section: SOCIETY OF COEXISTENCE OF NATURAL AND ARTIFICIAL INTELLIGENCE
- URL: https://journals.rcsi.science/2587-6090/article/view/368456
- DOI: https://doi.org/10.22204/2587-8956-2025-123-04-46-55
- ID: 368456
Abstract
The paper discusses the ethical aspects of applying artificial intelligence technologies to indigenous languages. The analysis focuses on several prominent projects in this field. Some of these initiatives are not primarily aimed at language revitalization or cultural preservation, which may draw criticism from the language communities themselves. The paper also examines successful projects created and/or actively supported by indigenous people. The initiatives discussed span a wide range of regions, including Africa, North and South America, Southeast Asia, and Oceania. The article also offers recommendations for the further development and support of such initiatives, emphasizing ethical considerations and respect for the rights and interests of indigenous peoples. The work aims to raise awareness of the importance of preserving indigenous languages through modern technology, which is particularly relevant for the Russian Federation, given its unique linguistic diversity.
About the authors
D. I. Kolomatsky
Institute of Linguistics of the Russian Academy of Sciences
Author for correspondence.
Email: dk@iling-ran.ru
candidate of philological sciences, researcher
Russian Federation, Moscow
E. V. Korovina
Institute of Linguistics of the Russian Academy of Sciences
Email: evkorovina@iling-ran.ru
junior research assistant
Russian Federation, Moscow
References
- Good J. Ethics in Language Documentation and Revitalization // The Oxford Handbook of Endangered Languages / Ed. K.L. Rehg, L. Campbell. Oxford University Press, 2018. P. 418–440. doi: 10.1093/oxfordhb/9780190610029.013.21.
- Holton G., Leonard W.Y., Pulsifer P.L. Indigenous Peoples, Ethics, and Linguistic Data // The Open Handbook of Linguistic Data Management / Ed. A.L. Berez-Kroeker et al. The MIT Press, 2022. P. 49–60. doi: 10.7551/mitpress/12200.003.0008.
- Marley T.L. Indigenous Data Sovereignty and the role of universities // Indigenous Data Sovereignty and Policy. 1st ed. London: Routledge, 2020. P. 157–168. doi: 10.4324/9780429273957-11.
- Ruckstuhl K. Trust in Scholarly Communications and Infrastructure: Indigenous Data Sovereignty // Front. Res. Metr. Anal. 2022. Vol. 6. doi: 10.3389/frma.2021.752336.
- Ortenzi K.M. et al. Good data relations key to Indigenous research sovereignty: A case study from Nunatsiavut // Ambio. 2025. Vol. 54, № 2. P. 256–269. doi: 10.1007/s13280-024-02077-6.
- Visser E. A grammar of Kalamang. Berlin: Language Science Press, 2022. https://zenodo.org/record/6499927 (Access date 10.05.2025).
- Tanzer G. et al. A Benchmark for Learning to Translate a New Language from One Grammar Book: arXiv:2309.16575. arXiv, 2024. doi: 10.48550/arXiv.2309.16575.
- Kornilov A., Shavrina T. From MTEB to MTOB: Retrieval-Augmented Classification for Descriptive Grammars: arXiv:2411.15577. arXiv, 2024. doi: 10.48550/arXiv.2411.15577.
- Aycock S. et al. Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book? arXiv:2409.19151. arXiv, 2025. doi: 10.48550/arXiv.2409.19151.
- Brixey J. Using Artificial Intelligence to Preserve Indigenous Languages. USC Institute for Creative Technologies, 2025. https://ict.usc.edu/news/essays/using-artificial-intelligence-to-preserve-indigenous-languages/ (Access date 10.05.2025).
- Brixey J., Pincus E., Artstein R. Chahta Anumpa: A Multimodal Corpus of the Choctaw Language // Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). 2018. P. 3371–3376. https://aclanthology.org/L18-1532.pdf (Access date 10.05.2025).
- Brixey J., Artstein R. ChoCo: a multimodal corpus of the Choctaw language // Lang Resources & Evaluation. 2021. Vol. 55, № 1. P. 241–257. doi: 10.1007/s10579-020-09494-5.
- Brixey J., Traum D. Masheli: A Choctaw-English Bilingual Chatbot // Conversational Dialogue Systems for the Next Decade / Ed. L.F. D’Haro, Z. Callejas, S. Nakamura. Singapore: Springer Singapore, 2021. Vol. 704. P. 41–50. doi: 10.1007/978-981-15-8395-7_4.
- Brixey J., Traum D. Does a code-switching dialogue system help users learn conversational fluency in Choctaw? // Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP). Albuquerque, New Mexico: Association for Computational Linguistics, 2025. P. 8–17. https://aclanthology.org/2025.americasnlp-1.2/ (Access date 10.05.2025).
- Leuski A., Traum D. NPCEditor: Creating Virtual Human Dialogue Using Information Retrieval Techniques // AI Magazine. 2011. Vol. 32, № 2. P. 42–56. doi: 10.1609/aimag.v32i2.2347.
- Orife I. et al. Masakhane – Machine Translation For Africa: arXiv:2003.11529. arXiv, 2020. doi: 10.48550/arXiv.2003.11529.
- Martinus L., Abbott J.Z. A Focus on Neural Machine Translation for African Languages: arXiv:1906.05685. arXiv, 2019. doi: 10.48550/arXiv.1906.05685.
- Nekoto W. et al. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages: arXiv:2010.02353. arXiv, 2020. doi: 10.48550/arXiv.2010.02353.
- Rajab J. et al. The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages: arXiv:2502.15916. arXiv, 2025. doi: 10.48550/arXiv.2502.15916.
- Jones P.-L. et al. Kia tangata whenua: Artificial intelligence that grows from the land and people // Ethical Space: International Journal of Communication Ethics. 2023. Vol. 2023, № 2/3. doi: 10.21428/0af3f4c0.9092b177.
- Leoni G. et al. Solving Failure Modes in the Creation of Trustworthy Language Technologies // Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024. Torino, Italia, 2024. https://aclanthology.org/2024.sigul-1.39/ (Access date 11.05.2025).
- Xu X. et al. A Survey on Knowledge Distillation of Large Language Models: arXiv:2402.13116. arXiv, 2024. doi: 10.48550/arXiv.2402.13116.
- Jiang A.Q. et al. Mistral 7B: arXiv:2310.06825. arXiv, 2023. doi: 10.48550/arXiv.2310.06825.
- Pinhanez C. et al. Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences: arXiv:2407.12620. arXiv, 2024. doi: 10.48550/arXiv.2407.12620.

