Multilevel Language Processing for Intelligent Retrieval and Text Mining
- Authors: Smirnov I.V.
Affiliations:
- Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences
- Issue: No 1 (2023)
- Pages: 90-99
- Section: Analysis of Textual and Graphical Information
- URL: https://journals.rcsi.science/2071-8594/article/view/269819
- DOI: https://doi.org/10.14357/20718594230109
- ID: 269819
Abstract
The paper considers the problem of applying multilevel natural language processing methods to information retrieval and text mining. It investigates how linguistic information about the structure of texts and sentences, obtained through syntactic, semantic, and discourse analysis, can be used in these tasks. The paper presents the results of developing methods for multilevel processing of Russian and their application to semantic and question-answering search, information extraction from texts, text classification, and psycholinguistic text analysis.

About the authors
Ivan V. Smirnov
Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences
Author for correspondence.
Email: ivs@isa.ru
Candidate of Physical and Mathematical Sciences, Associate Professor. Head of Department, Federal Research Center “Computer Science and Control”
Russian Federation, Moscow
References
- Kamath U., Liu J., Whitaker J. Deep learning for NLP and speech recognition. – Cham, Switzerland: Springer, 2019. – V. 84.
- Glavaš G., Vulić I. Is supervised syntactic parsing beneficial for language understanding tasks? An empirical investigation // Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. – 2021. – pp. 3090-3104.
- Sachan D. S. et al. Do syntax trees help pre-trained transformers extract information? //arXiv preprint arXiv:2008.09084. – 2020.
- Mohebbi M., Razavi S. N., Balafar M. A. Computing semantic similarity of texts based on deep graph learning with ability to use semantic role label information // Scientific Reports. – 2022. – V. 12. – №. 1. – pp. 1-11.
- Yang J. et al. Measuring the short text similarity based on semantic and syntactic information //Future Generation Computer Systems. – 2021. – V. 114. – pp. 169-180.
- Tymoshenko K., Moschitti A. Assessing the impact of syntactic and semantic structures for answer passages reranking //Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. – 2015. – pp. 1451-1460.
- Galitsky B. A., De La Rosa J. L., Dobrocsi G. Inferring the semantic properties of sentences by mining syntactic parse trees //Data & Knowledge Engineering. – 2012. – V. 81. – pp. 21-45.
- Galitsky B. Machine learning of syntactic parse trees for search and classification of text //Engineering Applications of Artificial Intelligence. – 2013. – V. 26. – №. 3. – pp. 1072-1091.
- Reddy S. et al. Universal semantic parsing //arXiv preprint arXiv:1702.03196. – 2017.
- Galitsky B., Ilvovsky D. Chatbot with a discourse structure-driven dialogue management // Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics. – 2017. – pp. 87-90.
- Hou S., Zhang S., Fei C. Rhetorical structure theory: A comprehensive review of theory, parsing methods and applications // Expert Systems with Applications. – 2020. – V. 157. – Art. 113421.
- Vargas F. et al. Rhetorical structure approach for online deception detection: A survey //Proceedings of the Thirteenth Language Resources and Evaluation Conference. – 2022. – pp. 5906-5915.
- Green N. L. Representation of argumentation in text with rhetorical structure theory //Argumentation. – 2010. – V. 24. – №. 2. – pp. 181-196.
- Small S. G., Medsker L. Review of information extraction technologies and applications // Neural Computing and Applications. – 2014. – V. 25. – №. 3. – pp. 533-548.
- Xiang W., Wang B. A survey of event extraction from text // IEEE Access. – 2019. – V. 7. – pp. 173111-173137.
- Adnan K., Akbar R. An analytical study of information extraction from unstructured and multidimensional big data //Journal of Big Data. – 2019. – V. 6. – №. 1. – pp. 1-38.
- Zadgaonkar A. V., Agrawal A. J. An overview of information extraction techniques for legal document analysis and processing //International Journal of Electrical & Computer Engineering (2088-8708). – 2021. – V. 11. – №. 6.
- Tian Y. et al. Improving biomedical named entity recognition with syntactic information // BMC Bioinformatics. – 2020. – V. 21. – №. 1. – pp. 1-17.
- Chinsha T. C., Joseph S. A syntactic approach for aspect based opinion mining //Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015). – IEEE, 2015. – pp. 24-31.
- Rahimi Z., Noferesti S., Shamsfard M. Applying data mining and machine learning techniques for sentiment shifter identification //Language Resources and Evaluation. – 2019. – V. 53. – №. 2. – pp. 279-302.
- Feldman D. G., Vorontsov K. V., Sadekova T. R. Combining facts, semantic roles and sentiment lexicon in a generative model for opinion mining //Computational Linguistics and Intellectual Technologies. – 2020. – pp. 283-298.
- Mohammad S., Zhu X., Martin J. Semantic role labeling of emotions in tweets //Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. – 2014. – pp. 32-41.
- Campagnano C., Conia S., Navigli R. SRL4E–Semantic Role Labeling for Emotions: A Unified Evaluation Framework //Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). – 2022. – pp. 4586-4601.
- Xu K. et al. Exploiting rich syntactic information for semantic parsing with graph-to-sequence model //arXiv preprint arXiv:1808.07624. – 2018.
- Osipov G.S., Smirnov I.V., Tikhomirov I.A. 2008. Relyatsionno-situatsionniy metod poiska i analiza tekstov i ego prilozheniya [Relational-situational method for text search and analysis and its applications]. Iskusstvenniy intellekt i prinyatie resheniy [Artificial intelligence and decision making] 2:3-10.
- Smirnov I.V., Shelmanov A.O., Kuznetsova E.S., Khramoin I.V. 2014. Semantiko-sintaksicheskiy analiz estestvennykh yazykov. Chast' II. Metod semantiko-sintaksicheskogo analiza tekstov [Semantic-syntactic analysis of natural languages. Part II. Method of semantic-syntactic analysis of texts]. Iskusstvenniy intellekt i prinyatie resheniy [Artificial intelligence and decision making] 1:11-24.
- Shelmanov A. O., Smirnov I. V. Methods for Semantic Role Labeling of Russian Texts // Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (2014). Issue 13 (20). – 2014. – pp. 580-592.
- Larionov D., Shelmanov A., Chistova E., Smirnov I. Semantic role labeling with pretrained language models for known and unknown predicates // Proceedings of International Conference on Recent Advances of Natural Language Processing. – 2019. – pp. 619-628.
- Mann W. C., Thompson S. A. Rhetorical structure theory: Toward a functional theory of text organization // Text-Interdisciplinary Journal for the Study of Discourse. – 1988. – V. 8. – №. 3. – pp. 243-281.
- Chistova E., Shelmanov A., Pisarevskaya D., Kobozeva M., Isakov V., Panchenko A., Toldova S., Smirnov I. RST Discourse Parser for Russian: an Experimental Study of Deep Learning Models //International Conference on Analysis of Images, Social Networks and Texts. – Lecture Notes in Computer Science, vol 12602, Springer, Cham, 2021, pp. 105-119.
- Osipov G.S. 1997. Priobretenie znanij intellektual'nymi sistemami: Osnovy teorii i tekhnologii [Knowledge Acquisition by Intelligent Systems: Fundamentals of Theory and Technology]. Moscow: Fizmatlit. 117 p.
- Tikhomirov I.A., Smirnov I.V. 2008. Integratsiya lingvisticheskikh i statisticheskikh metodov poiska v poiskovoy mashine Exactus [Integration of linguistic and statistical search methods in the search engine Exactus]. Trudy mezhdunarodnoy konferentsii Dialog-2008 [Proceedings of the international conference Dialogue-2008]. Moscow: Izdatel'skiy tsentr RGGU [Publishing Center of the Russian State University for the Humanities]. 2008. 485-491.
- Smirnov I.V., Sochenkov I.V., Murav'ev V.V., Tikhomirov I.A. 2008. Rezul'taty i perspektivy poiskovogo algoritma Exactus [Results and prospects of the search algorithm Exactus]. Trudy rossiyskogo seminara po otsenke metodov informatsionnogo poiska ROMIP'2007-2008 [Proceedings of the Russian seminar on the evaluation of information retrieval methods ROMIP'2007-2008]. Saint Petersburg. 66-76.
- Shelmanov A.O., Kamenskaya M.I., Anan'eva I.V., Smirnov I.V. 2016. Semantiko-sintaksicheskiy analiz tekstov v zadachakh voprosno-otvetnogo poiska i izvlecheniya opredeleniy [Semantic-syntactic analysis of texts for question-answering and extraction of definitions]. Iskusstvenniy intellekt i prinyatie resheniy [Artificial intelligence and decision making] 4: 47–61.
- Shelmanov A.O., Devyatkin D.A., Isakov V.A., Smirnov I.V. 2019. Otkrytoe izvlechenie informatsii iz tekstov. Chast' II. Izvlechenie semanticheskikh otnosheniy s pomoshch'yu mashinnogo obucheniya bez uchitelya [Open information extraction from texts. Part II. Extracting semantic relationships with unsupervised machine learning]. Iskusstvenniy intellekt i prinyatie resheniy [Artificial intelligence and decision making] 2: 39–49.
- Chistova E. V., Larionov D. S., Shelmanov A. O., Latypova E. A., Smirnov I. V. 2021. Otkrytoe izvlechenie informatsii iz tekstov. Chast' III. Sistema voprosno-otvetnogo poiska [Open information extraction from texts. Part III. Question-answering search system]. Iskusstvenniy intellekt i prinyatie resheniy [Artificial intelligence and decision making] 4: 35-49.
- Kuznetsova Yu.M., Osipov G.S., Chudova N.V., Shvets A.V. 2012. Avtomaticheskoe ustanovlenie sootvetstviya statey trebovaniyam k nauchnym publikatsiyam [Automatic detection of compliance of articles with the requirements for scientific publications]. Trudy ISA RAN [Proceedings of ISA RAS] 62(3): 132-138.
- Chistova E., Smirnov I. Discourse-aware text classification for argument mining // Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference "Dialogue" (2022). – 2022. – pp. 93-105.
- Enikolopov S.N., Kuznetsova Yu.M., Osipov G.S., Smirnov I.V., Chudova N.V. 2021. Metod relyatsionno-situatsionnogo analiza teksta v psikhologicheskikh issledovaniyakh [The method of relational-situational text analysis in psychological research]. Psikhologiya. Zhurnal Vysshey shkoly ekonomiki [Psychology. Journal of the Higher School of Economics] 18(4):748-769.
- Enikolopov S.N., Medvedeva T.I., Vorontsova O.Yu., Chudova N.V., Kuznetsova Yu.M., Penkina M.Yu., Minin A.N., Stankevich M.A., Smirnov I.V., Lyubavskaya A.A. 2018. Lingvisticheskie kharakteristiki tekstov psikhicheski bol'nykh i zdorovykh lyudey [Linguistic characteristics of texts of mentally ill and healthy people]. Psikhologicheskie issledovaniya [Psychological research] 61(11): 1.
- Smirnov I., Stankevich M., Kuznetsova Y., Suvorova M., Larionov D., Nikitina E., Savelov M., Grigoriev O. TITANIS: A Tool for Intelligent Text Analysis in Social Media // In: Kovalev S.M., Kuznetsov S.O., Panov A.I. (eds) Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science, Springer, Cham, vol 12948. pp 232-247.
- Osipov G.S., Smirnov I.V. 2016. Semanticheskij analiz nauchnyh tekstov i ih bol'shih massivov [Semantic analysis of scientific texts and their large-scale collections]. Sistemy vysokoj dostupnosti [High availability systems] 1: 41-44.
- Kuznetsova Yu.M., Smirnov I.V., Stankevich M.A., Chudova N.V. 2019. Sozdanie instrumenta avtomaticheskogo analiza teksta v interesakh socio-gumanitarnykh issledovaniy. Chast' 2. Mashina RSA i opyt ee ispol'zovaniya [Creating a Text Analysis Tool for Socio-Humanitarian Research. Part 2. The RSA Machine and the Experience in Using It]. Iskusstvenniy intellekt i prinyatie resheniy [Artificial intelligence and decision making] 3: 40-51.