Algorithm for Constructing an Oriented Acyclic Graph of Words
- Authors: Shiyan V.I.1
-
Affiliations:
- Kuban State University
- Issue: No 3 (2024)
- Pages: 104-112
- Section: Analysis of Textual and Graphical Information
- URL: https://journals.rcsi.science/2071-8594/article/view/265362
- DOI: https://doi.org/10.14357/20718594240308
- EDN: https://elibrary.ru/JUSFEQ
- ID: 265362
Cite item
Full Text
Abstract
The algorithm presented in the article makes it possible to efficiently build and modify minimal deterministic finite automata for recognizing a given set of words, including when processing a large amount of information in real time. The key feature of the algorithm is the ability to add new words to the machine and its subsequent minimization on the fly. The algorithm is based on lexicographic ordering of a set of input words and has a low computational complexity compared to traditional algorithms such as the Hopcroft algorithm or an algorithm using the construction of pairs of distinguishable states. The development of this algorithm is aimed at increasing the speed of constructing minimal deterministic finite automata and their modification for effective natural language processing and real-time web content analysis.
About the authors
Valery I. Shiyan
Kuban State University
Author for correspondence.
Email: shiyanvi@yandex.ru
Lecturer, Department of Computing Technologies
Russian Federation, KrasnodarReferences
- Daciuk J., Mihov S., Watson B.W., Watson R.E. Incremental construction of minimal acyclic finite-state automata // Computational Linguistics. 2000. V. 26. No 1. P. 3-16.
- Watson B.W. A taxonomy of finite automata construction algorithms // Computing Science Notes. V. 9343, 1993.
- Black P.E., Pieterse V. Directed acyclic word graph // Dictionary of Algorithms and Data Structures. 1998.
- Blumer A.C., Blumer J.A., Haussler D., Ehrenfeucht A., Chen M.-T., Seiferas J.I. The smallest automation recognizing the subwords of a text // Theoretical Computer Science. 1985. P. 31-55.
- Lucchesi C.L., Kowaltowski T. Applications of finite automata representing large vocabularies // Software: Practice and Experience. 1993. V. 23. No 1. P. 15-30.
- Ciura M.G., Deorowicz S. How to squeeze a lexicon. Software: Practice and Experience. 2001. V. 31. No 11. P. 1077-1090.
- Crochemore M., Vérin R. On compact directed acyclic word graphs // In Structures in Logic and Computer Science. 1997. P. 192-211.
- Daciuk J., Watson B.W., Watson R.E. Incremental construction of minimal acyclic finite state automata and transducers // Proceedings of the International Workshop on Finite State Methods in Natural Language Processing. 1998. P. 48-56.
- Mihov S. Direct building of minimal automaton for given list // Annuaire de l’Université de Sofia. 1998. V. 91. No 1. P. 38-40.
- Kuznetsov O. P., Adelson-Velsky G. M. Avtomaty// Diskretnaya matematika dlya ingenerov [Automata // Discrete Mathematics for Engineers]. Moscow: Energiya, 1980. 344 p.
- Moll R.N., Arbib M.A., Kfoury A.J. An introduction to formal language theory. Springer-Verlag, 1988.
- Watson B.W. Taxonomies and Toolkits of Regular Language Algorithms. Ph.D. thesis, Eindhoven University of Technology, the Netherlands. 1995.
- Hopcroft D., Motwani R., Ullman D. Vvedenie v teoriyu avtomatov, yazykov i vychisleniy. Uchebnik. 2-e izdanie. [Introduction to Automata Theory, Languages, and Computation: Textbook: 2nd Edition]. Moscow: Williams Publishers. 2002. 61 p.
Supplementary files
