Methods for Combining Multiple Text Recognition Results
- Authors: Arlazarov V.V.
Affiliations:
- Computer Science and Control Federal Research Center of the Russian Academy of Sciences
- Smart Engines Service LLC
- Issue: No 3 (2022)
- Pages: 106-116
- Section: Analysis of Textual and Graphical Information
- URL: https://journals.rcsi.science/2071-8594/article/view/270477
- DOI: https://doi.org/10.14357/20718594220309
- ID: 270477
Abstract
The per-frame combination of text recognition results obtained from multiple images is an important component of video stream document recognition systems. Currently there is no unified approach to this problem that yields high text recognition precision. This paper presents a comparative study of known approaches to combining recognition results for identity document fields. It is demonstrated that different approaches are advantageous on different parts of the data sets, while selecting the potentially best single result can still significantly outperform all the analyzed methods.
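To illustrate the kind of per-frame combination the abstract refers to, the sketch below implements a simplified per-character majority vote over recognition results from several frames. This is an illustrative toy, not the paper's method: the function name and the padding-based alignment are assumptions for the example, whereas real combination schemes (e.g. ROVER-style voting) first align the candidate strings, typically via edit-distance alignment.

```python
from collections import Counter

def combine_by_voting(frame_results):
    """Combine per-frame string recognition results by per-character
    majority vote. Strings are right-padded to equal length, which is a
    simplification: practical methods align candidates (e.g. with a
    Levenshtein alignment) before voting."""
    width = max(len(s) for s in frame_results)
    padded = [s.ljust(width) for s in frame_results]
    # For each character position, keep the most frequent symbol.
    combined = "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*padded)
    )
    return combined.rstrip()

# Three noisy per-frame readings of the same text field.
frames = ["HELL0 W0RLD", "HELLO WORLD", "HELLO W0RLD"]
print(combine_by_voting(frames))  # -> HELLO W0RLD
```

Note that the voted result can still contain errors that a majority of frames share (here the `0` in `W0RLD`), which is consistent with the paper's observation that picking the single best frame can outperform straightforward combination.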
About the authors
Vladimir V. Arlazarov
Computer Science and Control Federal Research Center of the Russian Academy of Sciences; Smart Engines Service LLC
Author for correspondence.
Email: vva777@gmail.com
Candidate of Technical Sciences, Head of the Department
Russian Federation, Moscow; Moscow
