Synthetic Sample Extension in Implementation of Tangut Character Databases



Abstract

The Tangut script was a logographic writing system used for the extinct Tangut language of the Western Xia dynasty (1038 to 1227). Optical character recognition, machine learning, and computer vision techniques can greatly assist in deciphering the characters in ancient Tangut texts. All of these techniques, however, depend on a character database that provides training samples and test standards. While building Tangut character databases with ancient Tangut texts as the data source, the authors found that imbalanced class distribution significantly compromises the performance of learning algorithms. This paper proposes a synthetic sample generation method to improve the learning and recognition of Tangut characters. Recognition accuracy is compared between learning on the original data set and learning on the synthetically extended data set, and the proposed method shows a clear advantage. The organization of the Tangut character databases is also described.
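
The abstract does not say how the synthetic samples are generated. The following is only a minimal sketch of the general idea of extending under-represented classes, not the authors' method. It assumes glyphs are stored as small binary bitmaps and that a random translation of a few pixels is an acceptable perturbation. The names `shift_glyph` and `balance_with_synthetic` are hypothetical.

```python
import random
from collections import defaultdict


def shift_glyph(glyph, dx, dy):
    """Translate a 2D bitmap by (dx, dy), filling exposed pixels with 0."""
    h, w = len(glyph), len(glyph[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands on (x, y)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = glyph[sy][sx]
    return out


def balance_with_synthetic(samples, max_shift=1, seed=0):
    """Extend every class to the size of the largest class.

    samples: list of (label, glyph) pairs, glyph being a list of rows.
    Synthetic glyphs are randomly shifted copies of real ones
    (an assumed perturbation, chosen only for illustration).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for label, glyph in samples:
        by_label[label].append(glyph)

    target = max((len(g) for g in by_label.values()), default=0)
    balanced = list(samples)  # original samples are kept unchanged
    for label, glyphs in by_label.items():
        for _ in range(target - len(glyphs)):
            base = rng.choice(glyphs)
            dx = rng.randint(-max_shift, max_shift)
            dy = rng.randint(-max_shift, max_shift)
            balanced.append((label, shift_glyph(base, dx, dy)))
    return balanced
```

Calling `balance_with_synthetic(samples)` on a list in which one character has three samples and another has one returns a list in which both have three. In practice, richer perturbations (stroke-width changes, elastic distortion, noise) would likely be needed to imitate real manuscript variation.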

About the authors

Yifei Meng

School of Electronic and Information Engineering, Beijing Jiaotong University; School of Physics and Electronic-Electrical Engineering, Ningxia University

Principal contact for editorial correspondence.
Email: river_dance@163.com
China, Beijing; Yinchuan

Xue Yuan

School of Electronic and Information Engineering, Beijing Jiaotong University

Email: river_dance@163.com
China, Beijing

Xueye Wei

School of Electronic and Information Engineering, Beijing Jiaotong University

Email: river_dance@163.com
China, Beijing

Wenhui Yang

School of Physics and Electronic-Electrical Engineering, Ningxia University

Email: river_dance@163.com
China, Yinchuan

Yan Chen

School of Physics and Electronic-Electrical Engineering, Ningxia University

Email: river_dance@163.com
China, Yinchuan


Copyright © Allerton Press, Inc., 2018