Categorization of text documents taking into account some structural features
- Authors: Gulin V.¹, Frolov A.¹
Affiliations:
- National Research University Moscow Energy Institute
- Issue: Vol. 55, No. 1 (2016)
- Pages: 96-105
- Section: Artificial Intelligence
- URL: https://journals.rcsi.science/1064-2307/article/view/219545
- DOI: https://doi.org/10.1134/S1064230715060088
- ID: 219545
Details
This paper examines how the conventional "bag-of-words" model can be extended to capture structural features of text documents, so that these features are taken into account when documents are categorized with machine learning methods. The structural features are used to characterize relationships within a set of tokens, and the names of these relationships are used as features alongside the names of the tokens themselves. This distinguishes the proposed models from the traditional approach, which reflects only unary relations. The effectiveness of the extended methods is evaluated in computational experiments on classes of the Reuters-21578 collection using eight common classifiers. The results show that the extended approach is worthwhile for categorizing text documents with simple classifiers.
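The abstract does not specify which structural relations the authors use, so the following is only a minimal sketch of the general idea under a stated assumption. It uses a single hypothetical binary relation, `NEXT(a, b)`, which holds when token `b` immediately follows token `a` within the same sentence. Unary features (`tok:<word>`) reproduce the classic bag-of-words model; relation features (`rel:NEXT:<a>|<b>`) add the name of the relation and the tokens it links. The function names `tokenize`, `extract_features`, and `cosine` are illustrative, not taken from the paper.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Split text into sentences, then each sentence into lowercase word tokens."""
    sentences = re.split(r"[.!?]+", text)
    return [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences if s.strip()]


def extract_features(text, use_relations=True):
    """Return a Counter mapping feature names to counts.

    Unary features ("tok:<word>") form the ordinary bag-of-words.
    Relation features use the hypothetical NEXT relation, restricted to
    adjacent tokens inside one sentence (an assumption for illustration).
    """
    features = Counter()
    for sentence in tokenize(text):
        for tok in sentence:
            features["tok:" + tok] += 1
        if use_relations:
            for a, b in zip(sentence, sentence[1:]):
                features[f"rel:NEXT:{a}|{b}"] += 1
    return features


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[name] for name, count in u.items() if name in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    d1 = "The dog bites the man."
    d2 = "The man bites the dog."
    # Pure bag-of-words cannot tell the two documents apart.
    print(round(cosine(extract_features(d1, False), extract_features(d2, False)), 3))  # 1.0
    # Relation features separate them.
    print(round(cosine(extract_features(d1), extract_features(d2)), 3))  # 0.909
```

The two example sentences contain exactly the same tokens, so their unary feature vectors coincide; only the relation features reveal that the word order differs. Any standard classifier that accepts sparse feature vectors could consume this extended representation unchanged.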
About the authors
V. Gulin
National Research University Moscow Energy Institute
Primary contact for editorial correspondence.
Email: Gulin.vladimir@gmail.com
Russian Federation, Moscow
A. Frolov
National Research University Moscow Energy Institute
Email: Gulin.vladimir@gmail.com
Russian Federation, Moscow