Classification of Text Documents Based on a Probabilistic Topic Model

S. N. Karpovich; A. V. Smirnov; N. N. Teslya

doi:10.3103/S0147688219050034

Authors: Karpovich S.N.¹, Smirnov A.V.², Teslya N.N.²
Affiliations:
1. Olymp Corporation
2. St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS)
Issue: Vol 46, No 5 (2019)
Pages: 314-320
Section: Article
URL: https://journals.rcsi.science/0147-6882/article/view/175524
DOI: https://doi.org/10.3103/S0147688219050034
ID: 175524

Abstract

This article proposes an approach to text document classification based on a probabilistic topic model whose training set contains documents of only one class. The approach makes it possible to identify positive samples (samples resembling the target class) in collections and streams of text documents. The article reviews models that solve text classification problems when trained on samples of a single class and describes their key features. It then presents the Positive Example Based Learning-TM classification model, together with a software prototype that implements it as a basis for classifying text documents. Despite having no information about negative document samples, the model achieves a high level of classification accuracy that exceeds that of alternative approaches. Experiments show that the Positive Example Based Learning-TM model is superior in classification accuracy when the training set is small.

Keywords

classification, binary classification, topic modeling, natural language processing
