Classification of Text Documents Based on a Probabilistic Topic Model
- Authors: Karpovich S.N.1, Smirnov A.V.2, Teslya N.N.2
- Affiliations:
- Olymp Corporation
- St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS)
- Issue: Vol 46, No. 5 (2019)
- Pages: 314-320
- Section: Article
- URL: https://journals.rcsi.science/0147-6882/article/view/175524
- DOI: https://doi.org/10.3103/S0147688219050034
- ID: 175524
Abstract
An approach to text document classification is proposed that uses a probabilistic topic model whose training document set contains objects of only one class. This approach makes it possible to identify positive samples (samples resembling the target class) in collections and streams of text documents. The article considers models that solve text document classification problems after training on samples of a single class and describes their key features. The Positive Example Based Learning-TM classification model is presented, and a software prototype that implements it as a basis for classifying text documents is developed. Despite having no information about negative document samples, the model demonstrates a high level of classification accuracy that exceeds the performance of alternative approaches. Experiments show that the Positive Example Based Learning-TM model is superior in classification accuracy when a small training set is used.
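The general idea of training on positive samples only can be illustrated with a small sketch. This is not the Positive Example Based Learning-TM model, whose details the abstract does not give. It is a deliberately simplified, single-topic (unigram) stand-in that assumes whitespace tokenization, add-alpha smoothing, and a threshold set to the lowest score seen on the positive training documents. The class name `OneClassUnigramModel` and all sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems normalize more)."""
    return text.lower().split()


class OneClassUnigramModel:
    """Hypothetical single-topic stand-in, trained on positive documents only."""

    def __init__(self, positive_docs, alpha=1.0):
        self.alpha = alpha
        self.counts = Counter()
        for doc in positive_docs:
            self.counts.update(tokenize(doc))
        self.total = sum(self.counts.values())
        # One extra vocabulary slot is reserved for all unseen words.
        self.vocab_size = len(self.counts) + 1
        # Threshold: the lowest score observed on the positive training set.
        self.threshold = min(self.score(doc) for doc in positive_docs)

    def score(self, doc):
        """Average smoothed log-probability per token under the positive class."""
        tokens = tokenize(doc)
        if not tokens:
            return float("-inf")
        denom = self.total + self.alpha * self.vocab_size
        log_probs = [math.log((self.counts[t] + self.alpha) / denom) for t in tokens]
        return sum(log_probs) / len(log_probs)

    def is_positive(self, doc):
        """A document is accepted if it scores at least as well as the worst training document."""
        return self.score(doc) >= self.threshold


# Illustrative positive-only training set (made-up documents).
positives = [
    "topic model text classification",
    "probabilistic topic model for text documents",
    "text documents classification with topic model",
]
model = OneClassUnigramModel(positives)
print(model.is_positive("topic model classification"))
print(model.is_positive("football match tonight"))
```

No negative examples are used at any point: the decision boundary comes entirely from how well the positive documents themselves score. A real topic model would replace the single word distribution with a mixture over several topics.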
About the authors
S. Karpovich
Olymp Corporation
Corresponding author
Email: cims@yandex.ru
Russia, Moscow, 121205
A. Smirnov
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS)
Corresponding author
Email: smir@iias.spb.su
Russia, St. Petersburg, 199178
N. Teslya
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS)
Corresponding author
Email: teslya@iias.spb.su
Russia, St. Petersburg, 199178
