On the classification of text documents taking into account their structural features

V. V. Gulin; A. B. Frolov

doi:10.1134/S1064230716030102

Authors: Gulin V.V.¹, Frolov A.B.¹
Affiliations:
1. Moscow Power Engineering Institute (National Research University)
Issue: Volume 55, No. 3 (2016)
Pages: 394-403
Section: Pattern Recognition and Image Processing
URL: https://journals.rcsi.science/1064-2307/article/view/219634
DOI: https://doi.org/10.1134/S1064230716030102
ID: 219634

Abstract

A modification of the conventional bag-of-words model is studied that takes into account the structural features of text documents when they are classified (categorized) using machine learning techniques. It is proposed to describe these features by relations on a set of certain lexemes and to use the relation names, along with the lexeme names, as features. This distinguishes the approach from the conventional model, in which only unary relations are used. The effectiveness of the proposed machine learning techniques is analyzed using computer experiments on the class of the Reuters-21578 collection with eight known classifiers. It is shown that it is reasonable to apply the proposed models when documents are classified using simple classifiers.
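
The idea of using relation names alongside lexeme names as features can be illustrated with a minimal sketch. The abstract does not say which relations the authors use, so the binary relation below ("one lexeme immediately follows another", labeled `NEXT`) is purely an assumption chosen for illustration, as are the function names `lexeme_features`, `relation_features`, and `extended_bag`.

```python
from collections import Counter


def lexeme_features(tokens):
    """Conventional bag of words: every lexeme name is a feature.

    In the abstract's terms these correspond to unary relations.
    """
    return Counter(tokens)


def relation_features(tokens, window=2):
    """Features named after a hypothetical binary relation NEXT(a, b).

    NEXT(a, b) holds when b occurs after a within `window` positions.
    With window=2 this is plain adjacency. The relation is an assumption
    for illustration, not the relation used in the paper.
    """
    feats = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            feats[f"NEXT({left},{right})"] += 1
    return feats


def extended_bag(tokens, window=2):
    """Union of lexeme features and relation features for one document."""
    feats = lexeme_features(tokens)
    feats.update(relation_features(tokens, window))  # Counter.update adds counts
    return feats


doc = "oil prices rise as oil supply falls".split()
features = extended_bag(doc)
print(features["oil"], features["NEXT(oil,supply)"])
```

The resulting feature dictionary can be passed to any classifier that accepts sparse count features. The conventional model is recovered by dropping the `relation_features` step.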

About the authors

V. Gulin

Moscow Power Engineering Institute (National Research University)

Corresponding author
Email: gulin.vladimir@gmail.com
Russia, Moscow, 111250

A. Frolov

Moscow Power Engineering Institute (National Research University)

Email: gulin.vladimir@gmail.com
Russia, Moscow, 111250
