Information-Theoretic Method for Classification of Texts


Abstract

We consider a method for automatic (i.e., unmanned) text classification based on universal source coding (or “data compression”). We show that, under certain restrictions, the proposed method is consistent, i.e., the classification error tends to zero as the text length increases. As an example of practical use, we consider the classification of scientific texts (research papers, books, etc.). Experiments show the proposed method to be highly efficient.
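The abstract does not say which compressor or decision rule is used, so the following is only a minimal sketch of the general idea of compression-based classification, not the authors' procedure. It assumes zlib as a stand-in for a universal code and assigns a text to the class whose sample grows least in compressed size when the text is appended to it. The names `compressed_size`, `classify`, and `class_samples` are hypothetical.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def classify(text: str, class_samples: Mapping[str, str]) -> str:
    """Return the label of the class sample that best 'predicts' the text.

    For each class, measure how many extra compressed bytes are needed to
    encode the new text once it is appended to that class's sample text.
    The class with the smallest increase is chosen.
    """
    if not class_samples:
        raise ValueError("class_samples must contain at least one class")

    encoded = text.encode("utf-8")
    best_label = None
    best_cost = None
    for label, sample in class_samples.items():
        base = sample.encode("utf-8")
        # Extra bytes needed for the text, given the class sample as context.
        cost = compressed_size(base + b" " + encoded) - compressed_size(base)
        if best_cost is None or cost < best_cost:
            best_label, best_cost = label, cost
    return best_label
```

In use, `class_samples` would map each subject label to a body of training text for that subject, and `classify(new_text, class_samples)` would return the predicted label. With short samples, compressor overhead can dominate the size differences, which is in line with the abstract's statement that the error tends to zero only as text lengths grow.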

About the authors

B. Ryabko

Institute of Computational Technologies; Novosibirsk State University

Corresponding author
Email: boris@ryabko.net
Russia, Novosibirsk; Novosibirsk

A. Gus’kov

Institute of Computational Technologies; Russian National Public Library for Science and Technology

Email: boris@ryabko.net
Russia, Novosibirsk; Novosibirsk

I. Selivanova

Novosibirsk State University; Russian National Public Library for Science and Technology

Email: boris@ryabko.net
Russia, Novosibirsk; Novosibirsk

Copyright © Pleiades Publishing, Inc., 2017