Categorization of text documents taking into account some structural features

V. V. Gulin; A. B. Frolov

doi:10.1134/S1064230715060088

Authors: Gulin V.V.¹, Frolov A.B.¹
Affiliations:
1. National Research University Moscow Energy Institute
Issue: Vol. 55, No. 1 (2016)
Pages: 96-105
Section: Artificial Intelligence
URL: https://journals.rcsi.science/1064-2307/article/view/219545
DOI: https://doi.org/10.1134/S1064230715060088
ID: 219545

Abstract

This paper examines the possibility of extending the conventional “bag-of-words” model so that it reflects the structural features of text documents and takes them into account when the documents are categorized with machine learning methods. It is suggested that these features be used to characterize relationships within a set of tokens, and that the names of such relationships be used as features alongside the names of the tokens themselves. The proposed models therefore differ from the traditional approach, which reflects only unary relations. The effectiveness of the modified machine learning methods is tested in computer experiments on the classes of the Reuters-21578 collection using eight common classifiers. The results demonstrate the value of applying this modified approach to text categorization with simple classifiers.
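The core idea, naming relationships between tokens and using those names as features next to the ordinary token features, can be illustrated with a small feature extractor. This is a minimal sketch, not the authors' method: the abstract does not say which structural relations are used, so the two relations below ("next", meaning one token immediately follows another in a sentence, and "cosent", meaning two tokens share a sentence), the sentence splitting, and the feature-name prefixes are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unary_features(tokens: list[str]) -> Counter:
    """Conventional bag-of-words: one feature per token name (a unary relation)."""
    return Counter(f"tok:{t}" for t in tokens)


def relation_features(sentences: list[str]) -> Counter:
    """Named binary relations between tokens, used as extra features.

    HYPOTHETICAL relations, chosen only for illustration:
      - "next":   token b immediately follows token a within a sentence;
      - "cosent": tokens a and b co-occur in the same sentence (unordered).
    """
    feats = Counter()
    for sent in sentences:
        toks = tokenize(sent)
        # Ordered adjacency inside the sentence only (no cross-sentence pairs).
        for a, b in zip(toks, toks[1:]):
            feats[f"next:{a}>{b}"] += 1
        # Unordered co-occurrence; sorting makes the feature name canonical.
        for a, b in combinations(sorted(set(toks)), 2):
            feats[f"cosent:{a}|{b}"] += 1
    return feats


def document_features(text: str) -> Counter:
    """Token features plus relation features for one document."""
    # HYPOTHETICAL structural unit: sentences split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    feats = unary_features(tokenize(text))
    feats.update(relation_features(sentences))  # Counter.update adds counts
    return feats


if __name__ == "__main__":
    for name, count in sorted(document_features("Oil prices rose. Oil output fell.").items()):
        print(name, count)
```

Because the result is a plain mapping from feature names to counts, a list of such mappings could be turned into a sparse matrix with, for example, scikit-learn's `DictVectorizer` and passed to any standard classifier; the unary-only baseline corresponds to using `unary_features` alone.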

Keywords

System Science International, Text Document, Topological Form, Word Model, Unstructured Model

About the authors

V. Gulin

National Research University Moscow Energy Institute

Corresponding author.
Email: Gulin.vladimir@gmail.com
Russia, Moscow

A. Frolov

National Research University Moscow Energy Institute

Email: Gulin.vladimir@gmail.com
Russia, Moscow
