Investigation of features for extraction of named entities from texts in Russian

V. A. Mozharova; N. V. Lukashevich

doi:10.3103/S0005105517030049

Authors: Mozharova V.A.¹, Lukashevich N.V.²
Affiliations:
1. Department of Computational Mathematics and Cybernetics
2. Scientific Research Computational Center
Issue: Vol 51, No 3 (2017)
Pages: 127-134
Section: Text Processing Automation
URL: https://journals.rcsi.science/0005-1055/article/view/150171
DOI: https://doi.org/10.3103/S0005105517030049
ID: 150171

Abstract

This paper examines features used in machine-learning approaches to extracting named entities from Russian texts: features of the token itself (lexeme), as well as vocabulary, contextual, cluster, and two-stage features. The contribution of each feature to the quality of named entity extraction is studied. A CRF classifier is used as the machine-learning method in the experiments described in this paper. The contributions of the features are compared on two open collections using the F-measure.
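
As a rough illustration of the feature families named above, the following minimal sketch builds per-token feature dictionaries of the kind a CRF toolkit typically consumes. It is not the authors' implementation: the function names, the specific features (capitalization, suffix, gazetteer membership), the window size, and the toy gazetteer are all assumptions made for the example, and it matches surface forms rather than lexemes. Cluster and two-stage features are omitted.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int, gazetteer: Set[str],
                   window: int = 1) -> Dict[str, object]:
    """Build a feature dict for tokens[i] (hypothetical feature set)."""

    def local(tok: str, prefix: str) -> Dict[str, object]:
        # Token-level features plus a vocabulary (gazetteer) feature.
        return {
            f"{prefix}lower": tok.lower(),
            f"{prefix}is_capitalized": tok[:1].isupper(),
            f"{prefix}is_upper": tok.isupper(),
            f"{prefix}is_digit": tok.isdigit(),
            f"{prefix}suffix3": tok[-3:].lower(),
            f"{prefix}in_gazetteer": tok.lower() in gazetteer,
        }

    feats: Dict[str, object] = {"bias": 1.0}
    feats.update(local(tokens[i], ""))

    # Contextual features: copy the neighbours' features under an offset prefix.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(tokens):
            feats.update(local(tokens[j], f"{off:+d}:"))
        else:
            feats[f"{off:+d}:pad"] = True  # marks a sentence boundary
    return feats


def sentence_features(tokens: List[str], gazetteer: Set[str],
                      window: int = 1) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i, gazetteer, window)
            for i in range(len(tokens))]


# Toy data, invented for the example (surface forms, not lemmas).
tokens = ["Анна", "живет", "в", "Москве"]
gazetteer = {"москва", "москве"}
feats = sentence_features(tokens, gazetteer)
```

Each sentence becomes a list of dictionaries, one per token, which can then be paired with a label sequence (for example BIO tags) and passed to a CRF trainer of the reader's choice.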

Keywords

named entity, information extraction, machine learning

About the authors

V. A. Mozharova

Department of Computational Mathematics and Cybernetics

Author for correspondence.
Email: valerie.mozharova@gmail.com
Russian Federation, Moscow, 119991

N. V. Lukashevich

Scientific Research Computational Center

Email: valerie.mozharova@gmail.com
Russian Federation, Moscow, 119991
