Evaluation of the efficiency of the chi-square metric
- Authors: Yatsko V.A.
- Affiliations: Katanov Khakassia State University
- Issue: Vol 50, No 4 (2016)
- Pages: 173-178
- Section: Article
- URL: https://journals.rcsi.science/0005-1055/article/view/150138
- DOI: https://doi.org/10.3103/S0005105516040051
- ID: 150138
Abstract
The efficiency of using the chi-square metric to weight terms in text documents is evaluated. The procedure includes selecting and preprocessing class C and ~C texts, compiling a reference dictionary and calculating scores for all of its terms, calculating χ2 coefficients for the terms of a class C text, and calculating a general efficiency factor as the sum of the coefficients found for terms from the reference dictionary. Weighting by the χ2 formula, by the odds-ratio (OR) formula, and on the basis of probabilistic variables is analyzed and compared. OR-based weighting was found to yield the best result.
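The abstract describes the procedure only at a high level. The sketch below shows how χ2 and odds-ratio weights are commonly computed from a 2×2 term/class contingency table, together with a hypothetical efficiency factor that sums weights over a reference dictionary. It is an illustration under stated assumptions, not the author's implementation: the use of document counts as cell values, the 0.5 smoothing added to the odds ratio, and the names `Contingency`, `chi_square`, `odds_ratio`, and `efficiency` are all assumptions, and the probabilistic weighting variant is not reproduced.

```python
"""Sketch: chi-square and odds-ratio term weights from a 2x2 contingency table.

Assumption: cells hold numbers of documents (not term occurrences) in class C
and in the complementary class ~C; the paper's exact formulation may differ.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    a: int  # class C documents containing the term
    b: int  # ~C documents containing the term
    c: int  # class C documents without the term
    d: int  # ~C documents without the term


def chi_square(t: Contingency) -> float:
    """Standard 2x2 statistic: N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    n = t.a + t.b + t.c + t.d
    denom = (t.a + t.c) * (t.b + t.d) * (t.a + t.b) * (t.c + t.d)
    if denom == 0:
        return 0.0  # degenerate table: an empty row or column, no association measurable
    return n * (t.a * t.d - t.c * t.b) ** 2 / denom


def odds_ratio(t: Contingency, smoothing: float = 0.5) -> float:
    """Odds ratio AD/BC; adding 0.5 to every cell is an assumption that avoids division by zero."""
    a, b, c, d = (x + smoothing for x in (t.a, t.b, t.c, t.d))
    return (a * d) / (b * c)


def efficiency(weights: dict[str, float], reference_terms: set[str]) -> float:
    """Hypothetical efficiency factor: sum of the weights of terms found in a reference dictionary."""
    return sum(w for term, w in weights.items() if term in reference_terms)


if __name__ == "__main__":
    # Hypothetical counts for two terms; these are not data from the paper.
    tables = {
        "metric": Contingency(a=8, b=2, c=2, d=8),
        "weather": Contingency(a=5, b=5, c=5, d=5),
    }
    chi = {term: chi_square(t) for term, t in tables.items()}
    orr = {term: odds_ratio(t) for term, t in tables.items()}
    print(chi)  # "metric" scores 7.2, "weather" scores 0.0
    print(orr)  # "metric" scores about 11.56, "weather" scores 1.0
    print(efficiency(chi, {"metric"}), efficiency(orr, {"metric"}))
```

One property worth keeping in mind when comparing the two weightings: because the χ2 numerator squares the difference AD − CB, it scores a term equally whether it is over- or under-represented in class C, whereas an odds ratio above 1 indicates association with C specifically and a value below 1 indicates association with ~C.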
About the authors
V. A. Yatsko
Katanov Khakassia State University
Author for correspondence.
Email: viacheslav-yatsko@rambler.ru
Russian Federation, pr. Lenina 92, Abakan, Khakassia, 655000
