Vol 50, No 4 (2016)

Article

On determining semantic similarity based on relationships of a combined thesaurus

Golitsyna O.L., Maksimov N.V., Fedorova V.A.

Abstract

Problems in using thesauri for fuzzy comparison of conceptual patterns are considered. A measure of semantic similarity that can be calculated from the hierarchical and associative relationships of a thesaurus is proposed, along with an algorithm that compiles the semantic intersection of conceptual patterns based on the coinciding maximum principle. A corpus of texts and conceptual search patterns of thesis papers was used in experimental studies, which showed that using the lexis of different subject fields of a multi-area thesaurus identified semantic similarity more precisely. The size of the pattern intersection increased significantly through pairs of descriptors linked by the semantic similarity measure; however, the average degree of pairwise intersection increased by only 1–2%, which implies an insignificant “expansion” of a conceptual pattern when it is used as a search pattern to produce search-result outputs in automated search mechanisms.

Automatic Documentation and Mathematical Linguistics. 2016;50(4):139-153

The cognitive approach to a document and the document sphere

Sokolov A.V.

Abstract

This paper considers the problems of using the cognitive approach to uncovering the essence of a document and the field of the existence of documents. The need for the “document sphere” concept is substantiated and different interpretations of this concept are proposed. The article is written at the interface of information science, document science, and library science.

Automatic Documentation and Mathematical Linguistics. 2016;50(4):154-160

The principles of the design of the state scientometric system

Kalachikhin P.A.

Abstract

This paper reviews the methodology that underlies the development of the state scientometric system in the Russian Federation. International practices of scientometric system design are explored. The goals and targets of the Russian scientometric system are defined. The selection principles are formulated. A set of scientometric indicators is proposed for inclusion in the Russian scientometric system.

Automatic Documentation and Mathematical Linguistics. 2016;50(4):161-172

Evaluation of the efficiency of the chi-square metric

Yatsko V.A.

Abstract

The efficiency of using the chi-square metric to weight terms in text documents is evaluated. The procedure includes the selection and preprocessing of class C and ~C texts, compilation of a reference dictionary and calculation of scores for all the terms in the dictionary, calculation of χ² coefficients for terms from a class C text, and calculation of the general efficiency factor as the sum of the coefficients found for the terms from the reference dictionary. Weighting by the χ² formula, by the odds-ratio (OR) formula, and on the basis of probabilistic variables is analyzed and compared. The best result was yielded by OR-based weighting.
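
For orientation, the two named weighting schemes can be sketched in their common textbook form for a 2×2 contingency table. This is not the article's procedure: the abstract does not give its exact formulas, so the cell names `a`, `b`, `c`, `d`, the function names, and the 0.5 smoothing constant below are all illustrative assumptions.

```python
# Illustrative sketch only: textbook chi-square and odds-ratio term weights.
# Assumed (hypothetical) cell definitions for one term:
#   a = occurrences of the term in class C texts
#   b = occurrences of the term in ~C texts
#   c = occurrences of all other terms in class C texts
#   d = occurrences of all other terms in ~C texts


def chi_square(a: float, b: float, c: float, d: float) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A zero row or column total means there is no evidence of association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def odds_ratio(a: float, b: float, c: float, d: float,
               smoothing: float = 0.5) -> float:
    """Odds ratio with additive smoothing (an assumption) to avoid division by zero."""
    return ((a + smoothing) * (d + smoothing)) / ((b + smoothing) * (c + smoothing))


if __name__ == "__main__":
    # A term spread evenly over both classes carries no class information.
    print(chi_square(5, 5, 5, 5), odds_ratio(5, 5, 5, 5))
    # A term found only in class C is strongly associated with it.
    print(chi_square(10, 0, 0, 10), odds_ratio(10, 0, 0, 10))
```

In this form the chi-square statistic is symmetric: it grows for terms that are typical of either C or ~C. The odds ratio exceeds 1 only for terms that favor class C, which is one commonly cited difference between the two weights.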

Automatic Documentation and Mathematical Linguistics. 2016;50(4):173-178