


Vol 50, No 4 (2016)
- Year: 2016
- Articles: 4
- URL: https://journals.rcsi.science/0005-1055/issue/view/8963
Article
On determining semantic similarity based on relationships of a combined thesaurus
Abstract
Problems in using thesauri for fuzzy comparison of conceptual patterns are considered. A measure of semantic similarity that can be calculated from the hierarchical and associative relationships of a thesaurus is proposed, together with an algorithm that compiles a semantic intersection of conceptual patterns based on the coinciding maximum principle. A corpus of texts and conceptual search patterns of thesis papers was used in experimental studies, which showed that using the vocabulary of different subject fields of a multi-area thesaurus identified semantic similarity more precisely. The cardinality of the pattern intersection increased significantly through pairs of descriptors linked by the semantic similarity measure; however, the average degree of pairwise intersection increased by only 1–2%, which implies an insignificant “expansion” of a conceptual pattern when it is used as a search pattern to generate search results in automated search mechanisms.
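The abstract does not give the similarity formula or the intersection algorithm, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a hypothetical measure (the largest product of link weights along any thesaurus path, with made-up weights for hierarchical and associative links and a made-up cutoff) and reads the "coinciding maximum principle" as mutual best match, which is also an assumption.

```python
import heapq

# Hypothetical link weights and cutoff; none of these values come from the paper.
HIER_WEIGHT = 0.8    # broader/narrower (hierarchical) link
ASSOC_WEIGHT = 0.5   # "related term" (associative) link
THRESHOLD = 0.3      # similarities below this are treated as 0


def build_graph(hierarchical, associative):
    """Build an undirected weighted graph from two lists of descriptor pairs."""
    graph = {}
    for links, weight in ((hierarchical, HIER_WEIGHT), (associative, ASSOC_WEIGHT)):
        for u, v in links:
            graph.setdefault(u, {})
            graph.setdefault(v, {})
            graph[u][v] = max(graph[u].get(v, 0.0), weight)
            graph[v][u] = max(graph[v].get(u, 0.0), weight)
    return graph


def similarity(graph, src, dst):
    """Largest product of link weights over any path from src to dst."""
    if src == dst:
        return 1.0
    best = {src: 1.0}
    heap = [(-1.0, src)]  # max-heap via negated scores
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if node == dst:
            return score
        if score < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = score * weight
            if candidate >= THRESHOLD and candidate > best.get(neighbour, 0.0):
                best[neighbour] = candidate
                heapq.heappush(heap, (-candidate, neighbour))
    return 0.0


def best_match(graph, term, candidates):
    """Return (candidate, score) with the highest similarity to term."""
    scored = [(similarity(graph, term, c), c) for c in candidates]
    if not scored:
        return None, 0.0
    score, match = max(scored)
    return (match, score) if score > 0 else (None, 0.0)


def semantic_intersection(graph, pattern_a, pattern_b):
    """Keep pairs (a, b) where each descriptor is the other's best match."""
    pairs = []
    for a in sorted(pattern_a):
        b, score = best_match(graph, a, pattern_b)
        if b is not None and best_match(graph, b, pattern_a)[0] == a:
            pairs.append((a, b, score))
    return pairs


# Toy thesaurus fragment (invented for illustration).
graph = build_graph(
    hierarchical=[("science", "information science"),
                  ("information science", "information retrieval")],
    associative=[("information retrieval", "search engine")],
)
pairs = semantic_intersection(
    graph,
    {"science", "search engine"},
    {"information science", "information retrieval"},
)
print(pairs)
```

In this toy case only "science" and "information science" are paired: "search engine" prefers "information retrieval", but "information retrieval" is closer to "science", so the match is not mutual and is dropped.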



The cognitive approach to a document and the document sphere
Abstract
This paper considers problems in applying the cognitive approach to reveal the essence of a document and the domain in which documents exist. The need for the “document sphere” concept is substantiated and different interpretations of this concept are proposed. The article is written at the interface of information science, document science, and library science.



The principles of the design of the state scientometric system
Abstract
This paper reviews the methodology that underlies the development of the state scientometric system in the Russian Federation. International practices of scientometric system design are explored. The goals and targets of the Russian scientometric system are defined. The selection principles are formulated. A set of scientometric indicators is proposed for inclusion in the Russian scientometric system.



Evaluation of the efficiency of the chi-square metric
Abstract
The efficiency of using the chi-square metric to weight terms in text documents is evaluated. The procedure includes the selection and preliminary processing of class C and ~C texts, compilation of a reference dictionary and calculation of scores for all the terms in the dictionary, calculation of χ² coefficients for terms from a class C text, and calculation of the general efficiency factor as the sum of the coefficients found for the terms from the reference dictionary. Weighting by the χ² formula, by the odds-ratio (OR) formula, and on the basis of probabilistic variables is analyzed and compared. It was found that the best result is yielded by the OR-based weighting.
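The abstract does not reproduce the formulas. As a rough illustration, the Python sketch below uses the standard 2×2 contingency-table forms of the χ² statistic and the odds ratio for a term with respect to a class C and its complement ~C. The function names, the toy documents, and the 0.5 smoothing constant are assumptions made for this example and may differ from what the paper uses.

```python
def contingency(term, class_docs, other_docs):
    """Count documents for a term; each document is a set of terms.

    a: class C documents containing the term
    b: ~C documents containing the term
    c: class C documents without the term
    d: ~C documents without the term
    """
    a = sum(term in doc for doc in class_docs)
    b = sum(term in doc for doc in other_docs)
    c = len(class_docs) - a
    d = len(other_docs) - b
    return a, b, c, d


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/class table."""
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # the term or the class has no variation
    return n * (a * d - b * c) ** 2 / denominator


def odds_ratio(a, b, c, d, smoothing=0.5):
    """Odds ratio (a*d)/(b*c); smoothing avoids division by zero (an assumption)."""
    return ((a + smoothing) * (d + smoothing)) / ((b + smoothing) * (c + smoothing))


# Toy data (invented): three class C documents and three ~C documents.
class_docs = [{"thesaurus", "descriptor"}, {"thesaurus", "search"}, {"thesaurus"}]
other_docs = [{"search", "engine"}, {"engine"}, {"thesaurus", "engine"}]

counts = contingency("thesaurus", class_docs, other_docs)
print(counts, chi_square(*counts), odds_ratio(*counts))
```

A general efficiency factor of the kind the abstract describes could then be obtained by summing such coefficients over the terms of a reference dictionary, though the exact aggregation is not specified here.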


