Relevance of a Set of Topical Texts to a Knowledge Unit and the Estimation of the Closeness of Linguistic Forms of Its Expression to a Semantic Pattern



Abstract

This paper analyzes two interrelated problems: how completely knowledge can be extracted from a set (corpus) of subject-oriented texts, judged by the relevance of the corpus to a source phrase, and how to find the most rational linguistic variant for describing a selected knowledge fragment. Both problems matter for building systems for information processing, analysis, estimation, and understanding. The image components of a source phrase are extracted by jointly estimating the coupling strength of its word combinations that occur in the phrases of the analyzed text and splitting these words into classes by their TF-IDF values relative to the corpus texts. Word relations are expanded to three or more elements (both with and without a base of known syntactic relations), and on this basis the relevance of a text corpus to a source knowledge unit is introduced as the degree to which the words of the source phrase are covered by the most relevant sets of relations, relative to the documents in which its image components are represented most fully. This estimate is proposed for the targeted selection of corpus phrases that are either mutually equivalent or semantically complementary and that represent the same image. To rank the selected phrases by closeness to a semantic pattern (i.e., a sense standard), three alternative estimates are introduced: one based on splitting the source-phrase words into classes by the value of the TF-IDF metric, and two based on a numerical estimate of their binding strength (with and without prepositions and conjunctions). In addition, the text needed to represent a selected knowledge unit is compressed by at least a factor of two while preserving its meaning.
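
As a rough illustration of the TF-IDF step mentioned in the abstract, the sketch below scores the words of a source phrase against one document of a corpus and groups them into classes. It is not the authors' procedure: the abstract does not state how the classes are formed, so the equal-sized rank groups used here, the function names `tf_idf` and `split_into_classes`, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(word, doc_tokens, corpus_tokens):
    """Textbook TF-IDF of `word` in one document relative to a corpus.

    `corpus_tokens` is a list of tokenized documents (lists of words).
    """
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[word] / len(doc_tokens)
    df = sum(1 for doc in corpus_tokens if word in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus_tokens) / df)


def split_into_classes(phrase_tokens, doc_tokens, corpus_tokens, n_classes=3):
    """Rank phrase words by TF-IDF and cut the ranking into groups.

    Assumption: classes are equal-sized slices of the ranked list,
    highest TF-IDF first. The paper may define classes differently.
    """
    words = list(dict.fromkeys(phrase_tokens))  # drop repeats, keep order
    if not words:
        return []
    scores = {w: tf_idf(w, doc_tokens, corpus_tokens) for w in words}
    ranked = sorted(words, key=lambda w: (-scores[w], w))
    size = math.ceil(len(ranked) / n_classes)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the neural network learns the weights".split(),
    "the network topology is fixed".split(),
    "the weather is cold".split(),
]
phrase = ["the", "network", "weights"]
classes = split_into_classes(phrase, corpus[0], corpus, n_classes=3)
print(classes)
```

In this toy case "the" occurs in every document and therefore gets a TF-IDF of zero, so it falls into the last class, while the rarer "weights" lands in the first. Estimating the coupling strength of word combinations, which the abstract combines with this split, is not covered by the sketch.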

About the authors

G. M. Emelyanov

Novgorod State University

Author for correspondence.
Email: Gennady.Emelyanov@novsu.ru
Russian Federation, ul. B. S.-Peterburgskaya 41, Velikii Novgorod, 173003

D. V. Mikhailov

Novgorod State University

Email: Gennady.Emelyanov@novsu.ru
Russian Federation, ul. B. S.-Peterburgskaya 41, Velikii Novgorod, 173003

A. P. Kozlov

Novgorod State University

Email: Gennady.Emelyanov@novsu.ru
Russian Federation, ul. B. S.-Peterburgskaya 41, Velikii Novgorod, 173003


Copyright (c) 2018 Pleiades Publishing, Ltd.
