Modified Nonparametric Algorithm for Automatic Classification of Large-Volume Statistical Data and its Application
- Authors: Tuboltsev V.P. (1), Lapko A.V. (1, 2), Lapko V.A. (1, 2)
- Affiliations:
  - (1) Reshetnev Siberian State University of Science and Technology
  - (2) Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences
- Issue: No 4 (2023)
- Pages: 49-57
- Section: Computational Intelligence
- URL: https://journals.rcsi.science/2071-8594/article/view/269743
- DOI: https://doi.org/10.14357/20718594230405
- EDN: https://elibrary.ru/QHNFRU
- ID: 269743
Abstract
A modified nonparametric algorithm for the automatic classification of large-volume statistical data is proposed. It detects classes that correspond to unimodal fragments of the probability density of a multidimensional random variable; a class is thus a compact group of observations lying under a single mode of the density. The initial data are compressed by partitioning the multidimensional feature space into sampling intervals, which yields a data array composed of the interval centers and the frequencies with which values of the random variable fall into the corresponding intervals. A regression estimate of the probability density is synthesized from this array and serves as the basis for algorithmizing the automatic classification procedure. The computational efficiency of the modified algorithm is provided by the compression of the initial data and by the improvement and algorithmization of the traditional nonparametric method for detecting compact groups of observations of a random variable. The effectiveness of the developed method is confirmed by the results of its application to the analysis of remote sensing data of forests damaged by the Siberian silkworm.
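The pipeline outlined in the abstract (compress, estimate the density, group observations by mode) can be illustrated with a rough sketch. This is not the authors' algorithm: the Gaussian kernel, the fixed `bandwidth`, the steepest-ascent rule over adjacent grid cells, and all function names are assumptions made for illustration, and a frequency-weighted kernel sum stands in for the regression density estimate.

```python
import numpy as np

def compress(x, bins):
    """Replace the sample by centers of non-empty grid cells and their frequencies."""
    counts, edges = np.histogramdd(x, bins=bins)
    idx = np.argwhere(counts > 0)                 # integer cell indices, shape (m, k)
    centers = np.stack(
        [(e[:-1] + e[1:])[idx[:, d]] / 2 for d, e in enumerate(edges)], axis=1
    )
    freqs = counts[tuple(idx.T)] / len(x)         # relative frequency of each cell
    return idx, centers, freqs

def density_at_centers(centers, freqs, bandwidth):
    """Frequency-weighted kernel sum at the cell centers (Gaussian kernel: an assumption)."""
    diff = (centers[:, None, :] - centers[None, :, :]) / bandwidth   # shape (m, m, k)
    kern = np.exp(-0.5 * (diff ** 2).sum(axis=2))
    norm = (2 * np.pi) ** (centers.shape[1] / 2) * np.prod(bandwidth)
    return kern @ freqs / norm

def classify_cells(idx, dens):
    """Label each cell by the local density maximum reached via adjacent non-empty cells."""
    # Cells are adjacent if their indices differ by at most 1 in every dimension
    # (each cell counts as adjacent to itself).
    adj = np.abs(idx[:, None, :] - idx[None, :, :]).max(axis=2) <= 1
    # Each cell points to the densest cell in its neighbourhood.
    parent = np.array(
        [np.flatnonzero(row)[np.argmax(dens[row])] for row in adj]
    )
    # Follow the pointers until every cell has reached a local maximum.
    root = parent.copy()
    while True:
        nxt = parent[root]
        if np.array_equal(nxt, root):
            break
        root = nxt
    _, labels = np.unique(root, return_inverse=True)
    return labels

# Synthetic demo: two well-separated Gaussian groups in two dimensions.
rng = np.random.default_rng(0)
sample = np.vstack([
    rng.normal(0.0, 0.5, size=(5000, 2)),
    rng.normal(6.0, 0.5, size=(5000, 2)),
])
idx, centers, freqs = compress(sample, bins=20)
dens = density_at_centers(centers, freqs, bandwidth=np.array([0.6, 0.6]))
labels = classify_cells(idx, dens)
```

After compression, the cost depends on the number of non-empty cells rather than on the sample size, which is the source of the efficiency gain the abstract refers to. In this sketch an isolated non-empty cell forms its own class, so a practical version would need a rule for merging or discarding such cells.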
About the authors
Vitaly P. Tuboltsev
Reshetnev Siberian State University of Science and Technology
Author for correspondence.
Email: vitalya.98@mail.ru
Graduate Student
Russian Federation, Krasnoyarsk
Alexander V. Lapko
Reshetnev Siberian State University of Science and Technology; Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences
Email: lapko@icm.krasn.ru
Doctor of Technical Sciences, Professor, Chief Researcher
Russian Federation, Krasnoyarsk; Krasnoyarsk
Vasily A. Lapko
Reshetnev Siberian State University of Science and Technology; Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences
Email: valapko@yandex.ru
Doctor of Technical Sciences, Professor, Leading Researcher
Russian Federation, Krasnoyarsk; Krasnoyarsk
