Convergence of a multilayer perceptron to histogram Bayesian regression


Abstract

The problem of enhancing the interpretability and consistency of Bayesian classifier solutions that approximate empirical data by means of a multilayer perceptron is considered. Histogram regression preserves transparency and a statistical interpretation but is limited by its memory requirements ($O(n)$) and weak scalability, while a multilayer perceptron provides a memory-efficient representation ($O(1)$) and high computational efficiency at the cost of limited interpretability. The focus is on a unary learning scheme, in which the training sample consists of examples from a single target class together with additional background points distributed uniformly over a compact subset of the feature space. This approach makes it possible to treat each class separately and to implement a failure mechanism outside the support of the data, which enhances the reliability of the model. It is proposed to regard the perceptron output as a consistent analogue of a histogram estimator over the partition induced by the linearity cells of the perceptron. It is proved that, under natural assumptions of regularity and controlled growth of the architecture, the output function of a multilayer perceptron is consistent and equivalent to a histogram estimator. Theoretical consistency is rigorously proved in the case of a fixed first layer, while numerical experiments confirm that the results apply to models in which all layers are trained. Thus the histogram interpretation provides statistical verification of the consistency of the perceptron approximation and adds credibility to classification decisions within the unary model framework.
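The unary scheme described above can be sketched in a few lines: target-class examples are combined with background points drawn uniformly over a compact subset of the feature space, and a histogram estimate of the class posterior is then computed over a partition into cells. This is only an illustrative sketch; the Gaussian blob standing in for class data, the unit square as the compact subset, and the fixed 8×8 grid (in place of the perceptron-induced linearity cells) are all assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unary training sample: one target class plus uniform background.
# The Gaussian blob is a hypothetical stand-in for real class data.
n_class = 500
X_class = rng.normal(loc=0.5, scale=0.1, size=(n_class, 2))

# Background points, uniform over a compact subset of feature space: [0, 1]^2.
n_bg = 500
X_bg = rng.uniform(0.0, 1.0, size=(n_bg, 2))

X = np.vstack([X_class, X_bg])
y = np.concatenate([np.ones(n_class), np.zeros(n_bg)])  # 1 = class, 0 = background

# Histogram estimate of P(class | cell) on a fixed grid partition
# (an illustrative substitute for the perceptron's linearity cells).
bins = 8
idx = np.clip((X * bins).astype(int), 0, bins - 1)  # cell index of each point
counts = np.zeros((bins, bins))
hits = np.zeros((bins, bins))
for (i, j), label in zip(idx, y):
    counts[i, j] += 1
    hits[i, j] += label
posterior = np.where(counts > 0, hits / np.maximum(counts, 1), 0.0)
# Cells near the class mean get a high posterior; cells far from the data
# support contain only background points and get a posterior near zero,
# which is the "failure outside the data support" behaviour in miniature.
```

A perceptron trained on `(X, y)` would play the role of `posterior` here, with its piecewise-linear regions replacing the fixed grid cells.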

About the authors

Nikita Aleksandrovich Eliseev

Ivannikov Institute for System Programming of the Russian Academy of Sciences

Email: neliseev@ispras.ru

Andrey Igorevich Perminov

Ivannikov Institute for System Programming of the Russian Academy of Sciences

Email: perminov@ispras.ru
ORCID iD: 0000-0001-8047-0114

Denis Yur'evich Turdakov

Ivannikov Institute for System Programming of the Russian Academy of Sciences; Research Center of the Trusted Artificial Intelligence ISP RAS

Email: turdakov@ispras.ru
ORCID iD: 0000-0001-8745-0984



Copyright (c) 2025 Eliseev N.A., Perminov A.I., Turdakov D.Y.
