GRADIENT-FREE ALGORITHMS FOR SOLVING STOCHASTIC SADDLE OPTIMIZATION PROBLEMS WITH THE POLYAK–ŁOJASIEWICZ CONDITION
- Authors: Sadykov S.I.1, Lobanov A.V.1,2, Raigorodskii A.M.1,3
Affiliations:
- Moscow Institute of Physics and Technology
- Trusted Artificial Intelligence Research Center of ISP RAS
- Caucasian Mathematical Center of the Adyghe State University
- Issue: No 6 (2023)
- Pages: 60-74
- Section: DATA ANALYSIS
- URL: https://journals.rcsi.science/0132-3474/article/view/148120
- DOI: https://doi.org/10.31857/S0132347423060079
- EDN: https://elibrary.ru/DSVULZ
- ID: 148120
Abstract
This paper focuses on solving a subclass of stochastic nonconvex-concave black-box optimization problems with a saddle point that satisfies the Polyak–Łojasiewicz condition. To solve such a problem, we provide the first, to our knowledge, gradient-free algorithm; the approach is based on applying a gradient approximation (kernel approximation) within stochastic gradient descent with a biased oracle. We present theoretical estimates that guarantee a global linear rate of convergence to the desired accuracy. We verify the theoretical results on a model example, comparing with an algorithm that uses the Gaussian smoothing approximation.
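The abstract above can be illustrated with a minimal sketch (not the authors' algorithm): a two-point kernel-smoothed gradient estimator plugged into gradient descent-ascent on a toy quadratic saddle. All names here (`saddle_f`, `kernel_grad_est`, `zo_gda`) are hypothetical, and the simplest admissible kernel K(r) = 3r is used in place of the higher-order kernels the paper would need for highly smooth objectives.

```python
import numpy as np

def saddle_f(x, y):
    """Toy quadratic saddle: strongly convex in x, strongly concave in y."""
    return 0.5 * x @ x - 0.5 * y @ y + x @ y

def kernel_grad_est(func, z, h=1e-3, rng=None, batch=4):
    """Two-point kernel-smoothed gradient estimator (sketch).

    With direction e uniform on the unit sphere, radius r ~ U[-1, 1], and
    kernel K(r) = 3r, the estimate d * K(r) * (f(z + h r e) - f(z - h r e))
    / (2h) * e is unbiased up to O(h^2) for smooth f.
    """
    rng = rng if rng is not None else np.random.default_rng()
    d = z.size
    g = np.zeros(d)
    for _ in range(batch):
        e = rng.standard_normal(d)
        e /= np.linalg.norm(e)        # direction uniform on the unit sphere
        r = rng.uniform(-1.0, 1.0)    # radial sample weighted by the kernel
        g += d * 3.0 * r * (func(z + h * r * e) - func(z - h * r * e)) / (2 * h) * e
    return g / batch

def zo_gda(x, y, steps=3000, eta=0.05, seed=0):
    """Zeroth-order gradient descent-ascent on saddle_f."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        gx = kernel_grad_est(lambda u: saddle_f(u, y), x, rng=rng)
        gy = kernel_grad_est(lambda v: saddle_f(x, v), y, rng=rng)
        x = x - eta * gx   # descend in the min variable
        y = y + eta * gy   # ascend in the max variable
    return x, y
```

On this toy problem the unique saddle point is the origin, so running `zo_gda` from any start drives both iterates toward zero; the batch averaging inside the estimator only tames the variance and is not essential to the scheme.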
About the authors
S. I. Sadykov
Moscow Institute of Physics and Technology
Author for correspondence.
Email: sadykov.si@phystech.edu
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9
A. V. Lobanov
Moscow Institute of Physics and Technology; Trusted Artificial Intelligence Research Center of ISP RAS
Author for correspondence.
Email: lobbsasha@mail.ru
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9; Russia, 109004, Moscow, Alexander Solzhenitsyn st., 25
A. M. Raigorodskii
Moscow Institute of Physics and Technology; Caucasian Mathematical Center of the Adyghe State University
Author for correspondence.
Email: mraigor@yandex.ru
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9; Republic of Adygea, 385016, Maykop, st. Pervomaiskaya, 208
References
- Heaton J. Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning: The MIT Press, 2016, 800 pp, ISBN: 0262035618 // Genetic programming and evolvable machines. 2018. V. 19. № 1–2. P. 305–307.
- Dai B. et al. SBEED: Convergent reinforcement learning with nonlinear function approximation // International Conference on Machine Learning. PMLR, 2018. P. 1125–1134.
- Namkoong H., Duchi J.C. Variance-based regularization with convex objectives // Advances in neural information processing systems. 2017. V. 30.
- Xu L. et al. Maximum margin clustering // Advances in neural information processing systems. 2004. V. 17.
- Sinha A. et al. Certifying some distributional robustness with principled adversarial training // arXiv preprint arXiv:1710.10571. 2017.
- Audet C., Hare W. Derivative-free and blackbox optimization. 2017.
- Rosenbrock H.H. An automatic method for finding the greatest or least value of a function // The computer journal. 1960. V. 3. № 3. P. 175–184.
- Gasnikov A. et al. Randomized gradient-free methods in convex optimization // arXiv preprint arXiv:2211.13566. 2022.
- Lobanov A. et al. Gradient-Free Federated Learning Methods with ℓ1 and ℓ2-Randomization for Non-Smooth Convex Stochastic Optimization Problems // arXiv preprint arXiv:2211.10783. 2022.
- Gasnikov A. et al. The power of first-order smooth optimization for black-box non-smooth problems // International Conference on Machine Learning. PMLR, 2022. P. 7241–7265.
- Bach F., Perchet V. Highly-smooth zero-th order online optimization // Conference on Learning Theory. PMLR, 2016. P. 257–283.
- Beznosikov A., Novitskii V., Gasnikov A. One-point gradient-free methods for smooth and non-smooth saddle-point problems // Mathematical Optimization Theory and Operations Research: 20th International Conference, MOTOR 2021, Irkutsk, Russia, July 5–10, 2021, Proceedings 20. Springer International Publishing, 2021. P. 144–158.
- Akhavan A., Pontil M., Tsybakov A. Exploiting higher order smoothness in derivative-free optimization and continuous bandits // Advances in Neural Information Processing Systems. 2020. V. 33. P. 9017–9027.
- Polyak B.T. Gradient methods for the minimisation of functionals // USSR Computational Mathematics and Mathematical Physics. 1963. V. 3. № 4. P. 864–878.
- Łojasiewicz S. Une propriété topologique des sous-ensembles analytiques réels // Les équations aux dérivées partielles. 1963. V. 117. P. 87–89.
- Ajalloeian A., Stich S.U. On the convergence of SGD with biased gradients // arXiv preprint arXiv:2008.00051. 2020.
- Lobanov A., Gasnikov A., Stonyakin F. Highly Smoothness Zero-Order Methods for Solving Optimization Problems under PL Condition // arXiv preprint arXiv:2305.15828. 2023.
- Yue P., Fang C., Lin Z. On the Lower Bound of Minimizing Polyak-Łojasiewicz functions // arXiv preprint arXiv:2212.13551. 2022.
- Yang J., Kiyavash N., He N. Global convergence and variance-reduced optimization for a class of nonconvex-nonconcave minimax problems // arXiv preprint arXiv:2002.09621. 2020.
- Akhavan A. et al. Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm // arXiv preprint arXiv:2306.02159. 2023.
- Nouiehed M. et al. Solving a class of non-convex min-max games using iterative first order methods // Advances in Neural Information Processing Systems. 2019. V. 32.
- Osserman R. The isoperimetric inequality // Bulletin of the American Mathematical Society. 1978. V. 84. № 6. P. 1182–1238.
- Beckner W. A generalized Poincaré inequality for Gaussian measures // Proceedings of the American Mathematical Society. 1989. V. 105. № 2. P. 397–400.
- Karimi H., Nutini J., Schmidt M. Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition // Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19–23, 2016, Proceedings, Part I 16. Springer International Publishing, 2016. P. 795–811.
- Zorich V.A., Paniagua O. Mathematical analysis II. Berlin : Springer, 2016. V. 220.