GRADIENT-FREE ALGORITHMS FOR SOLVING STOCHASTIC SADDLE OPTIMIZATION PROBLEMS WITH THE POLYAK–ŁOJASIEWICZ CONDITION
- Authors: Sadykov S.I.1, Lobanov A.V.1,2, Raigorodskii A.M.1,3
Affiliations:
- Moscow Institute of Physics and Technology
- Trusted Artificial Intelligence Research Center of ISP RAS
- Caucasian Mathematical Center of the Adyghe State University
- Issue: No 6 (2023)
- Pages: 60-74
- Section: DATA ANALYSIS
- URL: https://journals.rcsi.science/0132-3474/article/view/148120
- DOI: https://doi.org/10.31857/S0132347423060079
- EDN: https://elibrary.ru/DSVULZ
- ID: 148120
Abstract
This paper focuses on solving a subclass of stochastic nonconvex-concave black-box optimization problems with a saddle point that satisfies the Polyak–Łojasiewicz condition. To solve such a problem, we provide the first, to our knowledge, gradient-free algorithm; the approach is based on applying a gradient approximation (kernel approximation) within stochastic gradient descent with a biased oracle. We present theoretical estimates that guarantee a global linear rate of convergence to the desired accuracy. We verify the theoretical results on a model example, comparing with an algorithm that uses the Gaussian smoothing approximation.
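The abstract above can be illustrated with a minimal sketch (not the authors' algorithm): a two-point kernel-smoothed gradient estimator plugged into gradient descent-ascent on a toy quadratic saddle. All names here (`saddle_f`, `kernel_grad_est`, `zo_gda`) are hypothetical, and the simplest admissible kernel K(r) = 3r is used in place of the higher-order kernels the paper would need for highly smooth objectives.

```python
import numpy as np

def saddle_f(x, y):
    """Toy quadratic saddle: strongly convex in x, strongly concave in y."""
    return 0.5 * x @ x - 0.5 * y @ y + x @ y

def kernel_grad_est(func, z, h=1e-3, rng=None, batch=4):
    """Two-point kernel-smoothed gradient estimator (sketch).

    With direction e uniform on the unit sphere, radius r ~ U[-1, 1], and
    kernel K(r) = 3r, the estimate d * K(r) * (f(z + h r e) - f(z - h r e))
    / (2h) * e is unbiased up to O(h^2) for smooth f.
    """
    rng = rng if rng is not None else np.random.default_rng()
    d = z.size
    g = np.zeros(d)
    for _ in range(batch):
        e = rng.standard_normal(d)
        e /= np.linalg.norm(e)        # direction uniform on the unit sphere
        r = rng.uniform(-1.0, 1.0)    # radial sample weighted by the kernel
        g += d * 3.0 * r * (func(z + h * r * e) - func(z - h * r * e)) / (2 * h) * e
    return g / batch

def zo_gda(x, y, steps=3000, eta=0.05, seed=0):
    """Zeroth-order gradient descent-ascent on saddle_f."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        gx = kernel_grad_est(lambda u: saddle_f(u, y), x, rng=rng)
        gy = kernel_grad_est(lambda v: saddle_f(x, v), y, rng=rng)
        x = x - eta * gx   # descend in the min variable
        y = y + eta * gy   # ascend in the max variable
    return x, y
```

On this toy problem the unique saddle point is the origin, so running `zo_gda` from any start drives both iterates toward zero; the batch averaging inside the estimator only tames the variance and is not essential to the scheme.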
About the authors
S. I. Sadykov
Moscow Institute of Physics and Technology
Author for correspondence.
Email: sadykov.si@phystech.edu
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9
A. V. Lobanov
Moscow Institute of Physics and Technology; Trusted Artificial Intelligence Research Center of ISP RAS
Author for correspondence.
Email: lobbsasha@mail.ru
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9; Russia, 109004, Moscow, Alexander Solzhenitsyn st., 25
A. M. Raigorodskii
Moscow Institute of Physics and Technology; Caucasian Mathematical Center of the Adyghe State University
Author for correspondence.
Email: mraigor@yandex.ru
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9; Republic of Adygea, 385016, Maykop, st. Pervomaiskaya, 208
References
- Heaton J. Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning: The MIT Press, 2016, 800 pp, ISBN: 0262035618 // Genetic programming and evolvable machines. 2018. V. 19. № 1–2. P. 305–307.
- Dai B. et al. SBEED: Convergent reinforcement learning with nonlinear function approximation // International Conference on Machine Learning. PMLR, 2018. P. 1125–1134.
- Namkoong H., Duchi J.C. Variance-based regularization with convex objectives // Advances in neural information processing systems. 2017. V. 30.
- Xu L. et al. Maximum margin clustering // Advances in neural information processing systems. 2004. V. 17.
- Sinha A. et al. Certifying some distributional robustness with principled adversarial training // arXiv preprint arXiv:1710.10571. 2017.
- Audet C., Hare W. Derivative-free and blackbox optimization. 2017.
- Rosenbrock H.H. An automatic method for finding the greatest or least value of a function // The computer journal. 1960. V. 3. № 3. P. 175–184.
- Gasnikov A. et al. Randomized gradient-free methods in convex optimization // arXiv preprint arXiv:2211.13566. 2022.
- Lobanov A. et al. Gradient-Free Federated Learning Methods with ℓ1 and ℓ2-Randomization for Non-Smooth Convex Stochastic Optimization Problems // arXiv preprint arXiv:2211.10783. 2022.
- Gasnikov A. et al. The power of first-order smooth optimization for black-box non-smooth problems // International Conference on Machine Learning. PMLR, 2022. P. 7241–7265.
- Bach F., Perchet V. Highly-smooth zero-th order online optimization // Conference on Learning Theory. PMLR, 2016. P. 257–283.
- Beznosikov A., Novitskii V., Gasnikov A. One-point gradient-free methods for smooth and non-smooth saddle-point problems // Mathematical Optimization Theory and Operations Research: 20th International Conference, MOTOR 2021, Irkutsk, Russia, July 5–10, 2021, Proceedings 20. Springer International Publishing, 2021. P. 144–158.
- Akhavan A., Pontil M., Tsybakov A. Exploiting higher order smoothness in derivative-free optimization and continuous bandits // Advances in Neural Information Processing Systems. 2020. V. 33. P. 9017–9027.
- Polyak B.T. Gradient methods for the minimisation of functionals // USSR Computational Mathematics and Mathematical Physics. 1963. V. 3. № 4. P. 864–878.
- Łojasiewicz S. Une propriété topologique des sous-ensembles analytiques réels // Les équations aux dérivées partielles. 1963. V. 117. P. 87–89.
- Ajalloeian A., Stich S.U. On the convergence of SGD with biased gradients // arXiv preprint arXiv:2008.00051. 2020.
- Lobanov A., Gasnikov A., Stonyakin F. Highly Smoothness Zero-Order Methods for Solving Optimization Problems under PL Condition // arXiv preprint arXiv:2305.15828. 2023.
- Yue P., Fang C., Lin Z. On the Lower Bound of Minimizing Polyak-Łojasiewicz functions // arXiv preprint arXiv:2212.13551. 2022.
- Yang J., Kiyavash N., He N. Global convergence and variance-reduced optimization for a class of nonconvex-nonconcave minimax problems // arXiv preprint arXiv:2002.09621. 2020.
- Akhavan A. et al. Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm // arXiv preprint arXiv:2306.02159. 2023.
- Nouiehed M. et al. Solving a class of non-convex min-max games using iterative first order methods // Advances in Neural Information Processing Systems. 2019. V. 32.
- Osserman R. The isoperimetric inequality // Bulletin of the American Mathematical Society. 1978. V. 84. № 6. P. 1182–1238.
- Beckner W. A generalized Poincaré inequality for Gaussian measures // Proceedings of the American Mathematical Society. 1989. V. 105. № 2. P. 397–400.
- Karimi H., Nutini J., Schmidt M. Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition // Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19–23, 2016, Proceedings, Part I 16. Springer International Publishing, 2016. P. 795–811.
- Zorich V.A., Paniagua O. Mathematical analysis II. Berlin : Springer, 2016. V. 220.