A Unified Analysis of Variational Inequality Methods: Variance Reduction, Sampling, Quantization, and Coordinate Descent

Abstract

We present a unified analysis of methods for a broad class of problems, variational inequalities, which include minimization and saddle point problems as special cases. The analysis builds on the extragradient method, a classical technique for solving variational inequalities. We consider the monotone and strongly monotone cases, which correspond to convex-concave and strongly-convex-strongly-concave saddle point problems. The theoretical analysis rests on parametric assumptions about the extragradient iterations, and it can therefore serve as a strong basis both for combining existing methods of various types and for creating new algorithms. To demonstrate this, we develop new robust methods, including methods with quantization, coordinate methods, and distributed randomized local methods. Most of these approaches have never been considered in the generality of variational inequalities and have previously been studied only for minimization problems. The robustness of the new methods is confirmed by numerical experiments with GANs.
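As a concrete illustration of the extragradient method on which the analysis is built, the sketch below runs it on the bilinear saddle point problem min_x max_y x^T A y, one of the simplest monotone variational inequalities. This is our own minimal example, not the authors' implementation: the matrix A, the step size, and the iteration count are arbitrary illustrative choices.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's code): the classical
# extragradient method of Korpelevich applied to the bilinear saddle-point
# problem min_x max_y x^T A y, whose operator F(x, y) = (A y, -A^T x) is
# monotone, so the problem is a simple variational inequality.

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))  # illustrative problem data

def operator(x, y):
    # F(z) for z = (x, y): gradient in x and minus the gradient in y
    return A @ y, -A.T @ x

def extragradient(x, y, step=0.1, iters=2000):
    for _ in range(iters):
        gx, gy = operator(x, y)
        # extrapolation step: look ahead along the operator
        x_half, y_half = x - step * gx, y - step * gy
        # main step: update the current point using the operator
        # evaluated at the extrapolated point
        gx, gy = operator(x_half, y_half)
        x, y = x - step * gx, y - step * gy
    return x, y

x0, y0 = rng.standard_normal(5), rng.standard_normal(5)
x, y = extragradient(x0, y0)
# the unique saddle point of x^T A y is (0, 0); with a small enough step
# the distance to it does not increase and shrinks over the iterations
print("initial distance:", np.hypot(np.linalg.norm(x0), np.linalg.norm(y0)))
print("final distance:  ", np.hypot(np.linalg.norm(x), np.linalg.norm(y)))
```

The extrapolation step is what distinguishes the method from plain gradient descent-ascent, which diverges on this bilinear example; the variance-reduced, quantized, coordinate, and local variants studied in the paper modify how the operator values in these two steps are estimated.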

About the authors

A. N. Beznosikov

Moscow Institute of Physics and Technology (National Research University)

Email: gasnikov@yandex.ru
141701, Dolgoprudnyi, Moscow oblast, Russia

A. V. Gasnikov

Moscow Institute of Physics and Technology (National Research University); Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences; Caucasus Mathematical Center, Adyghe State University

Email: gasnikov@yandex.ru
141701, Dolgoprudnyi, Moscow oblast, Russia; 127051, Moscow, Russia; 385000, Maykop, Republic of Adygea, Russia

K. E. Zainullina

Moscow Institute of Physics and Technology (National Research University)

Email: gasnikov@yandex.ru
141701, Dolgoprudnyi, Moscow oblast, Russia

A. Yu. Maslovskii

Moscow Institute of Physics and Technology (National Research University)

Email: gasnikov@yandex.ru
385000, Maykop, Republic of Adygea, Russia

D. A. Pasechnyuk

Moscow Institute of Physics and Technology (National Research University); Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences

Author for correspondence.
Email: gasnikov@yandex.ru
141701, Dolgoprudnyi, Moscow oblast, Russia; 127051, Moscow, Russia

Copyright (c) 2023 А.Н. Безносиков, А.В. Гасников, К.Э. Зайнуллина, А.Ю. Масловский, Д.А. Пасечнюк
