Proximal gradient methods are a class of algorithms that generalize projection methods in order to solve non-differentiable convex optimization problems.
Many interesting problems can be formulated as convex optimization problems of the form
$\min_{x\in\mathbb{R}^{N}}\ \sum_{i=1}^{n}f_{i}(x)$
where $f_{i}:\mathbb{R}^{N}\to\mathbb{R}$, $i=1,\dots,n$, are possibly non-differentiable convex functions. The lack of differentiability rules out conventional smooth optimization techniques like the steepest descent method and the conjugate gradient method, but proximal gradient methods can be used instead.
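As an illustration, the well-known lasso ($\ell_{1}$-regularized least squares) problem is an instance of this template with $n=2$; here $A$, $b$, and $\lambda$ are the usual (assumed) data matrix, observation vector, and regularization parameter:

```latex
% Lasso written as a sum of two convex functions:
% f_1 is smooth, f_2 is convex but non-differentiable at 0.
\min_{x \in \mathbb{R}^{N}} \;
\underbrace{\tfrac{1}{2}\,\|Ax - b\|_{2}^{2}}_{f_{1}(x)\ \text{(smooth)}}
\;+\;
\underbrace{\lambda\,\|x\|_{1}}_{f_{2}(x)\ \text{(non-differentiable)}}
```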
Proximal gradient methods start with a splitting step, in which the functions $f_{1},\dots,f_{n}$ are handled individually so as to yield an easily implementable algorithm. They are called proximal because each non-differentiable function among $f_{1},\dots,f_{n}$ is involved via its proximity operator. The iterative shrinkage-thresholding algorithm, projected Landweber, projected gradient, alternating projections, the alternating-direction method of multipliers, and the alternating split Bregman method are special instances of proximal algorithms.
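The splitting idea can be sketched concretely for the lasso case above, where the smooth part $f_{1}$ is handled through its gradient and the non-differentiable part $\lambda\|x\|_{1}$ through its proximity operator (soft-thresholding). This is a minimal illustrative implementation of the iterative shrinkage-thresholding scheme, not a reference implementation; the function names and the fixed iteration count are choices made here for clarity:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximity operator of t * ||.||_1: shrinks each entry toward 0 by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    # Proximal gradient (ISTA) for: min_x 0.5*||Ax - b||_2^2 + lam*||x||_1.
    # Step size 1/L, where L = ||A||_2^2 is a Lipschitz constant of the
    # gradient of the smooth part.
    L = np.linalg.norm(A, 2) ** 2
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)            # gradient of the smooth term
        x = soft_threshold(x - step * grad, # forward (gradient) step
                           step * lam)      # backward (proximal) step
    return x
```

For $A = I$ the iteration reaches the exact solution $x^{*} = \operatorname{soft\_threshold}(b, \lambda)$ in one step, which makes the splitting structure easy to verify by hand.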
For the theory of proximal gradient methods from the perspective of statistical learning theory, and for their applications in that setting, see proximal gradient methods for learning.