Consider the optimization problem

$$\min_{x \in \mathbb{R}^n} f(x),$$

where $f$ is a convex and smooth function.
Smoothness: By smoothness we mean the following: we assume the gradient of $f$ is coordinate-wise Lipschitz continuous with constants $L_1, L_2, \dots, L_n$. That is, we assume that

$$|\nabla_i f(x + h e_i) - \nabla_i f(x)| \leq L_i |h|$$

for all $x \in \mathbb{R}^n$ and $h \in \mathbb{R}$, where $\nabla_i$ denotes the partial derivative with respect to variable $x^{(i)}$ and $e_i$ is the $i$-th standard basis vector.
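For instance, for a convex quadratic $f(x) = \tfrac{1}{2}x^{T}Ax - b^{T}x$ with $A$ symmetric positive semidefinite, one has $\nabla_i f(x + h e_i) - \nabla_i f(x) = h A_{ii}$, so the coordinate Lipschitz constants are simply the diagonal entries $L_i = A_{ii}$.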
Nesterov,[1] and Richtárik and Takáč,[2] showed that the following algorithm converges to the optimal point: starting from an initial iterate $x_0$, at each step $k$ choose a coordinate $i_k \in \{1, \dots, n\}$ uniformly at random and update

$$x_{k+1} = x_k - \frac{1}{L_{i_k}} \nabla_{i_k} f(x_k)\, e_{i_k}.$$
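As a concrete illustration, here is a minimal Python sketch of this update rule. The quadratic objective, the helper `grad_i`, and all the numbers are assumptions made purely for the example, not part of the cited papers.

```python
import numpy as np

def random_coordinate_descent(grad_i, lipschitz, x0, num_iters, rng=None):
    """Random coordinate descent: at each step pick a coordinate i uniformly
    at random and take a gradient step of length 1/L_i along e_i."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(num_iters):
        i = rng.integers(n)                    # i_k chosen uniformly from {0, ..., n-1}
        x[i] -= grad_i(x, i) / lipschitz[i]    # x^(i) <- x^(i) - (1/L_i) * grad_i f(x)
    return x

# Example objective (an assumption for illustration): f(x) = 1/2 x^T A x - b^T x,
# for which grad_i f(x) = (A x - b)[i] and L_i = A[i, i].
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)                 # exact minimizer, for comparison
x = random_coordinate_descent(lambda x, i: A[i] @ x - b[i], np.diag(A),
                              np.zeros(2), num_iters=500)
print(x, x_star)
```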
Since the iterates of this algorithm are random vectors, a complexity result gives a bound on the number of iterations needed for the method to output an approximate solution with high probability. It was shown in [2] that if

$$k \geq \frac{2nR_L(x_0)}{\epsilon} \log\left(\frac{f(x_0) - f^*}{\epsilon\rho}\right),$$

where $R_L(x) = \max_y \max_{x^* \in X^*} \{\|y - x^*\|_L : f(y) \leq f(x)\}$, $f^*$ is the optimal value ($f^* = \min_{x \in \mathbb{R}^n} f(x)$), $\rho \in (0,1)$ is a confidence level and $\epsilon > 0$ is the target accuracy, then $\text{Prob}(f(x_k) - f^* > \epsilon) \leq \rho$.
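To get a feel for the bound, the following snippet evaluates it for some made-up values of $n$, $R_L(x_0)$, the initial gap $f(x_0) - f^*$, $\epsilon$ and $\rho$; every number here is hypothetical.

```python
import math

def iteration_bound(n, R_L, initial_gap, eps, rho):
    """k >= (2 n R_L(x0) / eps) * log((f(x0) - f*) / (eps * rho))."""
    return math.ceil(2 * n * R_L / eps * math.log(initial_gap / (eps * rho)))

# All numbers below are hypothetical, purely to illustrate the formula.
print(iteration_bound(n=1000, R_L=1.0, initial_gap=10.0, eps=1e-3, rho=0.1))
```

Note that the bound depends on the accuracy $\epsilon$ both through the leading factor and inside the logarithm, and only logarithmically on the confidence level $\rho$.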
[Figure omitted: evolution of the iterates $x_k$ during the iterations, illustrated in principle on an example problem.]
One can naturally extend this algorithm from single coordinates to blocks of coordinates. Assume that we have the space $\mathbb{R}^5$. This space has 5 coordinate directions, namely

$$e_1 = (1,0,0,0,0)^T,\; e_2 = (0,1,0,0,0)^T,\; e_3 = (0,0,1,0,0)^T,\; e_4 = (0,0,0,1,0)^T,\; e_5 = (0,0,0,0,1)^T,$$

in which the random coordinate descent method can move. However, one can group some coordinate directions into blocks, so that instead of those 5 coordinate directions we have 3 block coordinate directions; the method then updates all coordinates of a randomly chosen block at once, as in the sketch below.
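As an illustration, here is a minimal Python sketch of one such grouping; the particular blocks $\{1,2\}$, $\{3\}$, $\{4,5\}$ and the block Lipschitz constants are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical grouping of the 5 coordinate directions of R^5 into 3 blocks
# (0-based indices: block 0 = {e1, e2}, block 1 = {e3}, block 2 = {e4, e5}).
blocks = [np.array([0, 1]), np.array([2]), np.array([3, 4])]

def random_block_coordinate_step(x, grad, block_lipschitz, rng):
    """One step of randomized block coordinate descent: pick a block uniformly
    at random and update all of its coordinates together with step 1/L_block."""
    j = rng.integers(len(blocks))                  # block chosen uniformly at random
    idx = blocks[j]
    x[idx] -= grad(x)[idx] / block_lipschitz[j]    # gradient step restricted to the block
    return x
```

Updating a whole block per step amortizes the per-iteration overhead and can exploit structure shared by the coordinates within a block.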
1. Nesterov, Yurii (2010), "Efficiency of coordinate descent methods on huge-scale optimization problems", SIAM Journal on Optimization, 22 (2): 341–362, CiteSeerX 10.1.1.332.3336, doi:10.1137/100802001
2. Richtárik, Peter; Takáč, Martin (2011), "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function", Mathematical Programming, Series A, 144 (1–2): 1–38, arXiv:1107.2848, doi:10.1007/s10107-012-0614-z