IRLS can be used for ℓ1 minimization and smoothed ℓp minimization, p < 1, in compressed sensing problems. It has been proved that the algorithm has a linear rate of convergence for the ℓ1 norm and a superlinear rate for ℓp with p < 1, under the restricted isometry property, which is generally a sufficient condition for sparse solutions.[2][3]
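As a rough illustration of this use, the constrained variant alternates reweighting with a weighted minimum-norm solve. The following is a minimal sketch, assuming an underdetermined system Ax = y with A of full row rank and a fixed smoothing parameter ε; the function name and defaults are illustrative, not taken from any particular library.

```python
import numpy as np

def irls_l1_sparse(A, y, n_iter=100, eps=1e-3):
    # Sketch of smoothed-l1 IRLS for sparse recovery: minimize sum_i |x_i|
    # subject to A x = y, with |x_i| smoothed to sqrt(x_i^2 + eps^2).
    x = np.linalg.pinv(A) @ y                  # minimum-norm starting point
    for _ in range(n_iter):
        # Weights w_i = 1 / sqrt(x_i^2 + eps^2); D is the inverse weight matrix W^{-1}
        D = np.diag(np.sqrt(x**2 + eps**2))
        # Weighted minimum-norm update: x = D A^T (A D A^T)^{-1} y
        x = D @ A.T @ np.linalg.solve(A @ D @ A.T, y)
    return x
```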
To find the parameters β = (β1, …, βk)T which minimize the Lp norm for the linear regression problem,
$$\underset{\boldsymbol{\beta}}{\operatorname{arg\,min}}\; \big\| \mathbf{y} - X\boldsymbol{\beta} \big\|_p = \underset{\boldsymbol{\beta}}{\operatorname{arg\,min}} \sum_{i=1}^{n} \left| y_i - X_i \boldsymbol{\beta} \right|^p ,$$
the IRLS algorithm at step t + 1 involves solving the weighted linear least squares problem:[4]
$$\boldsymbol{\beta}^{(t+1)} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,min}} \sum_{i=1}^{n} w_i^{(t)} \left| y_i - X_i \boldsymbol{\beta} \right|^2 = \left( X^{\mathrm{T}} W^{(t)} X \right)^{-1} X^{\mathrm{T}} W^{(t)} \mathbf{y},$$
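Each iteration therefore reduces to an ordinary weighted least squares solve. A minimal sketch of that single step, assuming a NumPy design matrix X, response vector y, and weight vector w (the helper name weighted_ls_step is illustrative):

```python
import numpy as np

def weighted_ls_step(X, y, w):
    # One IRLS step: solve the weighted normal equations
    # beta = (X^T W X)^{-1} X^T W y with W = diag(w).
    XtW = X.T * w            # multiplies row i of X by w_i, i.e. X^T W
    return np.linalg.solve(XtW @ X, XtW @ y)
```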
where W(t) is the diagonal matrix of weights, usually with all elements set initially to:
$$w_i^{(0)} = 1$$
and updated after each iteration to:
$$w_i^{(t)} = \big| y_i - X_i \boldsymbol{\beta}^{(t)} \big|^{p-2}.$$
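Putting the initialization and the weight update together, a minimal sketch of the full loop for general p, reusing the hypothetical weighted_ls_step above; a small floor on the residuals (anticipating the regularization discussed below) keeps the weights finite when p < 2:

```python
import numpy as np

def irls_lp(X, y, p, n_iter=50, delta=1e-4):
    # Approximately minimize sum_i |y_i - X_i beta|^p by IRLS.
    w = np.ones(len(y))                          # w_i^(0) = 1
    for _ in range(n_iter):
        beta = weighted_ls_step(X, y, w)         # weighted least squares solve
        r = np.abs(y - X @ beta)                 # absolute residuals
        w = np.maximum(r, delta) ** (p - 2)      # w_i^(t) = |residual|^(p-2)
    return beta
```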
In the case p = 1, this corresponds to least absolute deviation regression (in which case the problem is better approached by linear programming methods,[5] yielding an exact result), and the formula is:
$$w_i^{(t)} = \frac{1}{\big| y_i - X_i \boldsymbol{\beta}^{(t)} \big|}.$$
To avoid dividing by zero when a residual is exactly zero, the weights must be regularized, so in practice the formula is:
$$w_i^{(t)} = \frac{1}{\max\left\{ \delta,\; \left| y_i - X_i \boldsymbol{\beta}^{(t)} \right| \right\}},$$
where δ is some small value, such as 0.0001.[6] Note that using δ in the weighting function in this way is equivalent to the Huber loss function in robust estimation.[7]
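As a quick sanity check of the p = 1 case, an intercept-only fit should approach the sample median of y, since the least absolute deviation objective for a constant model is minimized at the median. A short usage example with the hypothetical irls_lp sketch above:

```python
import numpy as np

X = np.ones((5, 1))                               # intercept-only design
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
beta_lad = irls_lp(X, y, p=1)                     # expect roughly 3.0, the median
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least squares gives the mean, 22.0
```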
1. C. Sidney Burrus, Iterative Reweighted Least Squares. https://web.archive.org/web/20221017041048/https://cnx.org/exports/92b90377-2b34-49e4-b26f-7fe572db78a1@12.pdf/iterative-reweighted-least-squares-12.pdf
2. Chartrand, R.; Yin, W. (March 31 – April 4, 2008). "Iteratively reweighted algorithms for compressive sensing". IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008. pp. 3869–3872. doi:10.1109/ICASSP.2008.4518498.
3. Daubechies, I.; Devore, R.; Fornasier, M.; Güntürk, C. S. N. (2010). "Iteratively reweighted least squares minimization for sparse recovery". Communications on Pure and Applied Mathematics. 63: 1–38. arXiv:0807.0575. doi:10.1002/cpa.20303.
4. Gentle, James (2007). "6.8.1 Solutions that Minimize Other Norms of the Residuals". Matrix algebra. Springer Texts in Statistics. New York: Springer. doi:10.1007/978-0-387-70873-7. ISBN 978-0-387-70872-0.
5. William A. Pfeil, Statistical Teaching Aids, Bachelor of Science thesis, Worcester Polytechnic Institute, 2006. http://www.wpi.edu/Pubs/E-project/Available/E-project-050506-091720/unrestricted/IQP_Final_Report.pdf
6. Fox, J.; Weisberg, S. (2013), Robust Regression, Course Notes, University of Minnesota. http://users.stat.umn.edu/~sandy/courses/8053/handouts/robust.pdf