When creating a function to approximate the distribution of some data, it is necessary to define a loss function $L(f_\theta(X), Y)$ to measure how good the model output is (e.g., accuracy for classification tasks or mean squared error for regression). We then define an optimization process that finds model parameters $\theta$ minimizing $L(f_\theta(X), Y)$; the minimizer is denoted $\theta^*$, written $\theta^*(X, Y)$ when the data it was fitted on needs to be made explicit.
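For instance, with mean squared error and a linear model, $\theta^*(X, Y)$ has a closed form. A minimal sketch in NumPy (the helper names `mse_loss` and `theta_star` are illustrative, not part of any standard API):

```python
import numpy as np

# Mean squared error loss L(f_theta(X), Y) for a linear model f_theta(x) = x @ theta.
def mse_loss(theta, X, Y):
    return np.mean((X @ theta - Y) ** 2)

# theta*(X, Y): the parameters minimizing the loss on (X, Y).
# For least squares this is the solution of the normal equations.
def theta_star(X, Y):
    return np.linalg.lstsq(X, Y, rcond=None)[0]
```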
If the training data is
$\{x_1, x_2, \dots, x_n\}, \{y_1, y_2, \dots, y_n\}$
and the validation data is
$\{x_1', x_2', \dots, x_m'\}, \{y_1', y_2', \dots, y_m'\}$,
a learning curve is the plot of the two curves

1. $i \mapsto L(f_{\theta^*(X_i, Y_i)}(X_i), Y_i)$ (training loss)
2. $i \mapsto L(f_{\theta^*(X_i, Y_i)}(X'), Y')$ (validation loss)

where $X_i = \{x_1, x_2, \dots, x_i\}$, $Y_i = \{y_1, y_2, \dots, y_i\}$, and $X'$, $Y'$ are the full validation sets: the model is trained on the first $i$ examples and evaluated both on those examples and on the held-out validation data.
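A minimal sketch of this construction on synthetic data, assuming a linear regression model (the data and size grid are illustrative choices, not part of the definition):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, Y_train = X[:150], Y[:150]
X_val, Y_val = X[150:], Y[150:]

sizes = range(10, 151, 10)
train_loss, val_loss = [], []
for i in sizes:
    model = LinearRegression().fit(X_train[:i], Y_train[:i])  # theta*(X_i, Y_i)
    train_loss.append(mean_squared_error(Y_train[:i], model.predict(X_train[:i])))
    val_loss.append(mean_squared_error(Y_val, model.predict(X_val)))
# Plotting `sizes` against `train_loss` and `val_loss` yields the learning curve.
```

In practice, scikit-learn's `sklearn.model_selection.learning_curve` automates this procedure and averages the scores over cross-validation folds.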
Many optimization algorithms are iterative, repeating the same step (such as backpropagation) until the process converges to an optimal value. Gradient descent is one such algorithm. If $\theta_i^*$ is the approximation of the optimal $\theta$ after $i$ steps, a learning curve is the plot of

1. $i \mapsto L(f_{\theta_i^*}(X), Y)$ (training loss)
2. $i \mapsto L(f_{\theta_i^*}(X'), Y')$ (validation loss)

so that the horizontal axis counts optimization steps rather than training examples.
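A minimal sketch of this iteration-based curve, using plain gradient descent on the same least-squares setup (the learning rate and step count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, Y_train = X[:150], Y[:150]
X_val, Y_val = X[150:], Y[150:]

theta = np.zeros(3)  # initial parameters theta_0
lr = 0.05            # learning rate (illustrative value)
train_loss, val_loss = [], []
for i in range(100):
    grad = 2 * X_train.T @ (X_train @ theta - Y_train) / len(Y_train)
    theta -= lr * grad  # one gradient descent step: theta_i -> theta_{i+1}
    train_loss.append(np.mean((X_train @ theta - Y_train) ** 2))
    val_loss.append(np.mean((X_val @ theta - Y_val) ** 2))
# Plotting the step index against `train_loss` and `val_loss` gives the
# iteration-based learning curve.
```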
"Mohr, Felix and van Rijn, Jan N. "Learning Curves for Decision Making in Supervised Machine Learning - A Survey." arXiv preprint arXiv:2201.12150 (2022)". arXiv:2201.12150. /wiki/ArXiv_(identifier) ↩
Viering, Tom; Loog, Marco (2023-06-01). "The Shape of Learning Curves: A Review". IEEE Transactions on Pattern Analysis and Machine Intelligence. 45 (6): 7799–7819. arXiv:2103.10948. doi:10.1109/TPAMI.2022.3220744. ISSN 0162-8828. PMID 36350870. https://ieeexplore.ieee.org/document/9944190 ↩
Perlich, Claudia (2010), "Learning Curves in Machine Learning", in Sammut, Claude; Webb, Geoffrey I. (eds.), Encyclopedia of Machine Learning, Boston, MA: Springer US, pp. 577–580, doi:10.1007/978-0-387-30164-8_452, ISBN 978-0-387-30164-8, retrieved 2023-07-06 978-0-387-30164-8 ↩
Madhavan, P.G. (1997). "A New Recurrent Neural Network Learning Algorithm for Time Series Prediction" (PDF). Journal of Intelligent Systems. p. 113 Fig. 3. http://www.jininnovation.com/RecurrentNN_JIntlSys_PG.pdf ↩
"Machine Learning 102: Practical Advice". Tutorial: Machine Learning for Astronomy with Scikit-learn. https://astroml.github.com/sklearn_tutorial/practical.html#learning-curves ↩
Meek, Christopher; Thiesson, Bo; Heckerman, David (Summer 2002). "The Learning-Curve Sampling Method Applied to Model-Based Clustering". Journal of Machine Learning Research. 2 (3): 397. Archived from the original on 2013-07-15. https://web.archive.org/web/20130715142652/http://connection.ebscohost.com/c/articles/7188676/learning-curve-sampling-method-applied-model-based-clustering ↩
scikit-learn developers. "Validation curves: plotting scores to evaluate models — scikit-learn 0.20.2 documentation". Retrieved February 15, 2019. https://scikit-learn.org/stable/modules/learning_curve.html#learning-curve ↩