When creating a function to approximate the distribution of some data, it is necessary to define a loss function $L(f_\theta(X), Y)$ to measure how good the model output is (e.g., accuracy for classification tasks or mean squared error for regression). We then define an optimization process that finds model parameters $\theta$ minimizing $L(f_\theta(X), Y)$; the minimizer is denoted $\theta^*$, written $\theta^*(X, Y)$ when the data it was fitted on needs to be made explicit.
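For instance, with mean squared error and a linear model, $\theta^*(X, Y)$ has a closed form. A minimal sketch in NumPy (the helper names `mse_loss` and `theta_star` are illustrative, not part of any standard API):

```python
import numpy as np

# Mean squared error loss L(f_theta(X), Y) for a linear model f_theta(x) = x @ theta.
def mse_loss(theta, X, Y):
    return np.mean((X @ theta - Y) ** 2)

# theta*(X, Y): the parameters minimizing the loss on (X, Y).
# For least squares this is the solution of the normal equations.
def theta_star(X, Y):
    return np.linalg.lstsq(X, Y, rcond=None)[0]
```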
If the training data is
$\{x_1, x_2, \dots, x_n\}, \{y_1, y_2, \dots, y_n\}$
and the validation data is
$\{x_1', x_2', \dots, x_m'\}, \{y_1', y_2', \dots, y_m'\}$,
a learning curve is the plot of the two curves

1. $i \mapsto L(f_{\theta^*(X_i, Y_i)}(X_i), Y_i)$ (training loss)
2. $i \mapsto L(f_{\theta^*(X_i, Y_i)}(X'), Y')$ (validation loss)

where $X_i = \{x_1, x_2, \dots, x_i\}$, $Y_i = \{y_1, y_2, \dots, y_i\}$, and $X'$, $Y'$ are the full validation sets: the model is trained on the first $i$ examples and evaluated both on those examples and on the held-out validation data.
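A minimal sketch of this construction on synthetic data, assuming a linear regression model (the data and size grid are illustrative choices, not part of the definition):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, Y_train = X[:150], Y[:150]
X_val, Y_val = X[150:], Y[150:]

sizes = range(10, 151, 10)
train_loss, val_loss = [], []
for i in sizes:
    model = LinearRegression().fit(X_train[:i], Y_train[:i])  # theta*(X_i, Y_i)
    train_loss.append(mean_squared_error(Y_train[:i], model.predict(X_train[:i])))
    val_loss.append(mean_squared_error(Y_val, model.predict(X_val)))
# Plotting `sizes` against `train_loss` and `val_loss` yields the learning curve.
```

In practice, scikit-learn's `sklearn.model_selection.learning_curve` automates this procedure and averages the scores over cross-validation folds.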
Many optimization algorithms are iterative, repeating the same step (such as backpropagation) until the process converges to an optimal value. Gradient descent is one such algorithm. If $\theta_i^*$ is the approximation of the optimal $\theta$ after $i$ steps, a learning curve is the plot of

1. $i \mapsto L(f_{\theta_i^*}(X), Y)$ (training loss)
2. $i \mapsto L(f_{\theta_i^*}(X'), Y')$ (validation loss)

so that the horizontal axis counts optimization steps rather than training examples.
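A minimal sketch of this iteration-based curve, using plain gradient descent on the same least-squares setup (the learning rate and step count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, Y_train = X[:150], Y[:150]
X_val, Y_val = X[150:], Y[150:]

theta = np.zeros(3)  # initial parameters theta_0
lr = 0.05            # learning rate (illustrative value)
train_loss, val_loss = [], []
for i in range(100):
    grad = 2 * X_train.T @ (X_train @ theta - Y_train) / len(Y_train)
    theta -= lr * grad  # one gradient descent step: theta_i -> theta_{i+1}
    train_loss.append(np.mean((X_train @ theta - Y_train) ** 2))
    val_loss.append(np.mean((X_val @ theta - Y_val) ** 2))
# Plotting the step index against `train_loss` and `val_loss` gives the
# iteration-based learning curve.
```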
"Mohr, Felix and van Rijn, Jan N. "Learning Curves for Decision Making in Supervised Machine Learning - A Survey." arXiv preprint arXiv:2201.12150 (2022)". arXiv:2201.12150. /wiki/ArXiv_(identifier) ↩
Viering, Tom; Loog, Marco (2023-06-01). "The Shape of Learning Curves: A Review". IEEE Transactions on Pattern Analysis and Machine Intelligence. 45 (6): 7799–7819. arXiv:2103.10948. doi:10.1109/TPAMI.2022.3220744. ISSN 0162-8828. PMID 36350870. https://ieeexplore.ieee.org/document/9944190 ↩
Perlich, Claudia (2010), "Learning Curves in Machine Learning", in Sammut, Claude; Webb, Geoffrey I. (eds.), Encyclopedia of Machine Learning, Boston, MA: Springer US, pp. 577–580, doi:10.1007/978-0-387-30164-8_452, ISBN 978-0-387-30164-8, retrieved 2023-07-06 978-0-387-30164-8 ↩
Madhavan, P.G. (1997). "A New Recurrent Neural Network Learning Algorithm for Time Series Prediction" (PDF). Journal of Intelligent Systems. p. 113 Fig. 3. http://www.jininnovation.com/RecurrentNN_JIntlSys_PG.pdf ↩
"Machine Learning 102: Practical Advice". Tutorial: Machine Learning for Astronomy with Scikit-learn. https://astroml.github.com/sklearn_tutorial/practical.html#learning-curves ↩
Meek, Christopher; Thiesson, Bo; Heckerman, David (Summer 2002). "The Learning-Curve Sampling Method Applied to Model-Based Clustering". Journal of Machine Learning Research. 2 (3): 397. Archived from the original on 2013-07-15. https://web.archive.org/web/20130715142652/http://connection.ebscohost.com/c/articles/7188676/learning-curve-sampling-method-applied-model-based-clustering ↩
scikit-learn developers. "Validation curves: plotting scores to evaluate models — scikit-learn 0.20.2 documentation". Retrieved February 15, 2019. https://scikit-learn.org/stable/modules/learning_curve.html#learning-curve ↩