Curriculum learning

<h2 id="approach">Approach</h2>
Most generally, curriculum learning is the technique of successively increasing the difficulty of the examples in the <a href="/facts/Training_set/V3mMQb2G">training set</a> presented to a model over multiple training iterations. Under some circumstances, this can produce better results than exposing the model to the full training set immediately; most typically, this is when the model can learn general principles from easier examples and then gradually incorporate more complex and nuanced information as harder examples, such as <a href="/facts/Edge_case/E5po5oDy">edge cases</a>, are introduced. This has been shown to work in many domains, most likely as a form of <a href="/facts/Regularization_(mathematics)/K601A3y2">regularization</a>.<a class="footnote-ref" id="fnref:3" href="#fn:3">3</a>
There are several major variations in how the technique is applied:

<ul><li>A concept of "difficulty" must be defined. This may come from human annotation<a class="footnote-ref" id="fnref:4" href="#fn:4">4</a><a class="footnote-ref" id="fnref:5" href="#fn:5">5</a> or an external <a href="/facts/Heuristic/BcBEkv9c">heuristic</a>; for example, in <a href="/facts/Language_modeling/FntSpg0j">language modeling</a>, shorter sentences might be classified as easier than longer ones.<a class="footnote-ref" id="fnref:6" href="#fn:6">6</a> Another approach is to use the performance of another model, with examples that model predicts accurately being classified as easier (providing a connection to <a href="/facts/Boosting_(machine_learning)/HgejTPPu">boosting</a>).</li>
<li>Difficulty can be increased steadily<a class="footnote-ref" id="fnref:7" href="#fn:7">7</a> or in distinct epochs,<a class="footnote-ref" id="fnref:8" href="#fn:8">8</a> and in a deterministic schedule or according to a <a href="/facts/Probability_distribution/EpsKKVRu">probability distribution</a>. This may also be moderated by a requirement for diversity at each stage, in cases where easier examples are likely to be disproportionately similar to each other.<a class="footnote-ref" id="fnref:9" href="#fn:9">9</a></li>
<li>Applications must also decide the schedule for increasing the difficulty. Simple approaches may use a fixed schedule, such as training on easy examples for half of the available iterations and then all examples for the second half.<a class="footnote-ref" id="fnref:10" href="#fn:10">10</a> Other approaches use <a href="/facts/Self-paced_learning/jJJpIJJk">self-paced learning</a> to increase the difficulty in proportion to the performance of the model on the current set.<a class="footnote-ref" id="fnref:11" href="#fn:11">11</a></li></ul>
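The choices listed above can be made concrete with a small sketch. It assumes a sentence-length heuristic for difficulty and the simple fixed schedule described in the last item (easy examples for the first half of training, all examples afterwards). The function names, the <code>easy_fraction</code> parameter, and the toy corpus are hypothetical and chosen only for illustration; they do not come from any of the cited papers.

```python
from typing import Callable, List, Sequence


def length_difficulty(sentence: str) -> int:
    """Heuristic difficulty: number of whitespace-separated tokens."""
    return len(sentence.split())


def two_stage_pool(
    examples: Sequence[str],
    step: int,
    total_steps: int,
    difficulty: Callable[[str], int] = length_difficulty,
    easy_fraction: float = 0.5,
) -> List[str]:
    """Return the examples available for sampling at a given training step.

    First half of training: only the easiest `easy_fraction` of the examples.
    Second half of training: the full training set.
    """
    ranked = sorted(examples, key=difficulty)  # easiest first
    if step < total_steps // 2:
        cutoff = max(1, int(len(ranked) * easy_fraction))
        return ranked[:cutoff]
    return ranked


# Toy corpus (hypothetical data).
corpus = [
    "the cat sat",
    "dogs bark",
    "the quick brown fox jumps over the lazy dog",
    "a model trained on short sentences may generalize to longer ones",
]
early_pool = two_stage_pool(corpus, step=0, total_steps=10)
late_pool = two_stage_pool(corpus, step=5, total_steps=10)
print(early_pool)      # the two shortest sentences
print(len(late_pool))  # the whole corpus
```

A training loop would draw its mini-batches from the returned pool at each step. Replacing <code>length_difficulty</code> with a human-annotated score or with another model's loss changes the difficulty measure without touching the schedule.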
Since curriculum learning only concerns the selection and ordering of training data, it can be combined with many other techniques in machine learning. The success of the method assumes that a model trained for an easier version of the problem can <a href="/facts/Generalization_(learning)/oV3xKSA7">generalize</a> to harder versions, so it can be seen as a form of <a href="/facts/Transfer_learning/pNz4P2KP">transfer learning</a>. Some authors also consider curriculum learning to include other forms of progressively increasing complexity, such as increasing the number of model parameters.<a class="footnote-ref" id="fnref:12" href="#fn:12">12</a> It is frequently combined with <a href="/facts/Reinforcement_learning/NrgPPS0Q">reinforcement learning</a>, such as learning a simplified version of a game first.<a class="footnote-ref" id="fnref:13" href="#fn:13">13</a>
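Because the curriculum only decides which examples the model sees, it can be written as a selection step that is independent of the model itself. The following is a minimal sketch of one common self-paced formulation, assuming that an example counts as "easy" when its current loss is at or below a threshold and that the threshold is raised after each round. The names <code>self_paced_select</code> and <code>grow_threshold</code>, the growth factor, and the loss values are hypothetical.

```python
from typing import List, Sequence


def self_paced_select(losses: Sequence[float], threshold: float) -> List[int]:
    """Indices of the examples whose current loss is at or below the threshold."""
    return [i for i, loss in enumerate(losses) if loss <= threshold]


def grow_threshold(threshold: float, growth: float = 2.0) -> float:
    """Raise the threshold so that harder examples are admitted next round."""
    return threshold * growth


# Hypothetical per-example losses. In real training these would be recomputed
# from the current model before every round; they are held fixed here so that
# the effect of the growing threshold is easy to follow.
losses = [0.2, 1.5, 0.7, 3.0]

threshold = 0.5
selected_per_round = []
for _ in range(3):
    selected_per_round.append(self_paced_select(losses, threshold))
    threshold = grow_threshold(threshold)

print(selected_per_round)
```

Any model and optimizer can be trained on the selected indices, which is why the selection logic needs no knowledge of the architecture.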
Some domains have shown success with anti-curriculum learning: training on the most difficult examples first. One example is the ACCAN method for <a href="/facts/Speech_recognition/z7S7Pgk6">speech recognition</a>, which trains on the examples with the lowest <a href="/facts/Signal-to-noise_ratio/qohClhyG">signal-to-noise ratio</a> first.<a class="footnote-ref" id="fnref:14" href="#fn:14">14</a>
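The difference between the two orderings can be sketched as a single sort, assuming each utterance carries a precomputed signal-to-noise ratio in decibels and that a lower ratio means a harder example. This illustrates only the ordering described here and is not an implementation of ACCAN; the function name and the data are hypothetical.

```python
from typing import List, Sequence, Tuple

Utterance = Tuple[str, float]  # (utterance id, signal-to-noise ratio in dB)


def order_by_snr(utterances: Sequence[Utterance], hardest_first: bool) -> List[Utterance]:
    """Sort utterances by signal-to-noise ratio.

    hardest_first=True  -> lowest ratio first (anti-curriculum ordering)
    hardest_first=False -> highest ratio first (easy-to-hard curriculum)
    """
    return sorted(utterances, key=lambda u: u[1], reverse=not hardest_first)


# Hypothetical utterances with made-up ratios.
data = [("utt_a", 20.0), ("utt_b", 0.0), ("utt_c", 10.0)]

anti_curriculum = order_by_snr(data, hardest_first=True)
curriculum = order_by_snr(data, hardest_first=False)
print([u[0] for u in anti_curriculum])
print([u[0] for u in curriculum])
```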

<h2 id="history">History</h2>
The term "curriculum learning" was introduced by <a href="/facts/Yoshua_Bengio/796atWBU">Yoshua Bengio</a> et al. in 2009,<a class="footnote-ref" id="fnref:15" href="#fn:15">15</a> with reference to the <a href="/facts/Psychology/duoqW7Sz">psychological</a> technique of <a href="/facts/Shaping_(psychology)/m9sHyePr">shaping</a> in animals and structured education for humans: beginning with the simplest concepts and then building on them. The authors also note that the application of this technique in machine learning has its roots in the early study of <a href="/facts/Neural_networks/DDhJTfMc">neural networks</a>, such as <a href="/facts/Jeffrey_Elman/EGexBx0L">Jeffrey Elman</a>'s 1993 paper "Learning and development in neural networks: the importance of starting small".<a class="footnote-ref" id="fnref:16" href="#fn:16">16</a> Bengio et al. showed good results for problems in <a href="/facts/Image_classification/Tl2Yyk66">image classification</a>, such as identifying <a href="/facts/Geometric_shape/nww0x85d">geometric shapes</a> with progressively more complex forms, and <a href="/facts/Language_modeling/FntSpg0j">language modeling</a>, such as training with a gradually expanding <a href="/facts/Vocabulary/gUfoY6TJ">vocabulary</a>. They conclude that, for curriculum strategies, "their beneficial effect is most pronounced on the test set", suggesting good generalization.
The technique has since been applied to many other domains:

<ul><li><a href="/facts/Natural_language_processing/1hjMKsSN">Natural language processing</a>:
<ul><li><a href="/facts/Part-of-speech_tagging/BYAPmH3r">Part-of-speech tagging</a><a class="footnote-ref" id="fnref:17" href="#fn:17">17</a></li>
<li><a href="/facts/Intent_detection/lRtgcUbw">Intent detection</a><a class="footnote-ref" id="fnref:18" href="#fn:18">18</a></li>
<li><a href="/facts/Sentiment_analysis/QRdywf6g">Sentiment analysis</a><a class="footnote-ref" id="fnref:19" href="#fn:19">19</a></li>
<li><a href="/facts/Machine_translation/DGF3NwuI">Machine translation</a><a class="footnote-ref" id="fnref:20" href="#fn:20">20</a><a class="footnote-ref" id="fnref:21" href="#fn:21">21</a></li>
<li><a href="/facts/Speech_recognition/z7S7Pgk6">Speech recognition</a><a class="footnote-ref" id="fnref:22" href="#fn:22">22</a></li>
<li><a href="/facts/Language_model/FntSpg0j">Language model</a> pre-training<a class="footnote-ref" id="fnref:23" href="#fn:23">23</a></li></ul></li>
<li><a href="/facts/Image_recognition/Tl2Yyk66">Image recognition</a>:
<ul><li><a href="/facts/Facial_recognition_system/mtVxaDln">Facial recognition</a><a class="footnote-ref" id="fnref:24" href="#fn:24">24</a></li>
<li><a href="/facts/Object_detection/sCx3FK44">Object detection</a><a class="footnote-ref" id="fnref:25" href="#fn:25">25</a></li></ul></li>
<li><a href="/facts/Reinforcement_learning/NrgPPS0Q">Reinforcement learning</a>:
<ul><li>Game-playing<a class="footnote-ref" id="fnref:26" href="#fn:26">26</a></li></ul></li>
<li><a href="/facts/Graph_neural_network/6rRxvT5H">Graph learning</a><a class="footnote-ref" id="fnref:27" href="#fn:27">27</a><a class="footnote-ref" id="fnref:28" href="#fn:28">28</a></li>
<li><a href="/facts/Matrix_factorization_(recommender_systems)/1aYturD5">Matrix factorization</a><a class="footnote-ref" id="fnref:29" href="#fn:29">29</a></li></ul>

<h2 id="further-reading">Further reading</h2>
<ul><li><a href="https://arxiv.org/abs/2101.10382">Curriculum Learning: A Survey</a></li>
<li><a href="https://ieeexplore.ieee.org/document/9392296">A Survey on Curriculum Learning</a></li>
<li><a href="https://dl.acm.org/doi/abs/10.5555/3455716.3455897">Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey</a></li>
<li><a href="https://ieeexplore.ieee.org/search/searchresult.jsp?matchBoolean=true&queryText=%22Index%20Terms%22:Curriculum%20Learning">Curriculum learning at IEEE Xplore</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Guo, Sheng; Huang, Weilin; Zhang, Haozhi; Zhuang, Chenfan; Dong, Dengke; Scott, Matthew R.; Huang, Dinglong (2018). "CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images". arXiv:1808.01097 [cs.CV]. <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">"Competence-based curriculum learning for neural machine translation". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/334600961" target="_blank">https://www.researchgate.net/publication/334600961</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">Bengio, Yoshua; Louradour, Jérôme; Collobert, Ronan; Weston, Jason (2009). "Curriculum Learning". Proceedings of the 26th Annual International Conference on Machine Learning. pp. 41–48. doi:10.1145/1553374.1553380. ISBN 978-1-60558-516-1. Retrieved March 24, 2024. <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">"Curriculum learning of multiple tasks". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/308813493" target="_blank">https://www.researchgate.net/publication/308813493</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">Ionescu, Radu Tudor; Alexe, Bogdan; Leordeanu, Marius; Popescu, Marius; Papadopoulos, Dim P.; Ferrari, Vittorio (2016). "How Hard Can It Be? Estimating the Difficulty of Visual Search in an Image". 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (PDF). pp. 2157–2166. doi:10.1109/CVPR.2016.237. ISBN 978-1-4673-8851-1. Retrieved March 29, 2024. <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">"Baby Steps: How "Less is More" in unsupervised dependency parsing" (PDF). Retrieved March 29, 2024. <a href="https://web.stanford.edu/~jurafsky/babysteps.pdf" target="_blank">https://web.stanford.edu/~jurafsky/babysteps.pdf</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">"Self-paced learning for latent variable models". 6 December 2010. pp. 1189–1197. Retrieved March 29, 2024. <a href="https://dl.acm.org/doi/abs/10.5555/2997189.2997322" target="_blank">https://dl.acm.org/doi/abs/10.5555/2997189.2997322</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
<li id="fn:8">Tang, Ye; Yang, Yu-Bin; Gao, Yang (2012). "Self-paced dictionary learning for image classification". Proceedings of the 20th ACM international conference on Multimedia. pp. 833–836. doi:10.1145/2393347.2396324. ISBN 978-1-4503-1089-5. Retrieved March 29, 2024. <a href="#fnref:8" class="footnote-back-ref">↩</a></li>
<li id="fn:9">"Curriculum learning with diversity for supervised computer vision tasks". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/344347816" target="_blank">https://www.researchgate.net/publication/344347816</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></li>
<li id="fn:10">Bengio, Yoshua; Louradour, Jérôme; Collobert, Ronan; Weston, Jason (2009). "Curriculum Learning". Proceedings of the 26th Annual International Conference on Machine Learning. pp. 41–48. doi:10.1145/1553374.1553380. ISBN 978-1-60558-516-1. Retrieved March 24, 2024. <a href="#fnref:10" class="footnote-back-ref">↩</a></li>
<li id="fn:11">"Self-paced Curriculum Learning". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/279853657" target="_blank">https://www.researchgate.net/publication/279853657</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></li>
<li id="fn:12">Soviany, Petru; Ionescu, Radu Tudor; Rota, Paolo; Sebe, Nicu (2021). "Curriculum learning: A Survey". arXiv:2101.10382 [cs.LG]. <a href="#fnref:12" class="footnote-back-ref">↩</a></li>
<li id="fn:13">Narvekar, Sanmit; Peng, Bei; Leonetti, Matteo; Sinapov, Jivko; Taylor, Matthew E.; Stone, Peter (January 2020). "Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey". The Journal of Machine Learning Research. 21 (1): 181:7382–181:7431. arXiv:2003.04960. Retrieved March 29, 2024. <a href="https://dl.acm.org/doi/abs/10.5555/3455716.3455897" target="_blank">https://dl.acm.org/doi/abs/10.5555/3455716.3455897</a> <a href="#fnref:13" class="footnote-back-ref">↩</a></li>
<li id="fn:14">"A Curriculum Learning Method for Improved Noise Robustness in Automatic Speech Recognition". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/304270362" target="_blank">https://www.researchgate.net/publication/304270362</a> <a href="#fnref:14" class="footnote-back-ref">↩</a></li>
<li id="fn:15">Bengio, Yoshua; Louradour, Jérôme; Collobert, Ronan; Weston, Jason (2009). "Curriculum Learning". Proceedings of the 26th Annual International Conference on Machine Learning. pp. 41–48. doi:10.1145/1553374.1553380. ISBN 978-1-60558-516-1. Retrieved March 24, 2024. <a href="#fnref:15" class="footnote-back-ref">↩</a></li>
<li id="fn:16">Elman, J. L. (1993). "Learning and development in neural networks: the importance of starting small". Cognition. 48 (1): 71–99. doi:10.1016/0010-0277(93)90058-4. PMID 8403835. Retrieved March 29, 2024. <a href="https://pubmed.ncbi.nlm.nih.gov/8403835/" target="_blank">https://pubmed.ncbi.nlm.nih.gov/8403835/</a> <a href="#fnref:16" class="footnote-back-ref">↩</a></li>
<li id="fn:17">"Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/306093292" target="_blank">https://www.researchgate.net/publication/306093292</a> <a href="#fnref:17" class="footnote-back-ref">↩</a></li>
<li id="fn:18">Gong, Yantao; Liu, Cao; Yuan, Jiazhen; Yang, Fan; Cai, Xunliang; Wan, Guanglu; Chen, Jiansong; Niu, Ruiyao; Wang, Houfeng (2021). "Density-based dynamic curriculum learning for intent detection". Proceedings of the 30th ACM International Conference on Information & Knowledge Management. pp. 3034–3037. arXiv:2108.10674. doi:10.1145/3459637.3482082. ISBN 978-1-4503-8446-9. Retrieved March 29, 2024. <a href="#fnref:18" class="footnote-back-ref">↩</a></li>
<li id="fn:19">"Visualizing and understanding curriculum learning for long short-term memory networks". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/310595150" target="_blank">https://www.researchgate.net/publication/310595150</a> <a href="#fnref:19" class="footnote-back-ref">↩</a></li>
<li id="fn:20">"An empirical exploration of curriculum learning for neural machine translation". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/328736800" target="_blank">https://www.researchgate.net/publication/328736800</a> <a href="#fnref:20" class="footnote-back-ref">↩</a></li>
<li id="fn:21">"Reinforcement learning based curriculum optimization for neural machine translation". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/334601695" target="_blank">https://www.researchgate.net/publication/334601695</a> <a href="#fnref:21" class="footnote-back-ref">↩</a></li>
<li id="fn:22">"A curriculum learning method for improved noise robustness in automatic speech recognition". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/304270362" target="_blank">https://www.researchgate.net/publication/304270362</a> <a href="#fnref:22" class="footnote-back-ref">↩</a></li>
<li id="fn:23">"Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning". Retrieved June 12, 2025. <a href="https://arxiv.org/abs/2506.11300" target="_blank">https://arxiv.org/abs/2506.11300</a> <a href="#fnref:23" class="footnote-back-ref">↩</a></li>
<li id="fn:24">Huang, Yuge; Wang, Yuhan; Tai, Ying; Liu, Xiaoming; Shen, Pengcheng; Li, Shaoxin; Li, Jilin; Huang, Feiyue (2020). "CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition". 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5900–5909. arXiv:2004.00288. doi:10.1109/CVPR42600.2020.00594. ISBN 978-1-7281-7168-5. Retrieved March 29, 2024. <a href="#fnref:24" class="footnote-back-ref">↩</a></li>
<li id="fn:25">"Curriculum self-paced learning for cross-domain object detection". Retrieved March 29, 2024. <a href="https://www.researchgate.net/publication/348568948" target="_blank">https://www.researchgate.net/publication/348568948</a> <a href="#fnref:25" class="footnote-back-ref">↩</a></li>
<li id="fn:26">"Automatic curriculum graph generation for reinforcement learning agents". 4 February 2017. pp. 2590–2596. Retrieved March 29, 2024. <a href="https://dl.acm.org/doi/10.5555/3298483.3298612" target="_blank">https://dl.acm.org/doi/10.5555/3298483.3298612</a> <a href="#fnref:26" class="footnote-back-ref">↩</a></li>
<li id="fn:27">Gong, Chen; Yang, Jian; Tao, Dacheng (2019). "Multi-modal curriculum learning over graphs". ACM Transactions on Intelligent Systems and Technology. 10 (4): 1–25. doi:10.1145/3322122. Retrieved March 29, 2024. <a href="https://dl.acm.org/doi/abs/10.1145/3322122" target="_blank">https://dl.acm.org/doi/abs/10.1145/3322122</a> <a href="#fnref:27" class="footnote-back-ref">↩</a></li>
<li id="fn:28">Qu, Meng; Tang, Jian; Han, Jiawei (2018). "Curriculum learning for heterogeneous star network embedding via deep reinforcement learning". pp. 468–476. doi:10.1145/3159652.3159711. hdl:2142/101634. ISBN 978-1-4503-5581-0. Retrieved March 29, 2024. <a href="#fnref:28" class="footnote-back-ref">↩</a></li>
<li id="fn:29">"Self-paced learning for matrix factorization". MIT Press. 25 January 2015. pp. 3196–3202. ISBN 978-0-262-51129-2. Retrieved March 29, 2024. <a href="#fnref:29" class="footnote-back-ref">↩</a></li>
</ol>
