Neural tangent kernel

In the study of <a href="/facts/Artificial_neural_network/6V1jMlkx">artificial neural networks</a> (ANNs), the neural tangent kernel (NTK) is a <a href="/facts/Kernel_method/fYuURIPk">kernel</a> that describes the evolution of <a href="/facts/Deep_learning/JLuwD3ea">deep artificial neural networks</a> during their training by <a href="/facts/Gradient_descent/pFFrek0F">gradient descent</a>. It allows ANNs to be studied using theoretical tools from <a href="/facts/Kernel_methods/fYuURIPk">kernel methods</a>.
In general, a kernel is a <a href="/facts/Positive-definite_kernel/Ws3zkxfl">positive-semidefinite</a> <a href="/facts/Symmetric_function/IB8N4T3m">symmetric</a> function of two inputs which represents some notion of similarity between the two inputs. The NTK is a specific kernel derived from a given neural network; because it depends on the network parameters, it generally evolves as those parameters change during training. However, in the limit of large layer width the NTK becomes constant, revealing a duality between training the wide neural network and kernel methods: <a href="/facts/Gradient_descent/pFFrek0F">gradient descent</a> in the <a href="/facts/Large_width_limits_of_neural_networks/uGB5Hzb7">infinite-width limit</a> is fully equivalent to kernel gradient descent with the NTK. As a result, using gradient descent to minimize least-squares loss for neural networks yields the same mean estimator as ridgeless kernel regression with the NTK. This duality enables simple <a href="/facts/Closed-form_expression/y0xCXlSk">closed-form</a> equations describing the training dynamics, <a href="/facts/Generalization_(machine_learning)/e0w0XJTu">generalization</a>, and predictions of wide neural networks.
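
As an illustration, the NTK of a network f with parameters θ is commonly defined as the inner product of parameter gradients, Θ(x, x′) = ⟨∇θ f(x), ∇θ f(x′)⟩. The following minimal Python sketch computes this Gram matrix at initialization for a hypothetical two-layer ReLU network in the NTK parameterization. The architecture, the function names (`init_params`, `grad_f`, `empirical_ntk`), the width and the sample inputs are all assumptions chosen for illustration and are not taken from the original work.

```python
import math
import random


def init_params(width, dim, seed=0):
    """Draw standard-normal parameters for a two-layer ReLU network (illustrative)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return W, a


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def grad_f(params, x):
    """Flattened gradient of f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x)."""
    W, a = params
    scale = 1.0 / math.sqrt(len(a))
    g = []
    for w_j, a_j in zip(W, a):
        pre = dot(w_j, x)
        active = 1.0 if pre > 0 else 0.0
        # df/dw_j = a_j * relu'(w_j . x) * x / sqrt(m)
        g.extend(scale * a_j * active * xi for xi in x)
        # df/da_j = relu(w_j . x) / sqrt(m)
        g.append(scale * max(pre, 0.0))
    return g


def empirical_ntk(params, xs):
    """Gram matrix theta[i][k] = <grad f(x_i), grad f(x_k)>."""
    grads = [grad_f(params, x) for x in xs]
    return [[dot(gi, gk) for gk in grads] for gi in grads]


# Three unit-norm inputs in the plane; the first and last are antipodal.
xs = [[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]
theta = empirical_ntk(init_params(width=2000, dim=2), xs)
```

Under these assumptions the resulting matrix is symmetric and positive-semidefinite by construction, since it is a Gram matrix of gradient vectors. For unit-norm inputs its diagonal entries should lie close to 1 at this width, which is the value obtained in the infinite-width limit for this particular parameterization.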

The NTK was introduced in 2018 by Arthur Jacot, Franck Gabriel and Clément Hongler, who used it to study the convergence and generalization properties of fully connected neural networks. Later works extended the NTK results to other neural network architectures. The phenomenon behind the NTK is not specific to neural networks: it can be observed in generic nonlinear models, usually under a suitable scaling.
