In machine learning, the delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. It can be derived as the backpropagation algorithm for a single-layer neural network with a mean squared error loss function.
For a neuron $j$ with activation function $g(x)$, the delta rule for neuron $j$'s $i$-th weight $w_{ji}$ is given by
$\Delta w_{ji} = \alpha (t_j - y_j) g'(h_j) x_i,$
where $\alpha$ is a small constant called the learning rate, $g(x)$ is the neuron's activation function, $g'$ is its derivative, $t_j$ is the target output, $h_j$ is the weighted sum of the neuron's inputs, $y_j$ is the actual output, and $x_i$ is the $i$-th input. It holds that $h_j = \sum_i x_i w_{ji}$ and $y_j = g(h_j)$.
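The rule follows from minimizing the squared error of the neuron's output by gradient descent; a brief sketch of that derivation, using the same symbols as above:

```latex
% Squared error of neuron j on one training example
E = \tfrac{1}{2}\,(t_j - y_j)^2,
\qquad y_j = g(h_j),
\qquad h_j = \sum_i x_i w_{ji}

% Chain rule: gradient of the error with respect to the weight w_{ji}
\frac{\partial E}{\partial w_{ji}}
  = -(t_j - y_j)\,\frac{\partial y_j}{\partial h_j}\,\frac{\partial h_j}{\partial w_{ji}}
  = -(t_j - y_j)\, g'(h_j)\, x_i

% A gradient-descent step of size \alpha yields the delta rule
\Delta w_{ji} = -\alpha\,\frac{\partial E}{\partial w_{ji}}
             = \alpha\,(t_j - y_j)\, g'(h_j)\, x_i
```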
The delta rule is commonly stated in simplified form for a neuron with a linear activation function, for which $g(h) = h$ and hence $g'(h) = 1$, as $\Delta w_{ji} = \alpha (t_j - y_j) x_i$.
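As an illustration, a minimal sketch of training a single linear neuron with this simplified rule; the toy dataset, learning rate, and number of epochs are illustrative assumptions, not part of the rule itself:

```python
import numpy as np

# Toy dataset (assumed for illustration): targets come from a known linear rule
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.5, 0.5]])
t = X @ np.array([2.0, -1.0])    # target outputs t_j for each example

w = np.zeros(2)                   # weights of a single linear neuron
alpha = 0.1                       # learning rate

for epoch in range(100):
    for x_vec, target in zip(X, t):
        y = np.dot(x_vec, w)                  # linear activation: y_j = h_j
        w += alpha * (target - y) * x_vec     # Δw_ji = α (t_j - y_j) x_i

print(w)   # approaches the generating coefficients [2.0, -1.0]
```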
While the delta rule is similar to the perceptron's update rule, the derivation is different. The perceptron uses the Heaviside step function as the activation function $g(h)$; its derivative $g'(h)$ does not exist at zero and is equal to zero elsewhere, which makes direct application of the delta rule impossible.
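With a differentiable activation, by contrast, the rule applies directly. A minimal sketch of a single delta-rule update, assuming a logistic (sigmoid) activation and made-up values for the inputs, weights, target, and learning rate:

```python
import numpy as np

def sigmoid(h):
    """Logistic activation g(h) = 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_prime(h):
    """Derivative g'(h) = g(h) * (1 - g(h))."""
    s = sigmoid(h)
    return s * (1.0 - s)

# Example values (assumed for illustration only)
x = np.array([0.5, -1.0, 2.0])   # inputs x_i
w = np.array([0.1, 0.4, -0.3])   # weights w_ji of one neuron j
t = 1.0                          # target output t_j
alpha = 0.1                      # learning rate

h = np.dot(x, w)                 # h_j = sum_i x_i w_ji
y = sigmoid(h)                   # y_j = g(h_j)

# Delta rule: Δw_ji = α (t_j - y_j) g'(h_j) x_i
delta_w = alpha * (t - y) * sigmoid_prime(h) * x
w = w + delta_w
```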