In machine learning, the delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. It can be derived as the backpropagation algorithm for a single-layer neural network with a mean squared error loss function.
For a neuron $j$ with activation function $g(x)$, the delta rule for neuron $j$'s $i$-th weight $w_{ji}$ is given by
$\Delta w_{ji} = \alpha (t_j - y_j) g'(h_j) x_i,$
where $\alpha$ is a small constant called the learning rate, $g(x)$ is the neuron's activation function, $g'$ is its derivative, $t_j$ is the target output, $h_j$ is the weighted sum of the neuron's inputs, $y_j$ is the actual output, and $x_i$ is the $i$-th input. It holds that $h_j = \sum_i x_i w_{ji}$ and $y_j = g(h_j)$.
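The rule follows from minimizing the squared error of the neuron's output by gradient descent; a brief sketch of that derivation, using the same symbols as above:

```latex
% Squared error of neuron j on one training example
E = \tfrac{1}{2}\,(t_j - y_j)^2,
\qquad y_j = g(h_j),
\qquad h_j = \sum_i x_i w_{ji}

% Chain rule: gradient of the error with respect to the weight w_{ji}
\frac{\partial E}{\partial w_{ji}}
  = -(t_j - y_j)\,\frac{\partial y_j}{\partial h_j}\,\frac{\partial h_j}{\partial w_{ji}}
  = -(t_j - y_j)\, g'(h_j)\, x_i

% A gradient-descent step of size \alpha yields the delta rule
\Delta w_{ji} = -\alpha\,\frac{\partial E}{\partial w_{ji}}
             = \alpha\,(t_j - y_j)\, g'(h_j)\, x_i
```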
The delta rule is commonly stated in simplified form for a neuron with a linear activation function, for which $g(h) = h$ and hence $g'(h) = 1$, as $\Delta w_{ji} = \alpha (t_j - y_j) x_i$.
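As an illustration, a minimal sketch of training a single linear neuron with this simplified rule; the toy dataset, learning rate, and number of epochs are illustrative assumptions, not part of the rule itself:

```python
import numpy as np

# Toy dataset (assumed for illustration): targets come from a known linear rule
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.5, 0.5]])
t = X @ np.array([2.0, -1.0])    # target outputs t_j for each example

w = np.zeros(2)                   # weights of a single linear neuron
alpha = 0.1                       # learning rate

for epoch in range(100):
    for x_vec, target in zip(X, t):
        y = np.dot(x_vec, w)                  # linear activation: y_j = h_j
        w += alpha * (target - y) * x_vec     # Δw_ji = α (t_j - y_j) x_i

print(w)   # approaches the generating coefficients [2.0, -1.0]
```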
While the delta rule is similar to the perceptron's update rule, the derivation is different. The perceptron uses the Heaviside step function as the activation function $g(h)$; its derivative $g'(h)$ does not exist at zero and is equal to zero elsewhere, which makes direct application of the delta rule impossible.
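With a differentiable activation, by contrast, the rule applies directly. A minimal sketch of a single delta-rule update, assuming a logistic (sigmoid) activation and made-up values for the inputs, weights, target, and learning rate:

```python
import numpy as np

def sigmoid(h):
    """Logistic activation g(h) = 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_prime(h):
    """Derivative g'(h) = g(h) * (1 - g(h))."""
    s = sigmoid(h)
    return s * (1.0 - s)

# Example values (assumed for illustration only)
x = np.array([0.5, -1.0, 2.0])   # inputs x_i
w = np.array([0.1, 0.4, -0.3])   # weights w_ji of one neuron j
t = 1.0                          # target output t_j
alpha = 0.1                      # learning rate

h = np.dot(x, w)                 # h_j = sum_i x_i w_ji
y = sigmoid(h)                   # y_j = g(h_j)

# Delta rule: Δw_ji = α (t_j - y_j) g'(h_j) x_i
delta_w = alpha * (t - y) * sigmoid_prime(h) * x
w = w + delta_w
```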