Mathematical Proof of the Perceptron Learning Rule
Perceptron Learning Rule
The Perceptron Learning Rule performs an update only when the sign of the activation (i.e. the predicted label) differs from the actual label. When the prediction and the actual label agree, no update takes place!
The learning rule is based on the following update, applied to every dimension \(d\): \[ w_{d} = w_{d} + yx_{d} \] \[ b = b + y \] where \(y\) is the actual label!
We have current parameters, \[ w = (w_1, w_2, w_3, \ldots, w_D) \] \[ \text{where } w \in \mathbb{R}^{D} \] \[ b \rightarrow \text{bias} \]
For a training example \((x, y)\): \(x \in \mathbb{R}^{D}\), where \(D\) is the number of features, and \(y = \pm 1\) is the true label!
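To make the rule concrete, here is a minimal sketch in Python of the mistake-driven update described above (the function names and the convention of treating \(a = 0\) as a mistake are my own choices, not from the text):

```python
def activation(w, b, x):
    """Compute the activation a = sum_d w_d * x_d + b."""
    return sum(w_d * x_d for w_d, x_d in zip(w, x)) + b

def perceptron_update(w, b, x, y):
    """Apply the learning rule to one example (x, y) with y in {+1, -1}.

    The parameters change only when the prediction disagrees with the
    label (y * a <= 0 here, treating a = 0 as a mistake).
    """
    a = activation(w, b, x)
    if y * a <= 0:                                        # mistake: signs disagree
        w = [w_d + y * x_d for w_d, x_d in zip(w, x)]     # w_d <- w_d + y * x_d
        b = b + y                                         # b  <- b + y
    return w, b
```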
When Actual Label: \(y = +1\) but Activation: \(a < 0\)
Let us say an example has a positive label, \(y = +1\), but our activation is wrong: \(a < 0\)!
Updating the weights and bias, we obtain a modified weight vector \(w^{'}\) and a modified bias \(b^{'}\): \(w^{'} = (w_{1}^{'}, w_{2}^{'}, w_{3}^{'}, \ldots, w_{D}^{'})\) with \(w_{d}^{'} = w_{d} + yx_{d}\), and \(b^{'} = b + y\).
When we observe the same example again, we compute the new activation \(a^{'}\):
\[ a^{'} = \sum_{d = 1}^{D}w_{d}^{'}x_{d} + b^{'} = \sum_{d = 1}^{D}(w_{d} + yx_{d})x_{d} + (b + y) \]
Since, \(y = +1\): \[ a^{'} = \sum_{d = 1}^{D}(w_{d} + x_{d})x_{d} + (b + 1) \] \[ = \sum_{d = 1}^{D}w_{d}x_{d} + \sum_{d = 1}^{D}(x_{d})^{2} + b + 1 \] \[ = [\sum_{d = 1}^{D}w_{d}x_{d} + b] + \sum_{d = 1}^{D}(x_{d})^{2} + 1 \]
Since the old activation is \(a = \sum_{d = 1}^{D}w_{d}x_{d} + b\), we have: \[ a^{'} = a + \sum_{d = 1}^{D}(x_{d})^{2} + 1 \] Since \((x_d)^{2} \geq 0\), the new activation \(a^{'}\) is always at least 1 more than \(a\), so \(a^{'} > a\). Our old activation was \(a < 0\) while the true label was \(y = +1\), so we have successfully moved \(a^{'}\) in the right direction, i.e. towards positive!
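As a quick sanity check with made-up numbers (not from the text): take \(w = (1, -2)\), \(b = 0\), and the example \(x = (1, 1)\) with \(y = +1\). Then \[ a = 1\cdot 1 + (-2)\cdot 1 + 0 = -1 < 0, \] so the perceptron makes a mistake. After the update, \(w^{'} = (2, -1)\) and \(b^{'} = 1\), giving \[ a^{'} = 2\cdot 1 + (-1)\cdot 1 + 1 = 2 = a + \sum_{d=1}^{D}(x_d)^{2} + 1 = -1 + 2 + 1, \] exactly as the formula predicts.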
When Actual Label: \(y = -1\) but Activation: \(a > 0\)
Let us say an example has a negative label, \(y = -1\), but our activation is wrong: \(a > 0\)!
Updating the weights and bias, we obtain a modified weight vector \(w^{'}\) and a modified bias \(b^{'}\): \(w^{'} = (w_{1}^{'}, w_{2}^{'}, w_{3}^{'}, \ldots, w_{D}^{'})\) with \(w_{d}^{'} = w_{d} + yx_{d}\), and \(b^{'} = b + y\).
When we observe the same example again, we compute the new activation \(a^{'}\):
\[ a^{'} = \sum_{d = 1}^{D}w_{d}^{'}x_{d} + b^{'} = \sum_{d = 1}^{D}(w_{d} + yx_{d})x_{d} + (b + y) \]
Since, \(y = -1\): \[ a^{'} = \sum_{d = 1}^{D}(w_{d} - x_{d})x_{d} + (b - 1) \] \[ = \sum_{d = 1}^{D}w_{d}x_{d} - \sum_{d = 1}^{D}(x_{d})^{2} + b - 1 \] \[ = [\sum_{d = 1}^{D}w_{d}x_{d} + b] - [\sum_{d = 1}^{D}(x_{d})^{2} + 1] \]
Since the old activation is \(a = \sum_{d = 1}^{D}w_{d}x_{d} + b\), we have: \[ a^{'} = a - \left[\sum_{d = 1}^{D}(x_{d})^{2} + 1\right] \] Since \((x_d)^{2} \geq 0\), the new activation \(a^{'}\) is always at least 1 less than \(a\), so \(a^{'} < a\). Our old activation was \(a > 0\) while the true label was \(y = -1\), so we have successfully moved \(a^{'}\) in the right direction, i.e. towards negative!
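Both cases can be summarised in one line: \(a^{'} = a + y\left[\sum_{d=1}^{D}(x_d)^{2} + 1\right]\). The sketch below (random test values and variable names of my own choosing, not from the text) checks this identity numerically, reusing the activation helper from the earlier snippet:

```python
import random

def activation(w, b, x):
    """Compute the activation a = sum_d w_d * x_d + b (same as the earlier sketch)."""
    return sum(w_d * x_d for w_d, x_d in zip(w, x)) + b

random.seed(0)
for _ in range(1000):
    D = random.randint(1, 5)
    w = [random.uniform(-1, 1) for _ in range(D)]
    x = [random.uniform(-1, 1) for _ in range(D)]
    b = random.uniform(-1, 1)
    y = random.choice([+1, -1])

    a = activation(w, b, x)
    # Apply the update unconditionally, as in the derivation above.
    w_new = [w_d + y * x_d for w_d, x_d in zip(w, x)]
    b_new = b + y
    a_new = activation(w_new, b_new, x)

    # Derived result: a' = a + y * (sum_d x_d^2 + 1)
    expected = a + y * (sum(x_d ** 2 for x_d in x) + 1)
    assert abs(a_new - expected) < 1e-9

print("a' = a + y * (sum of x_d^2 + 1) holds on all random tests")
```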