Perceptrons (Artificial Neurons)

Perceptron - Model Representation

neuron pre-activation (input activation)

  • a(x) = b + Σ_{1≤i≤d}(wi·xi) = b + wᵀx

neuron output activation

  • h(x) = g(a(x))

where:

  • x = [x1, ..., xd]
  • w = [w1, ..., wd]
  • wi - weight
  • xi - input value
  • b - bias
  • a(..) - a weighted sum function
  • g(..) - an activation function

Example (Only x1 & x2)

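A minimal sketch of the forward pass with only x1 and x2, assuming a step activation g and arbitrarily chosen (hypothetical) weights and bias, purely for illustration:

```python
# Minimal perceptron forward pass with two inputs (illustrative values only).

def step(a):
    """Step activation g: returns 1 if the pre-activation is positive, else 0."""
    return 1 if a > 0 else 0

def perceptron(x, w, b):
    """h(x) = g(a(x)) where a(x) = b + w^T x."""
    a = b + sum(w_i * x_i for w_i, x_i in zip(w, x))  # pre-activation
    return step(a)                                    # output activation

# Hypothetical weights and bias; x = [x1, x2]
w = [0.7, -0.4]
b = -0.1
print(perceptron([1.0, 0.5], w, b))  # a = -0.1 + 0.7 - 0.2 = 0.4  -> outputs 1
print(perceptron([0.0, 1.0], w, b))  # a = -0.1 + 0.0 - 0.4 = -0.5 -> outputs 0
```
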
Perceptron - How Weights are Learned

General Rule

For each training example (i.e. input vector), if the perceptron outputs:

  • the correct answer (either 0 or 1), then leave the weights alone
  • a false negative (0 when the answer is 1), then add the input vector to the weights vector
  • a false positive (1 when the answer is 0), then subtract the input vector from the weights vector

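A minimal sketch of this update rule in Python, assuming a step-function perceptron and treating the bias as an extra weight whose input is always 1 (function and variable names are my own, for illustration):

```python
def step(a):
    return 1 if a > 0 else 0

def train_perceptron(examples, d, epochs=10):
    """examples: list of (x, target) pairs, x a list of d inputs, target 0 or 1."""
    w = [0.0] * (d + 1)           # weights; the last entry acts as the bias
    for _ in range(epochs):
        for x, target in examples:
            xb = x + [1.0]        # append the constant 1 input for the bias
            out = step(sum(wi * xi for wi, xi in zip(w, xb)))
            if out == target:
                continue          # correct answer: leave the weights alone
            elif target == 1:     # false negative: add the input vector to the weights
                w = [wi + xi for wi, xi in zip(w, xb)]
            else:                 # false positive: subtract the input vector
                w = [wi - xi for wi, xi in zip(w, xb)]
    return w

# Learning logical AND (linearly separable):
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(data, d=2))
```
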
Geometric Intuition

Picture a vector space whose dimensions are the weights (including the bias). A location in this weight space is determined by the values of the weights.

For simplicity, let's use a 2D weight space (one weight and one bias). Since our input vector has the same dimension as our weights vector (note the input for the bias is always equal to 1), we can also plot our input vector in this weight space.

Now let's take a single training example where the right answer is 1. That means we want z, the pre-activation we plug into our step function, to be greater than 0.

We must ask, then: when is z, the dot product of the weights and inputs, greater than 0?

If you are familiar with the geometric view of dot products, you will know that dot products can be seen as a measure of how much two vectors point in the same direction.

  • When two vectors lie in similar directions, the dot product is positive.
  • When they lie in dissimilar directions, the dot product is negative.
  • The dot product is exactly zero when the two vectors are perpendicular.

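A quick numeric sketch of these three cases (vectors chosen arbitrarily for illustration):

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [2.0, 1.0]                 # input vector in weight space
print(dot([1.0, 1.0], x))      # similar direction    ->  3.0 (positive)
print(dot([-1.0, -0.5], x))    # dissimilar direction -> -2.5 (negative)
print(dot([-1.0, 2.0], x))     # perpendicular        ->  0.0
```
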
So, we can section off our weight space by drawing a dividing line perpendicular to our input vector. Any weights vector lying along this dividing line returns a z of exactly 0.

Then, depending on our desired answer, we want our weights vector to lie on one particular side of the dividing line. For example, in our case:

[Figure: "When the Desired Answer is 1" / "When the Desired Answer is 0" (swap "good weights" with "bad weights")]

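One way to see why the update rule moves the weights to the correct side: on a false negative, adding the input vector gives (w + x)·x = w·x + ‖x‖², so z increases; on a false positive, subtracting gives (w - x)·x = w·x - ‖x‖², so z decreases. Each update therefore pushes the weights vector toward the "good weights" region for that example.
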
Thus you can see how the "general rule" above applies.

Perceptron - Limitations

Perceptrons were helpful only in cases where the input features were painstakingly hand-picked so that the classes were linearly separable and not subject to the group invariance theorem. In those cases, with good input features, perceptrons still worked superbly.

Perceptrons - Limitations that Lead to Neural Networks

Neural Networks are Just Multilayer Perceptrons

In cases where good input features were hand-picked, perceptrons could still learn very effectively. The hard part was the feature engineering: detecting which features best predicted the class a training example belonged to.

We needed to find an algorithm that also learned the features. That is just a perceptron algorithm with one more layer: a hidden layer.

The hidden layer(s) in a modern neural network are simply feature detectors for a simpler perceptron algorithm. From our initial features (which no longer need to be as "perfect"), we filter out the features that work best. The weights in our hidden layer are the feature detectors.

This meant adding another layer and finding a way to learn its weights, which is conceptually hard to do. How does a change in a weight propagate through a network to affect the final output?

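A minimal sketch of a forward pass through one hidden layer, assuming sigmoid activations and illustrative (hypothetical) weight values, just to show how the hidden layer's outputs become the input features for the final perceptron-like unit:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(x, weights, biases):
    """Each row of `weights` (with its bias) is one neuron: h = g(b + w.x)."""
    return [sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            for w, b in zip(weights, biases)]

x = [0.5, -1.0]                                             # raw input features
hidden = layer(x, [[1.0, -2.0], [0.5, 0.5]], [0.0, -0.2])   # hidden layer: learned feature detectors
output = layer(hidden, [[2.0, -1.5]], [0.1])[0]             # final perceptron-like unit
print(hidden, output)
```
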
This was not figured out for nearly 17 years after the publication of Perceptrons, until 1986, when a paper by David Rumelhart, Geoffrey Hinton, and Ronald Williams outlined the backpropagation algorithm.

Subpages