This is part of a megapost about the 3Blue1Brown series on deep learning.
Overview
This is intended as a lightweight introduction to the topic. The motivation is the very hard task of programmatically classifying handwritten digits.
There are many variants of neural networks.
Plain vanilla – Multilayer perceptron
Neuron: A thing that holds a number in \([0,1]\). The number inside the neuron is called ‘activation’.
All 784 neurons (one per pixel of the 28×28 input image) make up the first layer of the network.
In the last layer we have only 10 neurons, one per digit; the one with the highest activation is the network's answer.
There are 2 hidden layers with 16 neurons each.
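To make the architecture concrete, here is a minimal NumPy sketch of a 784–16–16–10 network. Only the layer sizes come from the video; the function names, random initialization, and input are illustrative (a real network would have learned weights):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1), so every activation lands in [0, 1].
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes from the video: 784 input pixels, two hidden layers of 16, 10 output digits.
layer_sizes = [784, 16, 16, 10]

# Randomly initialized weights and biases (purely illustrative; training would set these).
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in layer_sizes[1:]]

def forward(pixels):
    # pixels: 784 grayscale values in [0, 1], one per pixel of the 28x28 image.
    a = np.asarray(pixels).reshape(784, 1)
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a  # 10 activations; the index of the largest one is the network's guess

digit = int(np.argmax(forward(np.random.rand(784))))
```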
The Component Analogy
We hope that each middle layer represents the components of the numbers. (Note: this always seems like a claim to me; I have never seen a proof of it.)
The analogy goes further. We dissect the circles into single edges, and those into pixels (or the other way around, seen from input to output).
How to design the activation flow
Pixels -> Edges -> Patterns -> Digits.
The task at hand is: which dials (weights and biases) have to be turned so that the network reliably recognizes a pattern?
Take all the activations of the first layer and compute their weighted sum.
Maybe you only want the neuron to become active when the weighted sum exceeds some threshold. For that we add a negative bias (the threshold), so the neuron only gets meaningfully active above it, as in the sketch below.
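A minimal sketch of that computation for a single neuron in the second layer. The pixel activations, weights, and bias value are made up; the squashing into \([0,1]\) uses the sigmoid, as in the video:

```python
import numpy as np

def sigmoid(z):
    # Squashes the result into (0, 1), the range an activation is allowed to take.
    return 1.0 / (1.0 + np.exp(-z))

pixels = np.random.rand(784)   # activations of the first layer (made up)
w = np.random.randn(784)       # one weight per first-layer neuron (made up)
bias = -10.0                   # negative bias: only fire once the weighted sum exceeds 10

weighted_sum = np.dot(w, pixels)           # w_1*a_1 + w_2*a_2 + ... + w_784*a_784
activation = sigmoid(weighted_sum + bias)  # meaningfully active only above the threshold
```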
Notation
Because we want to standardize everything.
All activations are represented as a column vector, the weights feeding each neuron of the next layer as a row of the weight matrix, and the biases as a column vector. The \( \sigma \) function is the sigmoid mentioned above, applied to each of the \( k \) neurons in the next layer.
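Put together, this gives the usual compact form (my reconstruction from the definitions above, with \( \mathbf{a}^{(0)} \) the input activations, \( W \) the weight matrix, and \( \mathbf{b} \) the bias vector):

\[
\mathbf{a}^{(1)} = \sigma\!\left( W \mathbf{a}^{(0)} + \mathbf{b} \right),
\qquad
a^{(1)}_k = \sigma\!\left( \sum_{j} W_{kj}\, a^{(0)}_j + b_k \right)
\]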