From Neural Networks to Deep Learning

Prof. Dr. Mirco Schoenfeld

Why successful today?

Why is deep learning successful today?

(Raza 2023; Pichler and Hartig 2023)

Moore's law

(Roser, Ritchie, and Mathieu 2023)


computational power

  1. Computational power has grown enormously, and specialized hardware has become widely available.

data availability

  1. Deep learning handles vast amounts of data efficiently.

end-to-end learning

  1. End-to-end learning, characteristic of deep learning, removes the need for manual feature engineering.

Transfer learning

  1. Pre-trained state-of-the-art models allow for transfer learning.

DL’s bright future

Deep learning will continue to shape the future of artificial intelligence.

behind the scenes

What’s behind the scenes?

Neural Nets

Deep Learning models are neural networks.

(Shukla 2019)

Neurons

Neural networks are modeled after neural cells.

Artificial Neurons

Artificial Neurons are the elementary
units of artificial neural networks.

Artificial Neurons

An artificial neuron is a function that receives one or more inputs, applies a weight to each input, sums the weighted inputs, and typically passes the result through an activation function to produce an output.
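A minimal sketch of such a neuron in Python (not from the slides; the sigmoid activation and the example numbers are illustrative assumptions):

```python
import math

def artificial_neuron(inputs, weights, bias=0.0):
    # Weighted sum of the inputs plus a bias ...
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ... squashed through a sigmoid activation into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# One neuron with three inputs and three weights
print(artificial_neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.5]))
```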

Artificial Neurons

Many artificial neurons together form a neural network.

the input

The input layer has one neuron per feature.

Here, the input is a 28x28 pixel image, so 28x28 = 784 input neurons.

the output

The number of output neurons depends on the number of predictions you want to make.

For regression, this can be one neuron.

For classification, this is one neuron per class.

the inner part

(Shukla 2019)

the inner part

Defining the number of hidden layers and the number of
neurons per hidden layer is… pure magic?
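As a hedged sketch of what such a design looks like in code (assuming Keras; the hidden-layer sizes are arbitrary choices, not recommendations), here is the 784-input, 10-class example from above with two hidden layers:

```python
from tensorflow import keras

# 784 input neurons (one per pixel of a 28x28 image), two hidden layers
# with arbitrarily chosen sizes, and 10 output neurons (one per class).
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),    # hidden layer 1 -- size is a hyperparameter
    keras.layers.Dense(64, activation="relu"),     # hidden layer 2 -- size is a hyperparameter
    keras.layers.Dense(10, activation="softmax"),  # output layer: one neuron per class
])
model.summary()   # shows how many trainable weights this choice implies
```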

Hyperparameters

There are even more hyperparameters

Learning

Learning means adjusting the network's inner weights after processing data, based on the amount of error in the output.

By the way, the GPT-3 model has 175 billion inner weights (parameters).

loss function

Measuring the amount of error in the output is
the purpose of a loss function.

Ultimately, the entire goal of training is to minimize loss.

loss function

Two widely used loss functions:

  1. Mean Squared Error (MSE): \(\frac{1}{n} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2\)
  2. Categorical Cross Entropy (Gomez 2018)
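Both can be sketched in a few lines of numpy (illustrative only; the example values are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared differences
    return np.mean((y_true - y_pred) ** 2)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross entropy between one-hot targets and predicted class probabilities
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Regression example: true values vs. predictions
print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))    # 0.25

# Classification example: 2 samples, 3 classes, one-hot targets
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(categorical_cross_entropy(y_true, y_pred))           # ~0.29
```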

Learning Rate

Learning rate is a tuning parameter that
determines how quickly a model “learns”.

It influences to what extent new information
overrides old information.
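A toy illustration (not from the slides): in gradient descent, the learning rate scales each weight update, \(w \leftarrow w - \eta \cdot \nabla L(w)\). With a made-up loss \(L(w) = w^2\), a tiny rate barely moves the weight, while an overly large rate overshoots and diverges:

```python
def gradient_descent_step(w, gradient, learning_rate):
    # One update: new weight = old weight - learning_rate * gradient
    return w - learning_rate * gradient

for lr in (0.01, 1.1):                # deliberately too small vs. too large
    w = 5.0                           # start far from the optimum w* = 0
    for _ in range(10):
        grad = 2 * w                  # gradient of the toy loss L(w) = w^2
        w = gradient_descent_step(w, grad, lr)
    print(f"lr={lr}: w after 10 steps = {w:.3f}")
# lr=0.01 barely moves towards 0; lr=1.1 overshoots and diverges
```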

Setting the learning rate

Setting the learning rate is a trade-off:
either the model converges too slowly, or it overshoots and misses important details…

https://x.com/Jeande_d/status/1459222900066586625

many parameters

These have been just a few of the available hyperparameters.

There are also

  • epochs
  • batch size
  • bias in neurons
  • activation functions
  • weight initialization
  • and many more…
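Where several of these show up in practice, as a hedged Keras sketch (random stand-in data; the values are illustrative, not recommendations):

```python
import numpy as np
from tensorflow import keras

# Random stand-in data: 1000 "flattened images" with 10 fake class labels.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 10, size=1000), num_classes=10)

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128,
                       activation="relu",              # activation function
                       kernel_initializer="he_normal", # weight initialization
                       use_bias=True),                 # bias in neurons
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),  # learning rate
              loss="categorical_crossentropy",                       # loss function
              metrics=["accuracy"])

model.fit(x_train, y_train,
          epochs=5,        # passes over the training data
          batch_size=32)   # samples per weight update
```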

many parameters

Funnily enough, studies even yield opposing recommendations for setting certain hyperparameters.

(Karpathy 2019)

recent advancements

Recent advancements in the field take it up a notch…

Kolmogorov-Arnold Networks (Liu et al. 2024)

Kolmogorov-Arnold Networks

They turn this (the good ol’ multilayer perceptron)…

Kolmogorov-Arnold Networks

…into this

getting it to work

Is there a secret to successfully setting up a neural network?

getting it to work

Most of the time it will train but silently work a bit worse. (Karpathy 2019)

getting it to work

Suffering is a perfectly natural part of getting a neural network to work well. (Karpathy 2019)

some important highlights

Some more highlights… just slightly older.

Backpropagation

Backpropagation

Backpropagation takes a neural network’s output error and propagates this error backwards through the network, determining which paths have the greatest influence on the output. (Scarff 2021)
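A minimal numpy sketch (not from the slides) of one backpropagation step for a tiny 2-2-1 network with sigmoid activations and MSE loss; the data, weights, and learning rate are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Forward pass through a 2-2-1 network
x = np.array([0.5, -0.2])                        # one training example
y = np.array([1.0])                              # its target output
W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)   # hidden layer weights
W2 = rng.normal(size=(1, 2)); b2 = np.zeros(1)   # output layer weights

h = sigmoid(W1 @ x + b1)                  # hidden activations
y_hat = sigmoid(W2 @ h + b2)              # network output
loss = np.mean((y - y_hat) ** 2)          # MSE loss

# Backward pass: push the output error back through the network,
# collecting the gradient of the loss w.r.t. every weight on the way.
d_yhat = 2 * (y_hat - y) / y.size         # dLoss/dy_hat
d_z2 = d_yhat * y_hat * (1 - y_hat)       # through the output sigmoid
dW2 = np.outer(d_z2, h); db2 = d_z2       # output layer gradients
d_h = W2.T @ d_z2                         # error arriving at the hidden layer
d_z1 = d_h * h * (1 - h)                  # through the hidden sigmoid
dW1 = np.outer(d_z1, x); db1 = d_z1       # hidden layer gradients

# One gradient-descent update with a made-up learning rate
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print(loss)
```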

Backpropagation in more detail

Please refer to this
fantastic explanation of backpropagation on YouTube

general architecture

With all of this in mind… how do you design your own neural network?