From Neural Networks to Deep Learning

Prof. Dr. Mirco Schoenfeld

Why successful today?

Why is deep learning successful today?

(Raza 2023; Pichler and Hartig 2023)

Moore's law

(Roser, Ritchie, and Mathieu 2023)


computational power

  1. Computational power has grown enormously, and specialized hardware has become widely available.

data availability

  1. Deep learning handles vast amounts of data efficiently.

end-to-end learning

  1. End-to-end learning, characteristic of deep learning, removes the need for manual feature engineering.

Transfer learning

  1. Pre-trained state-of-the-art models allow for transfer learning.

DL’s bright future

Deep learning will continue to shape the future of artificial intelligence.

behind the scenes

What’s behind the scenes?

Neural Nets

Deep Learning models are neural networks.

(Shukla 2019)

Neurons

Neural networks are modeled after neural cells.

Artificial Neurons

Artificial Neurons are the elementary
units of artificial neural networks.

Artificial Neurons

An artificial neuron is a function that receives one or more inputs, applies a weight to each input, sums the weighted inputs, and typically passes the result through an activation function to produce an output.
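A minimal sketch of such a neuron in Python (not from the slides; the sigmoid activation and the example numbers are illustrative assumptions):

```python
import math

def artificial_neuron(inputs, weights, bias=0.0):
    # Weighted sum of the inputs plus a bias ...
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ... squashed through a sigmoid activation into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# One neuron with three inputs and three weights
print(artificial_neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.5]))
```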

Artificial Neurons

Many artificial neurons together form a neural network.

the input

The input layer has one neuron per feature.

Here, the input is a 28x28 pixel image, so 28x28 = 784 input neurons.

the output

The number of output neurons depends on the number of predictions you want to make.

For regression, this can be one neuron.

For classification, this is one neuron per class.

the inner part

(Shukla 2019)

the inner part

Defining the number of hidden layers and the number of
neurons per hidden layer is… pure magic?
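As a hedged sketch of what such a design looks like in code (assuming Keras; the hidden-layer sizes are arbitrary choices, not recommendations), here is the 784-input, 10-class example from above with two hidden layers:

```python
from tensorflow import keras

# 784 input neurons (one per pixel of a 28x28 image), two hidden layers
# with arbitrarily chosen sizes, and 10 output neurons (one per class).
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),    # hidden layer 1 -- size is a hyperparameter
    keras.layers.Dense(64, activation="relu"),     # hidden layer 2 -- size is a hyperparameter
    keras.layers.Dense(10, activation="softmax"),  # output layer: one neuron per class
])
model.summary()   # shows how many trainable weights this choice implies
```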

Hyperparameters

There are even more hyperparameters

Learning

Learning means adjusting the network's inner weights after processing data, based on the amount of error in the output.

By the way, the GPT-3 model has 175 billion inner weights (parameters).

loss function

Measuring the amount of error in the output is
the purpose of a loss function.

Ultimately, the entire goal of training is to minimize loss.

loss function

Two widely used loss functions:

  1. Mean Squared Error (MSE): \(\frac{1}{n} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2\)
  2. Categorical Cross Entropy (Gomez 2018)
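Both can be sketched in a few lines of numpy (illustrative only; the example values are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared differences
    return np.mean((y_true - y_pred) ** 2)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross entropy between one-hot targets and predicted class probabilities
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Regression example: true values vs. predictions
print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))    # 0.25

# Classification example: 2 samples, 3 classes, one-hot targets
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(categorical_cross_entropy(y_true, y_pred))           # ~0.29
```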

Learning Rate

Learning rate is a tuning parameter that
determines how quickly a model “learns”.

It influences to what extent new information
overrides old information.
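A toy illustration (not from the slides): in gradient descent, the learning rate scales each weight update, \(w \leftarrow w - \eta \cdot \nabla L(w)\). With a made-up loss \(L(w) = w^2\), a tiny rate barely moves the weight, while an overly large rate overshoots and diverges:

```python
def gradient_descent_step(w, gradient, learning_rate):
    # One update: new weight = old weight - learning_rate * gradient
    return w - learning_rate * gradient

for lr in (0.01, 1.1):                # deliberately too small vs. too large
    w = 5.0                           # start far from the optimum w* = 0
    for _ in range(10):
        grad = 2 * w                  # gradient of the toy loss L(w) = w^2
        w = gradient_descent_step(w, grad, lr)
    print(f"lr={lr}: w after 10 steps = {w:.3f}")
# lr=0.01 barely moves towards 0; lr=1.1 overshoots and diverges
```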

Setting the learning rate

Setting the learning rate is a trade-off:
either the model converges too slowly, or it overshoots and misses important details…

https://x.com/Jeande_d/status/1459222900066586625

many parameters

These have been just a few of the available hyperparameters.

There are also

  • epochs
  • batch size
  • bias in neurons
  • activation functions
  • weight initialization
  • and many more…
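Where several of these show up in practice, as a hedged Keras sketch (random stand-in data; the values are illustrative, not recommendations):

```python
import numpy as np
from tensorflow import keras

# Random stand-in data: 1000 "flattened images" with 10 fake class labels.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 10, size=1000), num_classes=10)

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128,
                       activation="relu",              # activation function
                       kernel_initializer="he_normal", # weight initialization
                       use_bias=True),                 # bias in neurons
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),  # learning rate
              loss="categorical_crossentropy",                       # loss function
              metrics=["accuracy"])

model.fit(x_train, y_train,
          epochs=5,        # passes over the training data
          batch_size=32)   # samples per weight update
```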

many parameters

Funnily enough, studies even yield opposing recommendations for setting certain hyperparameters.

(Karpathy 2019)

recent advancements

Recent advancements in the field take it up a notch…

Kolmogorov-Arnold Networks (Liu et al. 2024)

Kolmogorov-Arnold Networks

They turn this (the good ol’ multilayer perceptron)…

Kolmogorov-Arnold Networks

…into this

getting it to work

Is there a secret to successfully setting up a neural network?

getting it to work

Most of the time it will train but silently work a bit worse. (Karpathy 2019)

getting it to work

Suffering is a perfectly natural part of getting a neural network to work well. (Karpathy 2019)

some important highlights

Some more highlights… just slightly older.

Backpropagation

Backpropagation

Backpropagation takes a neural network’s output error and propagates this error backwards through the network, determining which paths have the greatest influence on the output. (Scarff 2021)
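A minimal numpy sketch (not from the slides) of one backpropagation step for a tiny 2-2-1 network with sigmoid activations and MSE loss; the data, weights, and learning rate are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Forward pass through a 2-2-1 network
x = np.array([0.5, -0.2])                        # one training example
y = np.array([1.0])                              # its target output
W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)   # hidden layer weights
W2 = rng.normal(size=(1, 2)); b2 = np.zeros(1)   # output layer weights

h = sigmoid(W1 @ x + b1)                  # hidden activations
y_hat = sigmoid(W2 @ h + b2)              # network output
loss = np.mean((y - y_hat) ** 2)          # MSE loss

# Backward pass: push the output error back through the network,
# collecting the gradient of the loss w.r.t. every weight on the way.
d_yhat = 2 * (y_hat - y) / y.size         # dLoss/dy_hat
d_z2 = d_yhat * y_hat * (1 - y_hat)       # through the output sigmoid
dW2 = np.outer(d_z2, h); db2 = d_z2       # output layer gradients
d_h = W2.T @ d_z2                         # error arriving at the hidden layer
d_z1 = d_h * h * (1 - h)                  # through the hidden sigmoid
dW1 = np.outer(d_z1, x); db1 = d_z1       # hidden layer gradients

# One gradient-descent update with a made-up learning rate
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print(loss)
```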

Backpropagation in more detail

Please refer to this
fantastic explanation of backpropagation on YouTube

general architecture

With all of this in mind… how do you design your own neural network?