Connecting the dots:
Data Modeling

Prof. Dr. Mirco Schoenfeld

Motivation

https://www.xkcd.com/2209/

What is data modeling?

And are you doing it already?

Why should we care?

Common misunderstanding

Data Modeling

Data modeling is the process of translating
between real-world observations and their digital representations.

Models

  1. A model is a representation of an original
  2. A model has only a subset of characteristics of the original
  3. A model’s purpose is to replace the original under certain conditions
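
The three properties above can be sketched in a few lines of Python. This is only an illustration: the class `PageModel`, the function `build_model`, and every value in `original_page` are made up for this sketch and do not come from the lecture material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageModel:
    """Hypothetical digital stand-in for one scanned book page."""
    page_number: int
    text: str


# Illustrative "original": it has more characteristics than the model keeps.
# All values here are invented for the sketch.
original_page = {
    "page_number": 12,
    "text": "Trink Wein ab Betonien",
    "paper_weight_grams": 4.1,
    "ink_color": "brown",
    "smell": "musty",
}


def build_model(original: dict) -> PageModel:
    # Property 2: keep only the subset of characteristics the purpose requires.
    return PageModel(page_number=original["page_number"], text=original["text"])


# Property 1: the model represents the original.
model = build_model(original_page)

# Property 3: under the condition "only the wording matters",
# the model can replace the original, e.g. for counting words.
word_count = len(model.text.split())
print(model, word_count)
```

Questions about paper or ink cannot be answered from `PageModel`; the model replaces the original only for the purpose it was built for.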

Example

Book Scan

Representation of the text

Bag of Words

„Von dem wer eine hübsche Farbe will haben Betonica Trink Wein ab Betonien so wird dir eine gute Farbe spricht Plinius Wer sie bei ihm trage dem mag keine Zauberei schaden“

ab, bei, betonica, betonien, dem, dem, dir, eine, eine, farbe, farbe, gute, haben, hübsche, ihm, keine, mag, plinius, schaden, sie, so, spricht, trage, trink, von, wein, wer, wer, will, wird, zauberei
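
A minimal sketch of how such a bag of words could be produced, assuming that tokens are separated by whitespace and that case is ignored. The function name `bag_of_words` is chosen for this sketch only.

```python
from collections import Counter

text = (
    "Von dem wer eine hübsche Farbe will haben Betonica Trink Wein ab "
    "Betonien so wird dir eine gute Farbe spricht Plinius Wer sie bei ihm "
    "trage dem mag keine Zauberei schaden"
)


def bag_of_words(text: str) -> list[str]:
    # Assumptions: split on whitespace, lowercase everything, drop word order.
    return sorted(text.lower().split())


tokens = bag_of_words(text)
counts = Counter(tokens)  # multiplicity of each token

print(", ".join(tokens))
print(len(tokens))        # 31 tokens, duplicates included
print(counts["dem"])      # 2
```

Word order is lost, but how often each word occurs is kept. That is the subset of characteristics this model preserves.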

Models change their roles

ab, bei, betonica, betonien, dem, dir, eine, farbe, gute, haben, hübsche, ihm, keine, mag, plinius, schaden, sie, so, spricht, trage, trink, von, wein, wer, will, wird, zauberei

975 unique tokens
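
The role change can be sketched as follows: the bag of words, itself a model of the text, becomes the original for a further model, the set of unique tokens. The sketch only uses the example sentence, assumes whitespace tokenization and lowercasing, and the variable names are illustrative.

```python
text = (
    "Von dem wer eine hübsche Farbe will haben Betonica Trink Wein ab "
    "Betonien so wird dir eine gute Farbe spricht Plinius Wer sie bei ihm "
    "trage dem mag keine Zauberei schaden"
)

# First model: the bag of words (duplicates kept, order dropped).
bag = sorted(text.lower().split())

# Second model: the bag now plays the role of the original,
# and the vocabulary keeps only which tokens occur, not how often.
vocabulary = sorted(set(bag))

print(len(bag))         # 31
print(len(vocabulary))  # 27
```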

Data Modeling

Modeling means making assumptions

Modeling requires making assumptions about reality.

Making assumptions leads to bias built into algorithms.
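
A small illustration of how one assumption changes the result, using the example sentence from the bag-of-words slide. The assumption examined here, whether "Wer" and "wer" count as the same word, is chosen for this sketch and is only one of many possible modeling decisions.

```python
text = (
    "Von dem wer eine hübsche Farbe will haben Betonica Trink Wein ab "
    "Betonien so wird dir eine gute Farbe spricht Plinius Wer sie bei ihm "
    "trage dem mag keine Zauberei schaden"
)

words = text.split()

# Assumption A: case matters, so "Wer" and "wer" are different tokens.
case_sensitive = set(words)

# Assumption B: case does not matter, so both collapse into "wer".
case_folded = {word.lower() for word in words}

# Capitalized tokens whose lowercase form also occurs on its own.
merged = {w for w in case_sensitive if w != w.lower() and w.lower() in case_sensitive}

print(len(case_sensitive))  # 28
print(len(case_folded))     # 27
print(merged)               # {'Wer'}
```

Neither choice is wrong in itself, but each encodes a view of what counts as "the same word", and everything computed downstream inherits that view.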

Modeling creates reality

Modeling means creating reality to some extent.

Modeling is more than improving accuracy

Data modeling is more than
improving accuracy and optimizing the model.

Technical design decisions can have
implications for people’s everyday lives.

The issue

The computer will do whatever you ask it to do, but the outcome depends on what you ask.

https://fortune.com/education/articles/what-zillows-failed-algorithm-means-for-the-future-of-data-science/

What is data modeling?

Are you doing data modeling already?
