Data Structures

Prof. Dr. Mirco Schoenfeld

Recap

Last week was about complexity and big-O-notation.

Räcke (2021)

Today

Before we can move on with complexity,
we need to consider data structures.

What is data?

Ultimately, we want to organize and store arbitrary data.

This goes well beyond bits and bytes
and focuses on higher-level data.

What is data?

When we think about data, we find that we can classify it easily. Examples are

numbers
texts
images
files
…

These classes are called data types
and they specify a set of possible values.

Data Types

Purpose of data types:

constraint possible values of data
for humans: bring order to a program
for compilers: reserve the right amount of memory for data
for both: spot errors

Data Types

On the lowest level, we have primitive or basic data types.

Non-primitive, composite, or compound data types are higher-level types which consist of or build upon primitive ones.

Data Types

Some primitive types:

Some non-primitive types:

Data Types

Some example data types in Java

Tantau (2016)

Data Types

Based on primitive data types, we can define more complex ones.

A date may consist of several integers and specific characters:

2024-12-24

Type Safety

Why do we care so much about data types?
They constrain possible values of expressions expressions:

x + y
2 * (i + 1)
pow(x,2)

In statically typed programming languages (STL), every value, every expression, and every variable has a type.

Type Safety

Declaring variables in Java requires specifying their type:

int x = 5;
int y = 6;
char c = 'a';
double i = 3.14;
boolean switch = true;

Type Safety

That helps deducing types of expressions:

int x = 5;
int y = 6;
x + y

The type of x+y is int.

Type Safety

int x = 5;
double i = 3.14;

2 * x
2.0 * x
2 * (i+1)

The type of 2*x is int, the type of 2.0*x is double

The type of 2*(i+1) is double.

Type Safety

In STL, each expression has a type - even partial expressions:

Tantau (2016)

Type Safety

In STL, each expression has a type - even partial expressions:

Tantau (2016)

Type Errors

Data types help us spot errors in program code.

There are two types of program errors:

runtime errors

which may occur while
the program is running

and compile time errors

which may occur while
compiling code.

Type Errors

Type errors lead to compile time errors, e.g. by

assigning a value to a variable with a wrong type
using a type other than boolean in a condition
misusing an operator

Type Errors

Detecting type errors is a huge help in big software projects!

Hence, type safety of statically typed languages is desired.

Type Errors in Python

What about Python?

i = 0
i = "bob"

Type Errors in Python

Python is both strongly typed and dynamically typed:

Strongly typed because the type of a value doesn’t change unexpectedly.

Conversions are explicit and you can’t get around the type system.

Dynamically typed because runtime objects (i.e. values) have a type.

This is different to static typing where variables have a type.

From Types to Data Structures

Usually, we deal with more than atomic values.

We are more interested in collections of items
that are finite in number and with or without a particular order.

From Types to Data Structures

In mathematics, such collections are tuples or finite sequences.

In computer science, these correspond to abstract lists.

Abstract Lists

Implementations of abstract lists differ in these characteristics:

Operation
Create
Test for empty
Add item to beginning or end
Access first or last item
Access item by index
Store only unique values
Efficient test for membership

Abstract Lists

Implementations of abstract lists are

arrays
lists

Array

Arrays are data structures to store a collection of elements.

one of the oldest and most important data structure
exists in almost every programming language
used to implement a lot of other data structures

Array

All elements have the same memory size.

The array takes up a consecutive space in memory.

Array

Each element is identified by an array index.

Accessing elements requires a single subscript.

Array

Indices are restricted to a consecutive range of integers:

\(B + c \times i\)

Array

Indices are restricted to a consecutive range of integers:

\(B + c \times i\)

Array variables

Array variables are variables of an array type,
so arrays are also data types.

An example in Java (Tantau 2016)

int a;
int[] b = {3,4,5};
int[] c = {1,2};

Array variables

Array variables usually only contain a pointer
to the beginning of an array.

int a;
int[] b = {3,4,5};
int[] c;            // Situation 1
c = new int[2];     // Situation 2

Array variables

That might lead to side effects for
assignments and comparisons of arrays!

int a;
int[] b = {3,4,5};
int[] c = {1,2};            // Situation 1
c = b;              // Situation 2

Array

The amount of allocated memory space of an array is only known
for an instance of an array, not for the type!

int a;
int[] b = {3,4,5};  // this size is known
int[] c;
c = new int[2];     // and this size is known

Array

Arrays are also used to model a notion of matrices still only consuming consecutive space in memory

\(m = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix}\)

Array

\(m = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix}\)

in row-major:

column-major:

Array

What is the issue of arrays taking up consecutive space in memory?

The size of an array is fixed.

Array

The size of an array is fixed.

Operations that alter the size of an array are costly,
e.g. insertion and deletion of new elements.

Sometimes, the whole array has to be copied
to a corresponding space in memory.

Lists

To overcome the requirement of consecutive space in memory for linear collection of data, we can use linked lists.

Lists

A linked list is a sequence of nodes that contains two fields:

data (an integer value here as an example) and
a link to the next node.

The last node is linked to NULL used to signify the end of the list.

Lists

Other variants of lists exist, like the circular linked list:

Or the doubly linked list:

Lists

Lists are often used to implement structures like

stacks or
queues

Sets

When order doesn’t matter but membership, use sets.

Sets

Sets store unique values without particular order.

Sets are usually used to test for membership.

Sets

A famous poem stored in an array:

…and stored in a set:

Sets

Sets may support fundamental operations from algebra of sets to create new sets from given sets

Comparison of data structures

What we have considered so far:

Arrays
Lists
Sets

Comparison of data structures

Operation	Arrays	Lists	Sets
Create
Test for empty
Add item to beginning or end
Access first or last item
Access item by index
Store only unique values
Efficient test for membership

Thanks

mirco.schoenfeld@uni-bayreuth.de

https://xkcd.com/2483/

Acknowledgement

Acknowledgement:

This script is inspired by Tantau (2016) and Räcke (2021).

References

Räcke, Harald. 2021. “Lecture Notes to the Lecture Intro to Linear Programming.” TUM. http://www14.in.tum.de/lehre/2021WS/ea/.

Tantau, Till. 2016. Einführung in Die Informatik. Online. https://www.tcs.uni-luebeck.de/de/mitarbeiter/tantau/lehre/lectures/EinfuehrungindieInformatik-2016.pdf.