Last week was about complexity and big-O-notation.
Räcke (2021)
Before we can move on with complexity,
we need to consider data
structures.
Ultimately, we want to organize and store arbitrary data.
This goes well beyond bits and bytes
and focuses on higher-level
data.
When we think about data, we find that we can classify it easily. Examples are
These classes are called data types
and
they specify a set of possible values.
Purpose of data types:
On the lowest level, we have primitive or basic data types.
Non-primitive, composite, or compound data types are higher-level types which consist of or build upon primitive ones.
Some primitive types:
Some example data types in Java
Tantau (2016)
Based on primitive data types, we can define more complex ones.
A date may consist of several integers and specific characters:
2024-12-24
Why do we care so much about data types?
They constrain possible
values of expressions expressions:
x + y
2 * (i + 1)
pow(x,2)
In statically typed programming languages (STL), every value, every expression, and every variable has a type.
Declaring variables in Java requires specifying their type:
That helps deducing types of expressions:
The type of x+y
is int
.
The type of 2*x
is int
, the type of
2.0*x
is double
The type of 2*(i+1)
is double
.
In STL, each expression has a type - even partial expressions:
Tantau (2016)
In STL, each expression has a type - even partial expressions:
Tantau (2016)
Data types help us spot errors in program code.
There are two types of program errors:
runtime errors
which may occur while
the program is running
and compile time errors
which may occur while
compiling code.
Type errors lead to compile time errors, e.g. by
Detecting type errors is a huge help in big software projects!
Hence, type safety of statically typed languages is desired.
What about Python?
i = 0
i = "bob"
Python is both strongly typed and dynamically typed:
Strongly typed because the type of a value doesn’t change unexpectedly.
Conversions are explicit and you can’t get around the type system.
Dynamically typed because runtime objects (i.e. values) have a type.
This is different to static typing where variables have a type.
Usually, we deal with more than atomic values.
We are more interested in collections of items
that are finite
in number and with or without a particular order.
In mathematics, such collections are tuples or finite sequences.
In computer science, these correspond to abstract lists.
Implementations of abstract lists differ in these characteristics:
Operation |
---|
Create |
Test for empty |
Add item to beginning or end |
Access first or last item |
Access item by index |
Store only unique values |
Efficient test for membership |
Implementations of abstract lists are
Arrays are data structures to store a collection of elements.
All elements have the same memory size.
The array takes up a consecutive space in memory.
Each element is identified by an array index.
Accessing elements requires a single subscript.
Indices are restricted to a consecutive range of integers:
\(B + c \times i\)
Indices are restricted to a consecutive range of integers:
\(B + c \times i\)
Array variables are variables of an array
type,
so arrays are also data types.
An example in Java (Tantau 2016)
Array variables usually only contain a pointer
to the
beginning of an array.
That might lead to side effects for
assignments and comparisons
of arrays!
The amount of allocated memory space of an array is only
known
for an instance of an array, not for the
type!
Arrays are also used to model a notion of matrices still only consuming consecutive space in memory
\(m = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix}\)
\(m = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix}\)
in row-major:
column-major:
What is the issue of arrays taking up consecutive space in memory?
The size of an array is fixed.
The size of an array is fixed.
Operations that alter the size of an array are
costly,
e.g. insertion and deletion of new elements.
Sometimes, the whole array has to be copied
to a corresponding
space in memory.
To overcome the requirement of consecutive space in memory for linear collection of data, we can use linked lists.
A linked list is a sequence of nodes that contains two fields:
The last node is linked to NULL used to signify the end of the list.
Other variants of lists exist, like the circular linked list:
Or the doubly linked list:
Lists are often used to implement structures like
More on that in the next practical.
When order doesn’t matter but membership, use sets.
Sets store unique values without particular order.
Sets are usually used to test for membership.
A famous poem stored in an array:
…and stored in a set:
Sets may support fundamental operations from algebra of sets to create new sets from given sets
What we have considered so far:
Operation | Arrays | Lists | Sets |
---|---|---|---|
Create | |||
Test for empty | |||
Add item to beginning or end | |||
Access first or last item | |||
Access item by index | |||
Store only unique values | |||
Efficient test for membership |
Acknowledgement:
This script is inspired by Tantau (2016) and Räcke (2021).