Searching and Sorting

Prof. Dr. Mirco Schoenfeld

Recap

Last session was about data types and data structures.

Recap

Some data types…:

Tantau (2016)

Recap

…and some data structures:

Operation Arrays Lists Sets
Add item to beginning or end
Access first or last item
Access item by index
Store only unique values
Efficient test for membership

Today

Why should we care?

Motivation

Clever organization of your data will help you find things faster.

The search problem

What do we actually want to find?

Given an array containing some values,
we could search for a specific value.

The search problem

Given an array containing some values, we could search

  • a single value
  • all values
  • a single value with a certain characteristic
  • all values with a certain characteristic

I/O

About the inputs and outputs:

We search for                          Inputs                  Outputs
a single value                         array + single value    one array position
all values                             array + single value    all array positions
a single value with a characteristic   array + characteristic  one array position
all values with a characteristic       array + characteristic  all array positions

Approaches to searching

In general, there are two approaches to searching:

Linear Search

  • most basic approach
  • simply iterate over input
  • sequentially check each element
  • rarely practical

Binary Search

  • can be much faster than linear search
  • iteratively eliminates half of the input
  • input list needs to be sorted
  • works only with random-access input

Linear Search: Implementation

Linear Search in pseudo code:

procedure linear_search (list, value)
  set i to 0
  set n to length of list
  while i < n
  do
    if (list at i) = value, terminate and return i
    increment i by 1
  done
  terminate unsuccessfully
end procedure
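The pseudocode above translates almost line by line into Python; this is a minimal sketch, and the example list is made up for illustration:

```python
def linear_search(lst, value):
    """Return the index of the first occurrence of value, or None if absent."""
    for i, item in enumerate(lst):
        if item == value:
            return i          # terminate and return i
    return None               # terminate unsuccessfully

print(linear_search([4, 15, 2, 23, 27, 42, 8], 27))  # 4
```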

Linear Search: Runtime Complexity

Consider this example:

How many comparisons are needed to find

  • 27: 5
  • 4: 1
  • 8: 9

Linear Search: Runtime Complexity

How many comparisons are needed

…in the best case? \(O(1)\)
…in the worst case? \(O(n)\)
…on average? \(O(n)\)

Linear Search: Runtime Complexity

Regarding \(O(n)\) in the average case:

If each element occurs in the list once,
and each element is equally likely to be searched,
then the expected number of comparisons is

\[ \frac{ n+1 }{ 2 } \]

The O-class of that is \(O(n)\)
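The expected value can be checked empirically with a short sketch; the list of 100 distinct values is an assumption for the experiment:

```python
def comparisons_until_found(lst, value):
    """Count how many comparisons linear search performs until it finds value."""
    for i, item in enumerate(lst):
        if item == value:
            return i + 1      # i comparisons failed, one succeeded

lst = list(range(100))        # each element occurs exactly once
n = len(lst)
# average over searching every element with equal probability
avg = sum(comparisons_until_found(lst, v) for v in lst) / n
print(avg, (n + 1) / 2)       # both are 50.5
```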

Linear Search: Summary

Summarizing linear search:

  • simple to implement
  • no requirements on input
  • only practical for small lists

Before Moving on: A Quick Survey

Towards binary search

Ancient storage of knowledge, usually kept on shelves

Towards binary search

Their space complexity is highly inefficient.

Towards binary search

Some examples had a special organization of knowledge:

https://commons.wikimedia.org/wiki/File:Geographical_Webster%27s_Home_a._Office_Dictionary._ca_1900_(132840235).jpg

Towards binary search

Using linear search in dictionaries would be highly inefficient.

What could be a better approach?

Binary search: Example

Search “8”!

7 < 8 → continue in the upper half
13 > 8 → continue in the lower half

This took 3 comparisons instead of 6 with linear search.

Binary Search: Implementation

procedure binary_search(list, value)
  set lower_bound to 0
  set upper_bound to (length of list) - 1
  while lower_bound != upper_bound
  do
    // obtain mid-point of interval (rounding down)
    set mid_point to (lower_bound + upper_bound) / 2
    if (list at mid_point) < value
    then
      // value can only be in the upper half
      update lower_bound to mid_point + 1
    else
      // value must be in the lower half (or at mid_point)
      update upper_bound to mid_point
    end if
  done
  return lower_bound
end procedure
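The same procedure as a Python sketch; the sorted example list is made up. Note that this variant returns the leftmost position where the value would fit, so the caller should still verify that the element at the returned position equals the value:

```python
def binary_search(lst, value):
    """lst must be sorted. Return the leftmost position where value fits."""
    lower, upper = 0, len(lst) - 1
    while lower != upper:
        mid = (lower + upper) // 2
        if lst[mid] < value:
            lower = mid + 1   # value can only be in the upper half
        else:
            upper = mid       # value must be in the lower half (or at mid)
    return lower

data = [2, 4, 7, 8, 11, 13, 15, 16, 23]
print(binary_search(data, 8))  # 3
```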

Binary Search: Runtime Complexity

Using a binary search, how many comparisons are required…

…in the best case? \(O(1)\)
…in the worst case? \(O( \log n)\)
…on average? \(O( \log n)\)

Explanation for \(O( \log n)\) will follow when we talk about trees.

Binary Search: Requirements

Binary search is quite fast,
however, it has two big requirements:

  1. random-access data structure
    Binary search works with arrays but not with lists
  2. input data needs to be sorted
    Sorting is costly; it only pays off after more than approx. \(\log_2(n)\) search operations

Binary Search: approximate matching

Binary search also supports approximate matches!

Binary Search: approximate matching

Approximate matches could mean

  • predecessor,
  • successor,
  • nearest neighbor
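In Python, approximate matches can be sketched with the standard-library `bisect` module; the example array and target are assumptions, and the boundary cases (target below the minimum or above the maximum) are omitted for brevity:

```python
import bisect

data = [2, 4, 7, 8, 11, 13, 15, 16, 23]
target = 10                               # not in the array

rank = bisect.bisect_left(data, target)   # number of elements < target
predecessor = data[rank - 1]              # largest value < target
successor = data[rank]                    # smallest value >= target
nearest = min(predecessor, successor, key=lambda v: abs(v - target))
print(rank, predecessor, successor, nearest)  # 4 8 11 11
```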

Binary Search: approximate matching

Approximate matching in binary search:

This shows rank, predecessor, successor, and nearest neighbor for the target value, which is not in the array.

https://en.wikipedia.org/wiki/File:Approximate-binary-search.svg

Summary

Performance comparison of search algorithms

Performance   Linear Search   Binary Search
Best case     \(O(1)\)        \(O(1)\)
Worst case    \(O(n)\)        \(O(\log n)\)
Average       \(O(n)\)        \(O(\log n)\)

Summary

How do we notice the difference between \(O(n)\) and \(O(\log n)\)?

Summary

Let’s assume \(n = 100\) elements. Finding an element requires:

  • \(O(n)\): 100 operations
  • \(O(\log n)\): 10 operations

Summary

…and with \(n = 10,000,000\) elements, a factor of 100,000?

  • \(O(n)\): 10,000,000 operations
  • \(O(\log n)\): \(\approx 24\) operations

Summary

Assume each operation takes 1 second to finish:

10,000,000 sec ≈ 2,777.78 hours ≈ 115.74 days

Sorting

Why Sorting

The sorting problem

What does sorting actually mean?

Given an array of values, a sorting procedure yields an array containing the same values permuted such that each value is no greater than its successor.

Insertion Sort: Naive Example

Räcke (2019)


The sorting problem

Beneficial characteristics of sorting algorithms:

  • Stable sort algorithms maintain relative order of elements with equal value
  • In-place sort algorithms do not require a copy of the input
  • Efficient algorithms require as few comparison operations as possible

Insertion Sort: Improved Example

A similar approach but without a second storage space

Cormen and Leiserson (2022)

Insertion Sort: Improved Example

A similar approach but without a second storage space

Räcke (2019)


Insertion Sort

This algorithm is called Insertion Sort. It is

  • intuitive
  • easy to understand
  • in-place and stable
  • pretty quick with sorted and almost-sorted data
  • efficient for small lists (<50 elements)

Cormen and Leiserson (2022)

Insertion Sort

However, Insertion Sort

  • needs many operations with randomized data
  • has complexity \(O(n^2)\)

Insertion Sort: Implementation

i ← 1
while i < length(A)
    x ← A[i]
    j ← i
    while j > 0 and A[j-1] > x
        A[j] ← A[j-1]
        j ← j - 1
    end while
    A[j] ← x
    i ← i + 1
end while

https://en.wikipedia.org/wiki/Insertion_sort#Algorithm
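The same algorithm as a runnable Python sketch; the example list is made up:

```python
def insertion_sort(a):
    """Sort list a in place; stable, O(n^2) in the worst case."""
    i = 1
    while i < len(a):
        x = a[i]
        j = i
        while j > 0 and a[j - 1] > x:
            a[j] = a[j - 1]   # shift larger elements one slot to the right
            j -= 1
        a[j] = x              # insert x into its sorted position
        i += 1
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```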

Insertion Sort: Complexity

Where does complexity \(O(n^2)\) come from?

We have a nested loop:

while i < length(A)
  ...
  while j > 0 and A[j-1] > x

Each element needs to be compared to the already sorted elements to find its position.

The best case: the list is already sorted. This actually yields \(O(n)\).

The worst case: the list is sorted in descending order; every element must be moved past all previously sorted elements, which yields \(O(n^2)\).

Other Sorting Algorithms

There are many other sorting algorithms:

  • bubble sort
  • selection sort
  • quicksort
  • heapsort
  • mergesort

Other Sorting Algorithms: Highlights

Two of the other algorithms are interesting:

Mergesort and quicksort.

Mergesort

Conceptually, merge sort works as follows:

  1. Divide the unsorted list into n sub-lists, each containing one element (a list of one element is considered sorted).
  2. Repeatedly merge sublists to produce new sorted sublists until there is only one sublist remaining. This will be the sorted list.
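The steps above describe the bottom-up view; a common recursive (top-down) Python sketch of the same idea follows, with a made-up example list:

```python
def merge_sort(lst):
    """Return a new sorted list; stable, O(n log n), not in place."""
    if len(lst) <= 1:                 # a list of one element is sorted
        return lst
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])
    right = merge_sort(lst[mid:])
    # merge the two sorted halves into one sorted list
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])           # append whatever remains
    merged.extend(right[j:])
    return merged

print(merge_sort([6, 5, 3, 1, 8, 7, 2, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```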

Mergesort

https://commons.wikimedia.org/wiki/File:Merge_sort_algorithm_diagram.svg

Mergesort

Mergesort

  • not in-place (requires additional memory)
  • but, it is stable
  • works with lists as well
  • can guarantee \(O(n \log n)\) (in the average and worst cases)

Quicksort

Quicksort works in-place and
is typically faster than Mergesort (often by a factor of about 2).

Its average performance is also \(O(n \log n)\),
but the worst case degrades to \(O(n^2)\). And it is not stable.

Quicksort

Quicksort uses a pivot element.

It partitions the other elements such that smaller elements are left to the pivot, and larger elements are right to the pivot element.

https://commons.wikimedia.org/wiki/File:Sorting_quicksort_anim.gif
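The partitioning idea can be sketched in a few lines of Python. Note that this list-comprehension version is deliberately simple and not in-place (real quicksort implementations partition within the array); the middle element is used as the pivot, and the example list is made up:

```python
def quicksort(lst):
    """Return a sorted copy. Average O(n log n), worst case O(n^2)."""
    if len(lst) <= 1:
        return lst
    pivot = lst[len(lst) // 2]        # middle element as pivot
    smaller = [x for x in lst if x < pivot]   # left of the pivot
    equal   = [x for x in lst if x == pivot]
    larger  = [x for x in lst if x > pivot]   # right of the pivot
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([3, 7, 8, 5, 2, 1, 9, 5, 4]))  # [1, 2, 3, 4, 5, 5, 7, 8, 9]
```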

Quicksort

Much depends on the pivot element.

Ideally, it is the median; however, the median cannot be known in advance. Different strategies propose to use

  • the middle element
  • a random element
  • the result of a pivot-search

Summary

Performance comparison of sort algorithms

Performance   Insertion Sort   Mergesort          Quicksort
Best case     \(O(n)\)         \(O(n \log n)\)    \(O(n \log n)\)
Worst case    \(O(n^2)\)       \(O(n \log n)\)    \(O(n^2)\)
Average       \(O(n^2)\)       \(O(n \log n)\)    \(O(n \log n)\)

Thanks

https://xkcd.com/1185/

Acknowledgement

Acknowledgement:

This script is inspired by Tantau (2016) and Räcke (2019).

References

Cormen, Thomas H, and Charles E Leiserson. 2022. Introduction to Algorithms, Fourth Edition. London, England: MIT Press.
Räcke, Harald. 2019. “Lecture Notes to the Lecture Algorithmen und Datenstrukturen.” TUM. https://wwwalbers.in.tum.de/lehre/2019SS/ad/index.html.de.
Tantau, Till. 2016. Einführung in Die Informatik. Online. https://www.tcs.uni-luebeck.de/de/mitarbeiter/tantau/lehre/lectures/EinfuehrungindieInformatik-2016.pdf.