Searching and Sorting

Prof. Dr. Mirco Schoenfeld

Recap

Last session was about data types and data structures.

Recap

Some data types…:

Tantau (2016)

Recap

…and some data structures:

Operation Arrays Lists Sets
Add item to beginning or end
Access first or last item
Access item by index
Store only unique values
Efficient test for membership

Today

Why should we care?

Motivation

Clever organization of your data will help you find things faster.

The search problem

What do we actually want to find?

Given an array containing some values,
we could search for a specific value.

The search problem

Given an array containing some values, we could search

  • a single value
  • all values
  • a single value with a certain characteristic
  • all values with a certain characteristic

I/O

About the inputs and outputs:

We search for                          Inputs                  Outputs
a single value                         array + single value    one array position
all values                             array + single value    all array positions
a single value with a characteristic   array + characteristic  one array position
all values with a characteristic       array + characteristic  all array positions

Approaches to searching

In general, there are two approaches to searching:

Linear Search

  • most basic approach
  • simply iterate over input
  • sequentially check each element
  • rarely practical

Binary Search

  • can be much faster than linear search
  • iteratively eliminates half of the input
  • input list needs to be sorted
  • works only with random-access input

Linear Search: Implementation

Linear Search in pseudo code:

procedure linear_search (list, value)
  set i to 0
  set n to length of list
  while i < n
  do
    if (list at i) = value, terminate and return i
    increment i by 1
  done
  terminate unsuccessfully
end procedure
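The pseudocode above translates almost line by line into Python; this is a minimal sketch, and the example list is made up for illustration:

```python
def linear_search(lst, value):
    """Return the index of the first occurrence of value, or None if absent."""
    for i, item in enumerate(lst):
        if item == value:
            return i          # terminate and return i
    return None               # terminate unsuccessfully

print(linear_search([4, 15, 2, 23, 27, 42, 8], 27))  # 4
```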

Linear Search: Runtime Complexity

Consider this example:

How many comparisons are needed to find

  • 27: 5
  • 4: 1
  • 8: 9

Linear Search: Runtime Complexity

How many comparisons are needed

…in the best case? \(O(1)\)
…in the worst case? \(O(n)\)
…on average? \(O(n)\)

Linear Search: Runtime Complexity

Regarding \(O(n)\) in the average case:

If each element occurs in the list once,
and each element is equally likely to be searched,
then the expected number of comparisons is

\[ \frac{ n+1 }{ 2 } \]

The O-class of that is \(O(n)\)
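The expected value can be checked empirically with a short sketch; the list of 100 distinct values is an assumption for the experiment:

```python
def comparisons_until_found(lst, value):
    """Count how many comparisons linear search performs until it finds value."""
    for i, item in enumerate(lst):
        if item == value:
            return i + 1      # i comparisons failed, one succeeded

lst = list(range(100))        # each element occurs exactly once
n = len(lst)
# average over searching every element with equal probability
avg = sum(comparisons_until_found(lst, v) for v in lst) / n
print(avg, (n + 1) / 2)       # both are 50.5
```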

Linear Search: Summary

Summarizing linear search:

  • simple to implement
  • no requirements on input
  • only practical for small lists

Before Moving on: A Quick Survey

Towards binary search

Ancient storage of knowledge, usually kept on shelves

Towards binary search

Their space complexity is highly inefficient.

Towards binary search

Some examples had a special organization of knowledge:

https://commons.wikimedia.org/wiki/File:Geographical_Webster%27s_Home_a._Office_Dictionary._ca_1900_(132840235).jpg

Towards binary search

Using linear search in dictionaries would be highly inefficient.

What could be a better approach?

Binary search: Example

Search “8”!

7 < 8 → continue in the upper half
13 > 8 → continue in the lower half

This took 3 comparisons instead of 6 with linear search.

Binary Search: Implementation

procedure binary_search(list, value)
  set lower_bound to 0
  set upper_bound to (length of list) - 1
  while lower_bound != upper_bound
  do
    // obtain mid-point of interval (rounding down)
    set mid_point to (lower_bound + upper_bound) / 2
    if (list at mid_point) < value
    then
      // value can only be in the upper half
      update lower_bound to mid_point + 1
    else
      // value must be in the lower half (or at mid_point)
      update upper_bound to mid_point
    end if
  done
  return lower_bound
end procedure
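The same procedure as a Python sketch; the sorted example list is made up. Note that this variant returns the leftmost position where the value would fit, so the caller should still verify that the element at the returned position equals the value:

```python
def binary_search(lst, value):
    """lst must be sorted. Return the leftmost position where value fits."""
    lower, upper = 0, len(lst) - 1
    while lower != upper:
        mid = (lower + upper) // 2
        if lst[mid] < value:
            lower = mid + 1   # value can only be in the upper half
        else:
            upper = mid       # value must be in the lower half (or at mid)
    return lower

data = [2, 4, 7, 8, 11, 13, 15, 16, 23]
print(binary_search(data, 8))  # 3
```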

Binary Search: Runtime Complexity

Using a binary search, how many comparisons are required…

…in the best case? \(O(1)\)
…in the worst case? \(O( \log n)\)
…on average? \(O( \log n)\)

Explanation for \(O( \log n)\) will follow when we talk about trees.

Binary Search: Requirements

Binary search is quite fast,
however, it has two big requirements:

  1. random-access data structure
    Binary search works with arrays but not with lists
  2. input data needs to be sorted
    Sorting is costly; it only pays off after more than approx. \(\log_2(n)\) search operations

Binary Search: approximate matching

Binary search also supports approximate matches!

Binary Search: approximate matching

Approximate matches could mean

  • predecessor,
  • successor,
  • nearest neighbor
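In Python, approximate matches can be sketched with the standard-library `bisect` module; the example array and target are assumptions, and the boundary cases (target below the minimum or above the maximum) are omitted for brevity:

```python
import bisect

data = [2, 4, 7, 8, 11, 13, 15, 16, 23]
target = 10                               # not in the array

rank = bisect.bisect_left(data, target)   # number of elements < target
predecessor = data[rank - 1]              # largest value < target
successor = data[rank]                    # smallest value >= target
nearest = min(predecessor, successor, key=lambda v: abs(v - target))
print(rank, predecessor, successor, nearest)  # 4 8 11 11
```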

Binary Search: approximate matching

Approximate matching in binary search:

This shows rank, predecessor, successor, and nearest neighbor for the target value, which is not in the array.

https://en.wikipedia.org/wiki/File:Approximate-binary-search.svg

Summary

Performance comparison of search algorithms

Performance   Linear Search   Binary Search
Best case     \(O(1)\)        \(O(1)\)
Worst case    \(O(n)\)        \(O(\log n)\)
Average       \(O(n)\)        \(O(\log n)\)

Summary

How do we notice the difference between \(O(n)\) and \(O(\log n)\)?

Summary

Let’s assume \(n = 100\) elements. Finding an element requires:

  • \(O(n)\): 100 operations
  • \(O(\log n)\): 10 operations

Summary

…and with \(n = 10,000,000\) elements, a factor of 100,000?

  • \(O(n)\): 10,000,000 operations
  • \(O(\log n)\): \(\approx 24\) operations

Summary

Assume each operation takes 1 second to finish:

10,000,000 sec ≈ 2,777.78 hours ≈ 115.74 days

Sorting

Why Sorting

The sorting problem

What does sorting actually mean?

Given an array of values, a sorting procedure yields an array containing the same values permuted such that each value is no greater than its successor.

Insertion Sort: Naive Example

Räcke (2019)


The sorting problem

Beneficial characteristics of sorting algorithms:

  • Stable sort algorithms maintain relative order of elements with equal value
  • In-place sort algorithms do not require a copy of the input
  • Efficient algorithms require as few comparison operations as possible

Insertion Sort: Improved Example

A similar approach but without a second storage space

Cormen and Leiserson (2022)

Insertion Sort: Improved Example

A similar approach but without a second storage space

Räcke (2019)


Insertion Sort

This algorithm is called Insertion Sort. It is

  • intuitive
  • easy to understand
  • in-place and stable
  • pretty quick with sorted and almost-sorted data
  • efficient for small lists (<50 elements)

Cormen and Leiserson (2022)

Insertion Sort

However, Insertion Sort

  • needs many operations with randomized data
  • has complexity \(O(n^2)\)

Insertion Sort: Implementation

i ← 1
while i < length(A)
    x ← A[i]
    j ← i
    while j > 0 and A[j-1] > x
        A[j] ← A[j-1]
        j ← j - 1
    end while
    A[j] ← x
    i ← i + 1
end while

https://en.wikipedia.org/wiki/Insertion_sort#Algorithm
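The same algorithm as a runnable Python sketch; the example list is made up:

```python
def insertion_sort(a):
    """Sort list a in place; stable, O(n^2) in the worst case."""
    i = 1
    while i < len(a):
        x = a[i]
        j = i
        while j > 0 and a[j - 1] > x:
            a[j] = a[j - 1]   # shift larger elements one slot to the right
            j -= 1
        a[j] = x              # insert x into its sorted position
        i += 1
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```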

Insertion Sort: Complexity

Where does complexity \(O(n^2)\) come from?

We have a nested loop:

while i < length(A)
  ...
  while j > 0 and A[j-1] > x

Each element needs to be compared to the already sorted elements to find its position.

The best case: the list is already sorted. This actually yields \(O(n)\).

The worst case: the list is sorted in descending order; every element must be moved past all previously sorted elements, which yields \(O(n^2)\).

Other Sorting Algorithms

There are many other sorting algorithms:

  • bubble sort
  • selection sort
  • quicksort
  • heapsort
  • mergesort

Other Sorting Algorithms: Highlights

Two of the other algorithms are interesting:

Mergesort and quicksort.

Mergesort

Conceptually, merge sort works as follows:

  1. Divide the unsorted list into n sub-lists, each containing one element (a list of one element is considered sorted).
  2. Repeatedly merge sublists to produce new sorted sublists until there is only one sublist remaining. This will be the sorted list.
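The steps above describe the bottom-up view; a common recursive (top-down) Python sketch of the same idea follows, with a made-up example list:

```python
def merge_sort(lst):
    """Return a new sorted list; stable, O(n log n), not in place."""
    if len(lst) <= 1:                 # a list of one element is sorted
        return lst
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])
    right = merge_sort(lst[mid:])
    # merge the two sorted halves into one sorted list
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])           # append whatever remains
    merged.extend(right[j:])
    return merged

print(merge_sort([6, 5, 3, 1, 8, 7, 2, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```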

Mergesort

https://commons.wikimedia.org/wiki/File:Merge_sort_algorithm_diagram.svg

Mergesort

Mergesort

  • not in-place (requires additional memory)
  • but, it is stable
  • works with lists as well
  • can guarantee \(O(n \log n)\) (in the average and worst cases)

Quicksort

Quicksort works in-place and
is typically faster than Mergesort (often by a factor of about 2).

Its average performance is also \(O(n \log n)\),
but the worst case degrades to \(O(n^2)\). And it is not stable.

Quicksort

Quicksort uses a pivot element.

It partitions the other elements such that smaller elements are left to the pivot, and larger elements are right to the pivot element.

https://commons.wikimedia.org/wiki/File:Sorting_quicksort_anim.gif
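The partitioning idea can be sketched in a few lines of Python. Note that this list-comprehension version is deliberately simple and not in-place (real quicksort implementations partition within the array); the middle element is used as the pivot, and the example list is made up:

```python
def quicksort(lst):
    """Return a sorted copy. Average O(n log n), worst case O(n^2)."""
    if len(lst) <= 1:
        return lst
    pivot = lst[len(lst) // 2]        # middle element as pivot
    smaller = [x for x in lst if x < pivot]   # left of the pivot
    equal   = [x for x in lst if x == pivot]
    larger  = [x for x in lst if x > pivot]   # right of the pivot
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([3, 7, 8, 5, 2, 1, 9, 5, 4]))  # [1, 2, 3, 4, 5, 5, 7, 8, 9]
```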

Quicksort

Much depends on the pivot element.

Ideally, it is the median; however, the median cannot be known in advance. Different strategies propose to use

  • the middle element
  • a random element
  • the result of a pivot-search

Summary

Performance comparison of sort algorithms

Performance   Insertion Sort   Mergesort          Quicksort
Best case     \(O(n)\)         \(O(n \log n)\)    \(O(n \log n)\)
Worst case    \(O(n^2)\)       \(O(n \log n)\)    \(O(n^2)\)
Average       \(O(n^2)\)       \(O(n \log n)\)    \(O(n \log n)\)

Thanks

https://xkcd.com/1185/

Acknowledgement

Acknowledgement:

This script is inspired by Tantau (2016) and Räcke (2019).

References

Cormen, Thomas H, and Charles E Leiserson. 2022. Introduction to Algorithms, Fourth Edition. London, England: MIT Press.
Räcke, Harald. 2019. “Lecture Notes to the Lecture Algorithmen und Datenstrukturen.” TUM. https://wwwalbers.in.tum.de/lehre/2019SS/ad/index.html.de.
Tantau, Till. 2016. Einführung in Die Informatik. Online. https://www.tcs.uni-luebeck.de/de/mitarbeiter/tantau/lehre/lectures/EinfuehrungindieInformatik-2016.pdf.