Practising Unsupervised Learning:
Hierarchical Agglomerative Clustering

Prof. Dr. Mirco Schoenfeld

Beware

Attention, this task can be solved in two ways!

Choose Wisely

Choose wisely!

Visual Solution

Yay, KNIME!

Visual Solution: Basics

First, create a basic KNIME workflow for hierarchical clustering.

It’s your turn

  1. Create a workflow in KNIME that applies hierarchical agglomerative clustering to the mouse dataset.
  2. Visualize the result as a dendrogram.

Visual Solution: Silhouettes

The next step is to obtain silhouette scores.

It’s your turn

  1. Extend your KNIME workflow to obtain silhouette scores for the resulting clusters.
  2. Determine the optimal number of clusters.
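
As a reminder of what is being measured, the silhouette score of a single point is commonly defined as below. The notation is an assumption, not taken from these slides: a(i) is the mean distance from point i to the other points in its own cluster, and b(i) is the smallest mean distance from point i to the points of any other cluster.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

Values close to 1 indicate a point that sits well inside its cluster. A common heuristic for choosing the number of clusters is to pick the one with the highest average silhouette score over all points.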

Visual Solution: Linkage strategies

Now, modify your workflow to explore the effect of different linkage strategies.

It’s your turn

What are the effects of different linkage strategies?

Which strategy yields the best results, and why?

Programming Solution

Yay, Programming!

Programming Solution: Basics

First, create a basic script for hierarchical clustering.

Again, you can use the programming language of your choice.

A solution will be provided in R.

It’s your turn

  1. Download the task sheet.
  2. Open the task sheet in RStudio.
  3. Fill the gaps to apply and visualize a hierarchical agglomerative clustering.
    If you want to read what a function (e.g. hclust) does, use ? to access its documentation (e.g. ?hclust).
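
The steps above could look roughly like the following minimal sketch in base R. It is not the official solution or the task sheet. Because the mouse dataset is not reproduced here, the sketch generates a synthetic stand-in (one large blob and two small ones); the variable names, the blob parameters, and the choice of complete linkage and k = 3 are all assumptions made for illustration.

```r
# ASSUMPTION: synthetic stand-in for the mouse dataset (two numeric columns).
set.seed(42)
head_pts  <- cbind(rnorm(100, 0.50, 0.10), rnorm(100, 0.50, 0.10))
left_ear  <- cbind(rnorm(30, 0.25, 0.04), rnorm(30, 0.80, 0.04))
right_ear <- cbind(rnorm(30, 0.75, 0.04), rnorm(30, 0.80, 0.04))
mouse <- rbind(head_pts, left_ear, right_ear)
colnames(mouse) <- c("x", "y")

# Pairwise Euclidean distances (the default of dist()).
d <- dist(mouse)

# Agglomerative clustering; the linkage strategy is set via `method`.
hc <- hclust(d, method = "complete")

# Draw the dendrogram without the 160 individual leaf labels.
plot(hc, labels = FALSE, main = "Complete-linkage dendrogram",
     xlab = "", sub = "")

# Cut the tree into a fixed number of clusters (k = 3 is an assumption).
clusters <- cutree(hc, k = 3)
print(table(clusters))
```

With the real data, the synthetic block would be replaced by reading the mouse dataset into a numeric matrix or data frame.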

Programming Solution: Silhouettes

Now, extend your script to obtain silhouette scores and determine the optimal number of clusters.

It’s your turn

  1. Either discover the relevant functions on your own or download the next task sheet
  2. Open the task sheet in RStudio
  3. Fill the gaps to obtain silhouette scores and determine the optimal number of clusters.
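
One possible way to approach this step is sketched below, using silhouette() from the cluster package, which ships with standard R installations. Again, this is a sketch under assumptions rather than the provided solution: the data is a synthetic stand-in for the mouse dataset, and the candidate range 2 to 8 and the average linkage are arbitrary illustrative choices.

```r
library(cluster)  # provides silhouette()

# ASSUMPTION: synthetic stand-in for the mouse dataset (two numeric columns).
set.seed(42)
mouse <- rbind(
  cbind(rnorm(100, 0.50, 0.10), rnorm(100, 0.50, 0.10)),
  cbind(rnorm(30, 0.25, 0.04), rnorm(30, 0.80, 0.04)),
  cbind(rnorm(30, 0.75, 0.04), rnorm(30, 0.80, 0.04))
)

d  <- dist(mouse)
hc <- hclust(d, method = "average")

# Candidate numbers of clusters (the range is an assumption).
ks <- 2:8

# For each k, cut the tree and average the per-point silhouette widths.
avg_width <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})

# Heuristic: the k with the highest average silhouette width.
best_k <- ks[which.max(avg_width)]
print(data.frame(k = ks, avg_silhouette = round(avg_width, 3)))
cat("k with the highest average silhouette width:", best_k, "\n")
```

The highest average silhouette width is a heuristic, so it is worth comparing the suggested k against the dendrogram and a scatter plot of the data.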

Programming Solution: Linkage strategies

Now, modify your script to explore the effect of different linkage strategies.

It’s your turn

What are the effects of different linkage strategies?

Which strategy yields the best results, and why?
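
To explore these questions, one option is to run the same clustering with several values of the `method` argument of hclust and compare the outcomes. The sketch below is illustrative only: it uses a synthetic stand-in for the mouse dataset, fixes k = 3 by assumption, and compares four of the linkage strategies that hclust supports by cluster sizes and average silhouette width.

```r
library(cluster)  # provides silhouette()

# ASSUMPTION: synthetic stand-in for the mouse dataset (two numeric columns).
set.seed(42)
mouse <- rbind(
  cbind(rnorm(100, 0.50, 0.10), rnorm(100, 0.50, 0.10)),
  cbind(rnorm(30, 0.25, 0.04), rnorm(30, 0.80, 0.04)),
  cbind(rnorm(30, 0.75, 0.04), rnorm(30, 0.80, 0.04))
)
d <- dist(mouse)

# Linkage strategies to compare (all are valid `method` values of hclust).
methods <- c("single", "complete", "average", "ward.D2")
k <- 3  # ASSUMPTION: fixed number of clusters for the comparison

# Average silhouette width per linkage strategy.
avg_width_by_method <- sapply(methods, function(m) {
  hc  <- hclust(d, method = m)
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})

# Cluster sizes per linkage strategy, largest cluster first.
sizes_by_method <- lapply(methods, function(m) {
  sort(as.vector(table(cutree(hclust(d, method = m), k = k))),
       decreasing = TRUE)
})
names(sizes_by_method) <- methods

print(round(avg_width_by_method, 3))
print(sizes_by_method)
```

Plotting the dendrogram for each strategy alongside these numbers helps to see why the strategies differ, for example whether one of them tends to split off single points instead of whole groups.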
