Practising Unsupervised Learning: kmeans

Prof. Dr. Mirco Schoenfeld

Beware

Attention, this task can be solved in two ways!

Choose Wisely

Choose wisely!

Visual Solution

Yay, KNIME!

Visual Solution: Basics

First, create a basic KNIME workflow.

It’s your turn

Create a KNIME workflow that applies k-means clustering to the mouse dataset.

Visual Solution: Silhouettes

The next step is to obtain silhouette scores.

It’s your turn

Extend your KNIME workflow to obtain silhouette scores for the clusters.

Visual Solution: Optimal K

Now, we want to obtain the optimal number of clusters.

It’s your turn

Extend your KNIME workflow to obtain the optimal number of clusters.

Programming Solution

Yay, Programming!

Requirements

First, check the requirements!

Do you have a Python 3 installation?

Have you downloaded and installed the required R packages?
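
The package check can be scripted. The following is a minimal sketch, not the course's official list: `caret` is named in the task sheet, while `cluster` is an assumption based on the later silhouette task.

```r
# Hypothetical package list: "caret" appears in the task sheet,
# "cluster" is an assumption (used here for silhouette scores).
required <- c("caret", "cluster")

# requireNamespace() returns TRUE if a package can be loaded,
# without attaching it to the search path.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required[!installed]

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```

Adjust `required` to whatever the task sheets actually load.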

Windows tip

If you have trouble installing a package on Windows, install Rtools:
https://cran.rstudio.com/bin/windows/Rtools/

Windows tip

Also on Windows, check that R can download from CRAN:

download.file("https://cran.rstudio.com/src/contrib/PACKAGES", "text.txt")

If this fails with a warning such as

In download.file( […] )
‘SSL connect error’

set the download method:

options("download.file.method"="wininet")

Programming Solution: Basics

To begin, create a basic clustering script.

Use any programming language you like.

The course offers a solution in R.

It’s your turn

  1. Download the task sheet
  2. Open the task sheet in RStudio
  3. Fill the gaps to apply k-means clustering
    To read what a function (e.g. kmeans) does, prefix its name with ? to open its documentation (e.g. ?kmeans).
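
For orientation, a basic k-means call in R might look like the sketch below. It is not the task sheet's solution: the data are synthetic Gaussian blobs standing in for the course dataset, and the choice of k = 3 and `nstart = 10` is an assumption made for illustration.

```r
set.seed(42)  # k-means starts from random centers; fix the seed for reproducibility

# Synthetic stand-in for the course data: three 2-D Gaussian blobs (an assumption)
blob_centers <- rbind(c(0, 0), c(4, 4), c(0, 4))
pts <- do.call(rbind, lapply(1:3, function(i) {
  cbind(x = rnorm(50, mean = blob_centers[i, 1], sd = 0.5),
        y = rnorm(50, mean = blob_centers[i, 2], sd = 0.5))
}))

# stats::kmeans: `centers` is the number of clusters k,
# `nstart` is the number of random restarts (the best run is kept)
fit <- kmeans(pts, centers = 3, nstart = 10)

table(fit$cluster)  # cluster sizes
fit$centers         # estimated cluster centers
fit$tot.withinss    # total within-cluster sum of squares

# Scatter plot coloured by assigned cluster
plot(pts, col = fit$cluster, pch = 19)
```

The returned object is a list; `fit$cluster` holds one cluster label per row of the input.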

It’s your turn

The last command in the k-means task was caret::featurePlot.

What does it visualize?

Programming Solution: Silhouettes

Next, extend your script by the calculation of silhouette scores.

It’s your turn

  1. Either discover the relevant functions or download the next task sheet
  2. Open the task sheet in RStudio
  3. Fill the gaps to obtain silhouette scores for your k-means clusters.
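
One possible way to compute silhouette scores in R is `silhouette()` from the `cluster` package. The sketch below assumes that package is installed and again uses synthetic data; the task sheet may rely on different functions.

```r
library(cluster)  # provides silhouette(); assumed to be installed

set.seed(42)
# Synthetic stand-in data: two 2-D Gaussian blobs (an assumption)
pts <- rbind(
  cbind(rnorm(50, 0, 0.5), rnorm(50, 0, 0.5)),
  cbind(rnorm(50, 4, 0.5), rnorm(50, 4, 0.5))
)
fit <- kmeans(pts, centers = 2, nstart = 10)

# silhouette() takes the cluster labels and a dissimilarity object
sil <- silhouette(fit$cluster, dist(pts))

head(sil[, "sil_width"])      # per-point silhouette widths, each in [-1, 1]
mean(sil[, "sil_width"])      # average silhouette width of the clustering
summary(sil)$clus.avg.widths  # average silhouette width per cluster
```

Widths close to 1 indicate points that sit well inside their cluster; values near 0 or below indicate points on or across a cluster boundary.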

Programming Solution: Optimal K

Now, obtain the optimal number of clusters k.

It’s your turn

  1. Either discover the relevant functions or download the next task sheet
  2. Open the task sheet in RStudio
  3. Fill the gaps to obtain the optimal number of clusters.
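
A common heuristic for choosing k is to run k-means for several candidate values and keep the one with the highest average silhouette width. The sketch below illustrates that idea on synthetic data; the candidate range `2:8` and the use of `cluster::silhouette()` are assumptions, and the task sheet may take a different route.

```r
library(cluster)  # provides silhouette(); assumed to be installed

set.seed(42)
# Synthetic stand-in data: three 2-D Gaussian blobs (an assumption)
pts <- rbind(
  cbind(rnorm(50, 0, 0.5), rnorm(50, 0, 0.5)),
  cbind(rnorm(50, 4, 0.5), rnorm(50, 4, 0.5)),
  cbind(rnorm(50, 0, 0.5), rnorm(50, 4, 0.5))
)
d <- dist(pts)  # compute pairwise distances once and reuse them

ks <- 2:8  # silhouettes are undefined for k = 1
avg_width <- sapply(ks, function(k) {
  fit <- kmeans(pts, centers = k, nstart = 10)
  mean(silhouette(fit$cluster, d)[, "sil_width"])
})

# Candidate k with the highest average silhouette width
best_k <- ks[which.max(avg_width)]
best_k

# Visual check: average silhouette width against k
plot(ks, avg_width, type = "b", xlab = "k", ylab = "Average silhouette width")
```

Plotting the curve is worthwhile because a flat maximum suggests that several values of k describe the data about equally well.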