Practising ML: KNIME

Prof. Dr. Mirco Schoenfeld

Motivation

In this course, you will learn to build your own ML pipeline.

Can you code?

But, I can’t code…!

KNIME

https://www.knime.com/

KNIME (/naɪm/), the Konstanz Information Miner,
is a data analytics, reporting, and integration platform.

https://www.knime.com/
https://en.wikipedia.org/wiki/KNIME

Download KNIME

Download KNIME from https://www.knime.com/downloads

Choose either the latest release or the LTS version.

You don’t need to register.

Start KNIME

Now, please install (if necessary) and start KNIME.

Install extensions

Before we can move on, we need to install some extensions.

Dragging the button onto the KNIME window starts the installation.

Please install these extensions:

Useful resources

The KNIME Hub contains many useful resources:

https://hub.knime.com/knime

Advanced usage

Or refer to these articles for advanced examples:

Getting started

First, we need some data.

We’ll be working with the mouse.csv dataset from

https://elki-project.github.io/datasets/

It can also be obtained from here

Why mouse, you ask?

Anyway…

Please download the dataset.

It’s your turn

Remove the noise.

  1. Inspect mouse.csv to find the noise.
  2. Design a workflow in KNIME to remove it from the scatter plot.

It’s your turn

How would you remove the noise programmatically?
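
One possible way to do this in code, sketched in Python. It rests on assumptions you should verify by inspecting your copy of mouse.csv: that each data row holds whitespace-separated values x, y and a label, that comment lines start with "#", and that noise rows carry the label "Noise". The function name remove_noise and the inline sample rows are made up for illustration.

```python
def remove_noise(lines, noise_label="Noise"):
    """Return (x, y, label) tuples for all rows whose label is not noise_label.

    Assumed row format (verify against the real file): "x y label".
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        x, y, label = line.split()
        if label != noise_label:
            rows.append((float(x), float(y), label))
    return rows


# Tiny inline sample in the assumed format (not the real data).
sample = [
    "# x y label",
    "0.45 0.43 Head",
    "0.25 0.75 Ear_left",
    "0.90 0.10 Noise",
    "0.75 0.75 Ear_right",
]
clean = remove_noise(sample)
print(len(clean), "rows kept")

# With the real file, pass the open file object instead of the sample:
#     with open("mouse.csv") as f:
#         clean = remove_noise(f)
```

Filtering on the label keeps the coordinates untouched, so the remaining rows can go straight into a scatter plot or a clustering step.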

Beware

Attention, this task can be solved in two ways!

Choose Wisely

Choose wisely!

Visual Solution

Yay, KNIME!

Visual Solution: Basics

First, create a basic KNIME workflow.

It’s your turn

Create a workflow in KNIME to apply k-means clustering to the mouse dataset.

Visual Solution: Silhouettes

The next step is to obtain silhouette scores.

It’s your turn

Extend your KNIME workflow to obtain silhouette scores for the clusters.

Visual Solution: Optimal K

Now, we want to obtain the optimal number of clusters.

It’s your turn

Extend your KNIME workflow to obtain the optimal number of clusters.

Programming Solution

Yay, Programming!

Requirements

First, check the requirements!

Do you have a python3 installation?

Have you downloaded and installed the required R packages?

Windows tip

If you have trouble installing a package on Windows,
you need to install Rtools:
https://cran.rstudio.com/bin/windows/Rtools/

Windows tip

Again, on Windows, please check whether this command runs without an error:

download.file("https://cran.rstudio.com/src/contrib/PACKAGES", "text.txt")

In download.file( […] )
‘SSL connect error’

If you see that message, enter:

options("download.file.method"="wininet")

Programming Solution: Basics

To begin, create a basic clustering script.

Use a programming language of your choice.

The course offers a solution in R.
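
For those who pick Python instead, here is a minimal sketch of what a basic clustering script does internally. It is not the course's R solution. It is a plain-Python version of Lloyd's algorithm, the standard k-means iteration, run on six made-up points. The function name, the parameters iterations and seed, and the toy data are all assumptions for illustration.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for 2-D points: return (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen input points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster ran empty
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, labels


# Made-up toy data: two well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.2, 0.9), (0.4, 0.3),
    (10.0, 0.5), (10.3, 0.1), (10.1, 0.8),
]
centers, labels = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary. What matters is which points share a label.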

It’s your turn

  1. Download the task sheet
  2. Open the task sheet in RStudio
  3. Fill the gaps to apply k-means clustering.
    To read what a function (e.g. kmeans) does, use ? to access its documentation (e.g. ?kmeans).

It’s your turn

The last command in the k-means task was caret::featurePlot.

What does it visualize?

Programming Solution: Silhouettes

Next, extend your script to calculate silhouette scores.
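
As a reference for what such a calculation involves, here is a hedged Python sketch of the standard silhouette definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the score is (b - a) / max(a, b). The function name and toy data are assumptions, and the course's R task sheet may use a library function instead.

```python
import math


def silhouette_scores(points, labels):
    """Return one silhouette score per point (needs at least two clusters)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # The sum over `own` includes p itself at distance 0, hence n - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != l
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Made-up toy data: two pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_scores(points, [0, 0, 1, 1])  # pairs grouped together
bad = silhouette_scores(points, [0, 1, 0, 1])   # pairs torn apart
print(good)
print(bad)
```

Scores close to 1 indicate a point that sits well inside its cluster, and negative scores indicate a point that would fit better elsewhere.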

It’s your turn

  1. Either discover the relevant functions or download the next task sheet
  2. Open the task sheet in RStudio
  3. Fill the gaps to obtain silhouette scores for your k-means clusters.

Programming Solution: Optimal K

Now, obtain the optimal number of clusters k.
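
One common way to do this, assumed here and not necessarily the one on the task sheet, is to cluster for several values of k and keep the k with the highest mean silhouette score. The Python sketch below does that on made-up data. To keep runs reproducible it uses a deterministic farthest-point initialisation, which is a simplification and differs from the random starts that k-means implementations usually use. All function names are hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans_labels(points, k, iterations=100):
    """Lloyd's algorithm; returns the cluster label of each point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette score; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) > 1:
            a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items()
                if key != l
            )
            total += (b - a) / max(a, b)
    return total / len(points)


# Made-up toy data: three tight, well-separated groups.
points = [
    (0.0, 0.0), (0.3, 0.1), (0.1, 0.4),
    (10.0, 0.0), (10.3, 0.2), (10.1, 0.4),
    (5.0, 9.0), (5.2, 9.3), (4.9, 9.2),
]
scores = {k: mean_silhouette(points, kmeans_labels(points, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On real data the scores for neighbouring values of k can be close, so it is worth looking at the whole table of scores and not only at the maximum.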

It’s your turn

  1. Either discover the relevant functions or download the next task sheet
  2. Open the task sheet in RStudio
  3. Fill the gaps to obtain the optimal number of clusters.