In this course, you will learn to create your own ML pipeline.
But, I can’t code…!
KNIME (/naɪm/), the Konstanz Information Miner,
is a data analytics, reporting and integration platform.
Download KNIME from https://www.knime.com/downloads
Choose either the Latest, or the LTS version.
You don’t need to register.
Now, please (install and) start KNIME.
Before we can move on, we need to install some extensions.
Dragging the button onto the KNIME window starts the installation.
Please install these extensions:
The KNIME Hub contains many useful resources:
Or refer to these articles for advanced examples:
First, we need some data.
We’ll be working with the mouse.csv data from
https://elki-project.github.io/datasets/
It can also be obtained from here
Why mouse, you ask?
Anyways…
Please .
Remove the noise.
Open mouse.csv to find the noise.
How would you remove the noise?
Attention, this task can be solved in two ways!
Choose wisely!
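If you would rather script the noise removal, here is a minimal Python sketch. It assumes that mouse.csv holds comment lines starting with '#', followed by rows of the form "x y label" separated by spaces, and that noise rows carry the label "Noise". Please check these assumptions against your copy of the file. The SAMPLE text and the load_points helper are made up for illustration.

```python
import io

# Hypothetical excerpt in the ASSUMED layout of mouse.csv:
# '#' comment lines, then "x y label" separated by spaces.
SAMPLE = """\
# comment line
0.45 0.43 Head
0.25 0.75 Ear_left
0.75 0.75 Ear_right
0.10 0.95 Noise
"""

def load_points(handle, noise_label="Noise"):
    """Return (x, y) tuples, skipping comments, blank lines and noise rows."""
    points = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        x, y, label = line.split()
        if label == noise_label:
            continue  # drop rows marked as noise (assumed label name)
        points.append((float(x), float(y)))
    return points

points = load_points(io.StringIO(SAMPLE))
print(points)
```

To try it on the real file, pass an opened file instead of the sample, e.g. with open("mouse.csv") as handle: points = load_points(handle).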
Yay, KNIME!
First, create a basic KNIME workflow.
Create a workflow in KNIME that applies k-means clustering
to the mouse dataset.
The next step is to obtain silhouette scores.
Extend your KNIME workflow to obtain silhouette scores for the clusters.
Now, we want to obtain the optimal number of clusters.
Extend your KNIME workflow to obtain the optimal number of clusters.
Yay, Programming!
First, check the requirements!
Do you have a python3 installation?
If you face issues installing any package on Windows,
you need to install Rtools:
https://cran.rstudio.com/bin/windows/Rtools/
Again, on Windows, please check whether the following download works:
download.file("https://cran.rstudio.com/src/contrib/PACKAGES", "text.txt")
It may fail with a message like this:
In download.file( […] )
‘SSL connect error’
If you see that message, enter:
options(download.file.method = "wininet")
To begin, create a basic clustering script.
Use any programming language you like.
The course offers a solution in R.
Write a script that applies kmeans clustering.
If you are unsure what a function (e.g. kmeans) does, use ? to access its
documentation (e.g. ?kmeans).
The last command in the kmeans task was
caret::featurePlot.
What does it visualize?
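Not working in R? Here is a rough Python sketch of the idea behind k-means, using only the standard library. It implements plain Lloyd's algorithm, which is not necessarily what R's kmeans runs by default, and it is not the course's solution. The toy points, the seed and the function name are made up for illustration.

```python
import math    # math.dist needs Python 3.8 or newer
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

# Hypothetical toy data: three small groups of points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 5.1), (5.1, 4.8),
          (0.0, 6.0), (0.3, 6.2), (0.1, 5.9)]
centroids, labels = kmeans(points, k=3)
print(centroids)
print(labels)
```

The result depends on the random starting centroids, so a single run can end in a poor local optimum. Trying several seeds and comparing the runs is a common remedy.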
Next, extend your script by the calculation of silhouette scores.
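As a reminder of what a silhouette score measures: for each point, a is the mean distance to the other points of its own cluster, b is the smallest mean distance to the points of any other cluster, and the score is (b - a) / max(a, b). Below is a small Python sketch of that definition with made-up toy points and labels. It is an illustration only, not the course's R solution.

```python
import math  # math.dist needs Python 3.8 or newer

def silhouette_scores(points, labels):
    """One silhouette value per point; needs at least two clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the OTHER members of the point's own cluster
        # (the point itself adds 0 to the sum, hence the division by n - 1)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != l
        )
        scores.append((b - a) / max(a, b))
    return scores

# Hypothetical toy data: two tight pairs that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
print(scores)
```

Values close to 1 mean that a point sits well inside its cluster; negative values suggest it would fit better in a neighbouring cluster. Averaging the values per cluster gives one score per cluster.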
Calculate the silhouette score for each kmeans cluster.
Now, obtain the optimal number of clusters k.
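One common way to pick k is to run the clustering for several candidate values and keep the k with the highest mean silhouette score. The Python sketch below illustrates that loop on made-up data. The blob centres, the candidate range 2 to 6 and the number of restarts are arbitrary assumptions, and keeping the best-scoring restart is a simplification, so treat this as a sketch and not as the course's R solution.

```python
import math    # math.dist needs Python 3.8 or newer
import random

def kmeans(points, k, rng, iterations=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Average silhouette value over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # singleton clusters contribute 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in m) / len(m)
                for other, m in clusters.items() if other != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: three noisy blobs around assumed centres.
rng = random.Random(42)
centres = [(0.0, 0.0), (5.0, 0.0), (2.5, 4.0)]
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
          for cx, cy in centres for _ in range(20)]

scores = {}
for k in range(2, 7):
    # A few random restarts per k; keep the best-scoring run (simplification).
    scores[k] = max(mean_silhouette(points, kmeans(points, k, rng))
                    for _ in range(5))
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real data the scores for neighbouring k can be close, so it is worth looking at the whole table of scores and not only at the winner.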