Practising Unsupervised Learning: DBScan

Prof. Dr. Mirco Schoenfeld


DBScan stands for

Density-based spatial clustering of applications with noise

(Ester et al. 1996)


DBScan is a density-based clustering algorithm.

It groups together points that are closely packed, i.e. points with many nearby neighbors, and marks points in low-density regions as noise.


In 2014, the algorithm received the Test of Time Award of the ACM SIGKDD conference.

It is one of the most common clustering algorithms.

(Schubert et al. 2017)

Abstract formulation

The algorithm (abstracted):

  1. Find the points within a fixed radius (eps) of every point, and identify the core points, i.e. those with at least minPts points in this neighborhood (counting the point itself).
  2. Find the connected components of core points on the neighbor graph, ignoring all non-core points.
  3. Assign each non-core point to a cluster if one of that cluster's core points lies in its neighborhood, otherwise label it as noise.
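
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the R function used in the task sheet. The names dbscan_sketch, NOISE and the toy data are hypothetical. The sketch assumes Euclidean distance and the common convention that a core point has at least min_pts points in its neighborhood, counting itself.

```python
import math

NOISE = -1  # label used in this sketch for points that belong to no cluster


def dbscan_sketch(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)

    # Step 1: eps-neighborhood of every point (the point itself is included),
    # then mark points with at least min_pts neighbors as core points.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    # Step 2: connected components of core points on the neighbor graph.
    labels = [NOISE] * n
    cluster_id = 0
    for start in range(n):
        if not is_core[start] or labels[start] != NOISE:
            continue
        labels[start] = cluster_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if is_core[j] and labels[j] == NOISE:
                    labels[j] = cluster_id
                    stack.append(j)
        cluster_id += 1

    # Step 3: a non-core point joins the cluster of the first core point found
    # in its neighborhood; without such a neighbor it stays labeled as noise.
    for i in range(n):
        if not is_core[i]:
            for j in neighbors[i]:
                if is_core[j]:
                    labels[i] = labels[j]
                    break
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
labels = dbscan_sketch(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The sketch compares all pairs of points, so it only suits small data sets. Library implementations typically use a spatial index to find neighbors faster.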




DBScan has a few advantages:

  • it does not require the number of clusters to be specified a priori
  • it can find arbitrarily-shaped clusters
  • it has a notion of noise and is robust to outliers
  • its parameters can be set by domain experts who understand the data well


DBScan also has a few disadvantages:

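  • it is not entirely deterministic: a non-core point within reach of more than one cluster can end up in either, depending on the order in which the data are processed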
  • its quality largely depends on the distance measure
  • it struggles to cluster datasets with large differences in densities
  • its parameters are difficult to choose if the data and its scale are not well understood
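
The first point in this list, that DBScan is not entirely deterministic, can be made concrete with a small plain-Python sketch. It assumes the sequential variant of the algorithm, in which the first cluster to reach a point keeps it. The names dbscan_in_order, UNSEEN, NOISE and the 1-D toy values are hypothetical. The value 2.5 is a non-core point that lies within eps of a core point on each side.

```python
NOISE = -1     # label used in this sketch for points in no cluster
UNSEEN = None  # marker for points not yet claimed by any cluster


def dbscan_in_order(values, eps, min_pts, order):
    """Sequential DBScan sketch on 1-D values, visiting points in `order`."""
    n = len(values)
    neighbors = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_pts points within eps, counting themselves.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [UNSEEN] * n
    cluster_id = 0
    for start in order:
        if labels[start] is not UNSEEN or not is_core[start]:
            continue
        # Grow a new cluster from this unclaimed core point.
        labels[start] = cluster_id
        frontier = [start]
        while frontier:
            i = frontier.pop()
            for j in neighbors[i]:
                if labels[j] is UNSEEN:
                    labels[j] = cluster_id  # the first cluster to reach j keeps it
                    if is_core[j]:
                        frontier.append(j)  # only core points extend the cluster
        cluster_id += 1
    return [NOISE if lab is UNSEEN else lab for lab in labels]


# Two dense groups; index 4 (value 2.5) is a border point between them.
values = [0.0, 0.5, 1.0, 1.5, 2.5, 3.5, 4.0, 4.5, 5.0]
forward = dbscan_in_order(values, eps=1.0, min_pts=4, order=range(9))
backward = dbscan_in_order(values, eps=1.0, min_pts=4, order=range(8, -1, -1))

print(forward[4] == forward[3])    # True: border point joins the left group
print(backward[4] == backward[5])  # True: border point joins the right group
```

Only the border point changes sides. The core points are grouped the same way in both runs, although the cluster ids are numbered in a different order.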

It’s your turn

  1. Download the task sheet
  2. Open the task sheet in RStudio
  3. Fill in the gaps to apply a DBScan clustering
    To find out what a function (e.g. dbscan) does, use ? to open its documentation (e.g. ?dbscan)


