🎓 Intended Learning Outcomes

At the end of this lesson, you should be able to:

Why consider clustering?

In Lessons 5.1 and 5.2 we discussed different ways in which we can measure similarities between individual datapoints $x_i, x_j\in\mathcal{X}$ in terms of their metric distance $d(x_i, x_j)$. However, we want to go further than that and identify distinct groups of elements of $\mathcal{X}$ that naturally belong together in some sense, without being given explicit class labels.
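As a concrete illustration of the ingredients above, here is a minimal sketch (the toy dataset and the choice of Euclidean distance are my assumptions, not from the lesson) of computing the full pairwise distance matrix $D_{ij} = d(x_i, x_j)$ that a clustering algorithm would consume:

```python
import numpy as np

# Hypothetical toy dataset: each row is one datapoint x_i in R^2.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [5.0, 5.0]])

# Pairwise Euclidean distances d(x_i, x_j) via broadcasting:
# diffs[i, j] = x_i - x_j, so D[i, j] = ||x_i - x_j||.
diffs = X[:, None, :] - X[None, :, :]
D = np.sqrt((diffs ** 2).sum(axis=-1))

print(D.round(2))
```

Any metric distance could be substituted here; the point is that many clustering algorithms only ever see this $n \times n$ matrix, not the raw datapoints.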

<aside> 💡 In clustering problems, we are given a dataset without any explicit labels about group membership and want to instead discover natural groups or “clusters” in the dataset based on a given distance measure. Clustering is one of the most fundamental examples of unsupervised machine learning.

</aside>

Sample Applications:

Remember Lesson 3.6? We in fact already began discussing clustering there, looking at an example of grouping different clothing items together with an algorithm called k-medoids:

Recall our discussion on clustering in Lesson 3.6 using k-medoids?
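To make the k-medoids idea concrete, here is a minimal sketch of the algorithm operating directly on a precomputed distance matrix. The function name, the simple alternating update scheme, and the toy one-dimensional dataset are my own illustrative choices, not the exact formulation from Lesson 3.6:

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Minimal k-medoids sketch: alternate between assigning each point
    to its nearest medoid and re-picking each cluster's medoid as the
    member with the smallest total distance to its fellow members.
    D is a precomputed (n, n) pairwise distance matrix."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: nearest medoid for every point.
        labels = np.argmin(D[:, medoids], axis=1)
        # Update step: best medoid within each cluster.
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, labels

# Toy data: two well-separated groups on the real line.
X = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(X[:, None] - X[None, :])
medoids, labels = k_medoids(D, k=2)
```

Unlike k-means, the cluster representatives here are always actual datapoints (medoids), which is why the algorithm only needs the distance matrix and works for any metric, not just Euclidean distance on vectors.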
A formal definition of what we mean by clustering