🎓 Intended learning outcomes

At the end of this lesson, students are expected to:

- understand how flexible, arbitrarily complex distributions can be built from simple components;
- know how to estimate the parameters of the resulting models;
- be aware of some ways these models can be used in machine learning, including in combination with other machine-learning methods.

In this lesson we will describe a way to build flexible, arbitrarily complex distributions from simple components. We will also cover how to estimate the parameters of the resulting models, and some ways that these models can be used in machine learning, including with other machine-learning methods.

Limitations of simple models

So far in this module, we have only discussed simple distributions and toy problems such as coin tosses. In practice, we are interested in more complex phenomena that cannot be modelled with these distributions. As an example of a real dataset that is a bit more complicated than those we have covered thus far, we will consider the Old Faithful dataset, a classic 2D dataset in machine learning. It contains information about the timings and durations of the eruptions of the geyser of the same name in Yellowstone National Park. Here is a scatterplot of the dataset:

[Figure: scatterplot of the Old Faithful dataset]
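The Old Faithful data itself is not bundled with this lesson, so as a stand-in we can sketch a synthetic two-cluster dataset with a similarly bimodal shape (the cluster locations and spreads below are illustrative assumptions, not the real measurements) and scatterplot it:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headlessly
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Two clusters loosely mimicking the bimodal shape of the Old Faithful data:
# columns are (eruption duration in minutes, waiting time in minutes).
# The locations and scales are made-up stand-ins for the real measurements.
short = rng.normal(loc=[2.0, 55.0], scale=[0.3, 6.0], size=(100, 2))
long_ = rng.normal(loc=[4.3, 80.0], scale=[0.4, 6.0], size=(170, 2))
X = np.vstack([short, long_])

plt.scatter(X[:, 0], X[:, 1], s=10)
plt.xlabel("Eruption duration (min)")
plt.ylabel("Waiting time (min)")
plt.savefig("old_faithful_like.png")
```

Plotted this way, the points fall into two clearly separated groups, which is the pattern the rest of this lesson is concerned with.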

None of the distributions we have seen so far fits the pattern in this dataset. If we were to approximate it with a single Gaussian (often the first thing we try for continuous-valued data), the resulting density would either ignore one of the two groups or place too much probability mass in the sparse region between them, neither of which is desirable.

[Figure: a single Gaussian fitted to the Old Faithful dataset]
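We can make the problem concrete with a small sketch. Fitting a single Gaussian by maximum likelihood amounts to taking the sample mean and sample covariance; on bimodal data (again using synthetic stand-in clusters, since the real dataset is not bundled here), the fitted mean lands between the two groups, in a region containing few data points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic bimodal data standing in for the Old Faithful measurements
# (illustrative cluster locations, not the real values)
short = rng.normal(loc=[2.0, 55.0], scale=[0.3, 6.0], size=(100, 2))
long_ = rng.normal(loc=[4.3, 80.0], scale=[0.4, 6.0], size=(170, 2))
X = np.vstack([short, long_])

# Maximum-likelihood fit of a single 2D Gaussian:
# the ML estimates are just the sample mean and sample covariance.
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)

# The fitted mean sits between the two cluster centres,
# so the density peaks where there are hardly any observations.
print(mu)
```

This is exactly the failure mode described above: the single Gaussian puts its mode in the gap between the two groups.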

Clearly, we need a more flexible model than what we have seen thus far in order to describe this dataset well. The point of this lesson is to show how we can build such flexible models from the simple models we already know.
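As a preview of where this leads, here is a hedged sketch of such a flexible model: a mixture of two Gaussians, fitted with scikit-learn's `GaussianMixture` (an assumed dependency; the estimation procedure itself is covered later in the lesson). On stand-in bimodal data, each mixture component settles on one of the two groups:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic two-cluster data mimicking the Old Faithful shape
# (illustrative values, not the real dataset)
short = rng.normal(loc=[2.0, 55.0], scale=[0.3, 6.0], size=(100, 2))
long_ = rng.normal(loc=[4.3, 80.0], scale=[0.4, 6.0], size=(170, 2))
X = np.vstack([short, long_])

# A mixture of two Gaussians can model both groups at once:
# each component gets its own mean and covariance.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Component means along the first axis, one near each cluster centre
print(np.sort(gmm.means_[:, 0]))
```

Unlike the single Gaussian, this model places one bump of probability mass on each group, which is exactly the kind of flexibility we build up in the remainder of the lesson.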