At the end of this lesson, students are expected to:
In this lesson we will describe a way to build flexible, arbitrarily complex distributions from simple components. We will also cover how to estimate the parameters of the resulting models, and some ways these models can be used in machine learning, both on their own and in combination with other machine-learning methods.
So far in this module, we have only discussed simple distributions and toy problems such as coin tosses. In practice, we are interested in more complex phenomena that cannot be modelled with these distributions. As an example of a real dataset that is somewhat more complicated than those we have covered so far, we will consider the Old Faithful dataset, a classic 2D dataset used in machine learning. It records the durations of the eruptions of the geyser of the same name in Yellowstone National Park, together with the waiting times between them. Here’s a scatterplot of what that dataset looks like:
The points fall into two distinct groups, and none of the distributions we've seen so far fit this pattern. If we were to approximate this dataset with a single Gaussian (which is often what we try first for continuous-valued data), the resulting density would either ignore one of the two groups or place too much probability mass in the sparsely populated region between them. Neither outcome is desirable.
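To make the "too much probability mass in the middle" failure concrete, here is a minimal sketch. It does not use the real Old Faithful measurements: the two groups below are synthetic, with made-up centres and spreads chosen only to mimic a two-cluster pattern. All variable names are illustrative. The sketch fits one Gaussian by maximum likelihood (sample mean and covariance) and compares the fraction of points the fitted Gaussian expects near its mean with the fraction actually found there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in with two well-separated groups (made-up values,
# NOT the real Old Faithful measurements).
group_a = rng.normal(loc=[2.0, 55.0], scale=[0.25, 5.0], size=(150, 2))
group_b = rng.normal(loc=[4.5, 80.0], scale=[0.25, 5.0], size=(150, 2))
data = np.vstack([group_a, group_b])

# Maximum-likelihood fit of a single Gaussian: sample mean and covariance.
mean = data.mean(axis=0)
cov = np.cov(data, rowvar=False, bias=True)

# Squared Mahalanobis distance of every point from the fitted mean.
diff = data - mean
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

# For a 2D Gaussian, the probability of landing within Mahalanobis
# radius r of the mean is 1 - exp(-r^2 / 2).
r = 0.5
predicted_fraction = 1.0 - np.exp(-r**2 / 2.0)
observed_fraction = np.mean(d2 <= r**2)

print(f"fitted mean: {mean}")
print(f"fraction predicted near the mean: {predicted_fraction:.3f}")
print(f"fraction observed near the mean:  {observed_fraction:.3f}")
```

The fitted mean lands between the two groups, where there are almost no data points, yet the single Gaussian expects roughly 12% of the data to fall in that neighbourhood. The gap between the predicted and observed fractions is the mismatch described above.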
Clearly, we need a more flexible model than what we have seen thus far in order to describe this dataset well. The point of this lesson is to show how we can build such flexible models from the simple models we already know.