By the end of this lesson, the student is expected to:
In this lesson, we discuss nonparametric methods for density estimation, i.e., methods that do not require you to assume a particular parametric family for the density being estimated. These are examples of a broader class of nonparametric methods, which may be probabilistic or non-probabilistic and are used throughout machine learning, statistics, and probability.
Suppose that you are a professor teaching a new machine-learning course at KTH, where the grading is composed of timed quizzes and coding assignments. Since the course is given for the first time, you are wondering whether you managed to strike a good balance in the first coding assignment. Let’s investigate!
At first, we have no idea what the distribution of these scores looks like. If the lecture notes sufficiently prepared students regardless of their backgrounds and the assessment tells students apart in a meaningful way, then the score distribution should be wide, with a single mode (i.e., peak). However, if (say) only students with a strong programming background could keep up, then they might form a peak at the higher end, distinct from the other students.
We might be tempted to fit a Gaussian to this data, or perhaps a simple GMM, but it is not clear whether such an approach will be able to offer a good description of the shape of the underlying distribution from which the data came.
Instead of assuming a specific parametric form for our model, we turn to nonparametric probabilistic approaches, which are designed to (at least in theory) adapt to any shape the data distribution may have. The nonparametric methods we will cover are typically used for exploratory analysis and visualisation, rather than for solving complex density-estimation problems.
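To make the contrast concrete, here is a minimal sketch comparing a single Gaussian fit against a kernel density estimate. The score data is synthetic and purely illustrative (a broad main group plus a small high-scoring cluster, mimicking the scenario above), and the bandwidth choice uses Silverman's rule of thumb; both are assumptions, not part of the course setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical assignment scores: a broad main group plus a smaller
# high-scoring cluster (e.g. students with a strong coding background).
scores = np.concatenate([
    rng.normal(55, 12, 80),
    rng.normal(88, 4, 20),
]).clip(0, 100)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Parametric baseline: a single Gaussian fit by maximum likelihood.
mu, sigma = scores.mean(), scores.std()

# Nonparametric alternative: a Gaussian kernel density estimate,
# with bandwidth h chosen by Silverman's rule of thumb.
h = 1.06 * scores.std() * len(scores) ** (-1 / 5)

def kde(x, data, h):
    # Average one Gaussian bump centred on each observation.
    return gaussian_pdf(x[:, None], data[None, :], h).mean(axis=1)

grid = np.linspace(0, 100, 201)          # grid spacing of 0.5
parametric = gaussian_pdf(grid, mu, sigma)
nonparametric = kde(grid, scores, h)

# The KDE can reveal a second mode near the top of the scale,
# which the single Gaussian necessarily smooths away.
```

Plotting `parametric` and `nonparametric` against `grid` would show the Gaussian forcing a single symmetric peak, while the KDE is free to follow whatever shape the scores actually take.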