🎓 Intended learning outcomes
At the end of this lesson, the student is expected to:
- Be able to define the fundamental concepts of information theory, including information content, entropy, cross entropy, and relative entropy
- Understand the intuitive explanations of entropy, cross entropy and relative entropy from the perspective of coding theory
- Be familiar with the computations involved in calculating information content, entropy, cross entropy, and relative entropy in practice (a short code sketch follows this list)
- Be able to recognise and explain the connection between the binary cross entropy loss and the cross entropy between two densities
- Be able to define divergence functions and in particular, the Kullback-Leibler divergence (KLD)
- Be aware of the decomposition of the KLD into a negative entropy and a cross entropy term
- Understand the asymmetric nature of the KL divergence and be able to show mathematically where the asymmetry comes from
- Be able to motivate the minimisation of the KL divergence as an alternative formulation of maximum likelihood density estimation
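As a preview of the computations mentioned above, here is a minimal NumPy sketch of the four quantities for two discrete distributions. The distributions `p` and `q` are made-up numbers for illustration only; the formal definitions follow later in the lesson.

```python
import numpy as np

# Two discrete distributions over the same four outcomes (illustrative values).
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

# Information content (self-information) of each outcome under p, in bits.
info_content = -np.log2(p)

# Entropy of p: the expected information content, H(p) = -sum_x p(x) log2 p(x).
entropy_p = -np.sum(p * np.log2(p))

# Cross entropy between p and q: H(p, q) = -sum_x p(x) log2 q(x).
cross_entropy_pq = -np.sum(p * np.log2(q))

# KL divergence: D_KL(p || q) = H(p, q) - H(p), i.e. cross entropy minus entropy.
kl_pq = cross_entropy_pq - entropy_p

print(info_content)      # [1. 2. 3. 3.]
print(entropy_p)         # 1.75 bits
print(cross_entropy_pq)  # 2.0 bits
print(kl_pq)             # 0.25 bits
```

Note how the KL divergence falls out as the difference between the cross entropy and the entropy, which is exactly the decomposition listed in the learning outcomes.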
What is information?
We will begin with an essential question: what is information? More precisely, how do we quantify information?
Picture this: you’re sitting in the KTH library on a Wednesday night, hunting for the last bugs in your solutions to the expectation maximisation exercise, but you just can’t track them down. You decide to take a break and check your e-mails on your phone. This is what you see:
[Screenshot: your inbox, showing several e-mail previews]
Question: Which of these e-mails would you open first? That is, which e-mail is likely to contain the most information, based on the previews? Why?