🎓 Intended learning outcomes

At the end of this lesson, students are expected to:


In this module, we will look at neural networks and why they currently have a special place among ML methods.

Advances in neural networks are largely responsible for the recent explosion of interest in ML. Yet neural networks have been around since the late 1950s. Why are we hearing about them now?

Historically, the perceptron (1958) is considered the first artificial neural network. It was initially implemented as a machine rather than as a computer program. Early euphoria surrounded neural networks, but real-world progress failed to meet expectations, and they came under criticism from famous researchers, including Marvin Minsky, who showed that a simple perceptron cannot learn the XOR function. Many did not realize it at the time, but deeper networks can solve this problem, as the sketch below illustrates.
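To make the XOR point concrete, here is a minimal sketch (ours, not from the lesson) of a two-layer network with hand-picked weights: the hidden layer computes OR and AND of the inputs, and the output unit combines them into XOR, which no single perceptron can represent because XOR is not linearly separable.

```python
import numpy as np

def step(z):
    # Heaviside step activation, as in the original perceptron
    return (z > 0).astype(int)

# Hand-picked weights (hypothetical, for illustration):
# each column of W1 holds one hidden unit's weights.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])   # unit 1 acts as OR, unit 2 as AND
W2 = np.array([1.0, -1.0])    # output combines them: OR minus AND
b2 = -0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
h = step(X @ W1 + b1)         # hidden representation: [OR, AND]
y = step(h @ W2 + b2)         # OR AND (NOT AND) == XOR
print(y)                      # -> [0 1 1 0]
```

The extra layer is what makes this possible: the hidden units re-represent the inputs in a space where the two classes become linearly separable.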

In the 1960s, we lacked the methods to effectively train multilayer networks. That changed in 1986, when the backpropagation algorithm was successfully applied to neural networks, making it possible to train deeper networks (as we cover in 4.3 How Neural Networks Learn).

The Mark I Perceptron machine. Source: Cornell University News Service records, #4-3-15. Division of Rare and Manuscript Collections, Cornell University Library.

Still, computational bottlenecks and obstacles such as the vanishing gradient problem and the degradation problem (which we cover in 4.4 Architectures and Innovations) kept us from training truly deep neural networks until the 2010s, when innovations in parallel computing, an abundance of data, advances in network architectures, and efficient optimization techniques changed things for good.

What is the point of this mini history lesson? As you may have noticed, many historic milestones in neural network development correspond to advances that allowed us to add depth to neural networks. This matters because it reveals a defining characteristic of neural networks:

neural networks are compositions of simple non-linear functions that perform representation learning.

What does “compositions of functions” mean? It means that neural networks are functions of functions of... Adding depth means adding more functions, which leads to more complex representations.
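As a minimal sketch of what "composition" looks like in code (the layer sizes and the ReLU activation are our choices, not from the lesson), each layer is one simple non-linear function, and the network is literally their nested application:

```python
import numpy as np

def layer(W, b):
    # one "simple non-linear function": x -> relu(W x + b)
    return lambda x: np.maximum(0.0, W @ x + b)

# Hypothetical depth-3 network mapping 4 -> 8 -> 8 -> 2
rng = np.random.default_rng(0)
f1 = layer(rng.normal(size=(8, 4)), np.zeros(8))
f2 = layer(rng.normal(size=(8, 8)), np.zeros(8))
f3 = layer(rng.normal(size=(2, 8)), np.zeros(2))

x = rng.normal(size=4)
y = f3(f2(f1(x)))   # a function of a function of a function
print(y.shape)      # -> (2,)
```

Adding depth here just means composing in another layer, and each new layer builds its output on top of the representation produced by the one before it.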

Speaking of which, what does representation learning mean? Let’s dive in...

🥞 Representation Learning ★★★