Curious about Deep Learning? Read on for a mathematics- and jargon-free crash course in Deep Learning and Artificial Neural Networks!
Deep Learning is a type of Machine Learning built upon multi-layered Artificial Neural Networks.
Before we delve into Deep Learning, we must first understand Artificial Neural Networks (ANNs).
A common starting point for understanding ANNs is the Feed Forward architecture. In this section we'll look at a typical Feed Forward ANN and explain the process of training it.
Feed Forward ANNs organise their nodes into three distinct layers, with the output from each layer being fed into the next. Information flows in one direction only, hence the name Feed Forward.
Nodes in the Input Layer exist merely as an entry point for incoming data and perform no additional processing.
Nodes in the Hidden and Output Layers are more complex. Each of these nodes multiplies its inputs by a set of learned weights, sums the results together with a bias, and passes the total through an activation function that determines how strongly the node activates.
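Here is a minimal sketch of that node computation in Python; the sigmoid activation and the example numbers are purely illustrative choices:

```python
import math

def node_output(inputs, weights, bias):
    # Weighted sum of the node's inputs, plus a bias term
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation squashes the sum into the range 0..1:
    # values near 1 mean the node activates strongly
    return 1 / (1 + math.exp(-total))

# A node with two inputs and made-up weights
print(node_output([0.5, 0.8], weights=[0.4, -0.6], bias=0.1))
```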
The pattern of activated nodes in an ANN is essentially the ANN's internal mapping from input to output.
To see some examples of the internal activations of an ANN, you can check out Adam Harley's visualisation of an ANN trained on the MNIST database of handwritten digits. This is a four-layer ANN with the input layer at the bottom, two hidden layers, and finally an output layer at the top. The visualisation is interactive, so feel free to click on the nodes and play around with it.
The next question we need to answer is - how does the ANN learn these internal representations?
Supervised Learning is a Machine Learning paradigm where we "help" our ANN to find the correlation between an input and an output by training it with example input:output pairs.
This is pretty much the same as how we teach children the alphabet. We point at a letter and at the same time say what it is. Over time the children learn to associate the visual input with the desired response/output.
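As a purely illustrative sketch, training data for Supervised Learning might look like this in Python, with each pair mapping an input (here, some made-up pixel values) to the desired output (a label):

```python
# Hypothetical input:output pairs - the pixel values and labels are made up
training_pairs = [
    ([0.0, 0.9, 0.9, 0.1], "a"),
    ([0.8, 0.1, 0.9, 0.7], "b"),
    ([0.9, 0.9, 0.0, 0.2], "c"),
]

for pixels, label in training_pairs:
    # During training we show the network the input and tell it the answer
    print(f"input: {pixels} -> desired output: '{label}'")
```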
The process of training an ANN can be broken down into three steps:

1. Make a prediction
2. Calculate the loss
3. Update the weights

Let's look at each of these steps in more detail:
The first of these steps, making a prediction, is also known as forward-propagation.
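As a minimal sketch of forward-propagation, here is a tiny Feed Forward ANN in Python using NumPy. The layer sizes, the random weights and the tanh activation are all arbitrary choices for illustration:

```python
import numpy as np

def forward_propagate(x, weights, biases):
    # Feed the input through each layer in turn: the output of one
    # layer becomes the input of the next (hence "Feed Forward")
    for W, b in zip(weights, biases):
        x = np.tanh(x @ W + b)
    return x

# A tiny network: 3 input nodes -> 4 hidden nodes -> 2 output nodes
rng = np.random.default_rng(seed=0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
biases = [np.zeros(4), np.zeros(2)]

prediction = forward_propagate(np.array([0.2, 0.5, 0.1]), weights, biases)
print(prediction)  # the untrained network's (meaningless) prediction
```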
Loss, or cost, is a value representing the difference between an ANN's predictions and the desired output. A higher value indicates a larger gap between prediction and desired output.
ANNs use Loss Functions to calculate this value. Loss Functions come in many forms, which in turn support different use cases. They are sometimes referred to as Cost Functions.
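Mean Squared Error is one common form of Loss Function. A minimal sketch:

```python
import numpy as np

def mean_squared_error(predictions, targets):
    # The average of the squared differences between prediction and
    # desired output: the bigger the gap, the bigger the loss
    return np.mean((np.asarray(predictions) - np.asarray(targets)) ** 2)

print(mean_squared_error([0.9, 0.1], [1.0, 0.0]))  # small gap -> low loss
print(mean_squared_error([0.1, 0.9], [1.0, 0.0]))  # large gap -> high loss
```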
The weights in an ANN provide a mechanism for optimising the network and thereby reducing the value of the Loss Function. This mechanism is known as back-propagation, as it moves backwards through the ANN, updating weights as it goes.
We use an Optimiser Function to find out how best to adjust the ANN's weights to reduce the value of the Loss Function. Optimiser Functions come in different forms, but the most common ones are all based on some variation of Gradient Descent.
Picture the loss as a hilly curve with a ball resting on it. In essence, the goal of Gradient Descent is to move the ball down to the lowest point of the curve, the Global Minimum. This represents the weight values that will give the lowest loss.
During training the ball is moved "downhill" via incremental adjustments to the weight values. The size of these adjustments is known as the Learning Rate. A large Learning Rate causes the ball to take bigger steps, and perhaps hop over the Global Minimum, while a small Learning Rate results in an ANN that takes very small steps and is very slow to train.
An extra challenge here is that our Optimiser Function has no overview of the slope it is about to descend. All it knows is that it wants to move downhill. A common problem for Optimiser Functions is therefore that they get stuck in a Local Minimum or on a plateau.
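Here is a minimal sketch of Gradient Descent in one dimension, using a toy loss function whose gradient we can compute directly. Try increasing the learning_rate past 1.0 and the "ball" will overshoot the Global Minimum instead of settling into it:

```python
def gradient_descent(gradient_fn, weight, learning_rate=0.1, steps=50):
    # Repeatedly nudge the weight "downhill", in the direction
    # opposite to the slope of the loss at the current weight value
    for _ in range(steps):
        weight -= learning_rate * gradient_fn(weight)
    return weight

# Toy loss: loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The Global Minimum (lowest loss) is therefore at w = 3.
print(gradient_descent(lambda w: 2 * (w - 3), weight=0.0))  # approaches 3.0
```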
By working backwards through the ANN and adjusting its weights, back-propagation helps the ANN to optimise its behaviour over multiple training rounds.
A common definition of Deep Learning is "ANNs with more than one Hidden Layer". In other words, the more Hidden Layers an ANN has, the "deeper" it can be said to be.
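To make the definition concrete, here is a minimal sketch of a "deep" Feed Forward ANN using the Keras library. Keras is just one popular option, and the layer sizes are arbitrary choices; the input and output shapes happen to match the MNIST digits mentioned earlier:

```python
from tensorflow import keras

# One Input Layer, three Hidden Layers and one Output Layer
model = keras.Sequential([
    keras.Input(shape=(784,)),                     # 28x28 pixel images, flattened
    keras.layers.Dense(128, activation="relu"),    # Hidden Layer 1
    keras.layers.Dense(64, activation="relu"),     # Hidden Layer 2
    keras.layers.Dense(32, activation="relu"),     # Hidden Layer 3
    keras.layers.Dense(10, activation="softmax"),  # Output Layer: one node per digit
])
model.summary()
```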
The role of the Hidden Layers is to identify features in the input data and use these to correlate a given input with the correct output.
Deep Learning allows ANNs to handle complex feature hierarchies by supporting a step-by-step process of pattern recognition, where each Hidden Layer builds upon the "knowledge" of the previous ones.
In a facial recognition example with three Hidden Layers, the first layer might detect simple edges, the second might combine these into facial features such as eyes and noses, and the third might combine those into whole faces. Each layer aggregates and recombines features from the previous one.
A drawback of Deep Learning is that the feature representations in the Hidden Layers are not always human readable like the above example. This means it can be extremely difficult to gain insight into why a Deep Learning ANN delivers a specific result, especially if the ANN is working in more than 3 dimensions.
Many Deep Learning Architectures exist. These are all based on ANNs, but have specific optimisations that make them good for certain use cases. Some examples are Convolutional Neural Networks (CNNs), commonly used for image processing, and Recurrent Neural Networks (RNNs), commonly used for sequential data such as text.
One of the cool aspects of Deep Learning is the possibility of Transfer Learning. Transfer Learning is where you copy a previously trained Deep Learning model and retrain it for another task. This can be a shortcut to getting a new model up and running quickly, or a way of dealing with use cases that have only small amounts of training data.
For Transfer Learning to succeed, the original use case needs to share learned features with the new one. You won't be able to retrain a facial recognition model to perform text classification, for example, but you might be able to use it for other Image Processing tasks.
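As a minimal sketch of Transfer Learning in Keras (assuming an image-based use case; MobileNetV2 and the other choices here are illustrative, and num_classes is a placeholder for your own task):

```python
from tensorflow import keras

# Start from a model pre-trained on ImageNet, minus its original
# classification head (include_top=False)
base = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the previously learned features

# Add a new head on top and train only that for the new task
num_classes = 5  # placeholder: number of categories in your new use case
model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```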
Deep Learning has many potential applications, including Computer Vision, Natural Language Processing, Pattern Recognition, Process Optimisation, Feature Extraction and much more.
Here are some questions that provide a quick sanity check before choosing to apply Deep Learning to your next project.
Do you understand your problem domain? This knowledge will be necessary to help you configure your Deep Learning ANN, prepare your training data and interpret the output from your ANN, as well as to deal with unexpected results.
Have you considered simpler alternatives? Deep Learning is merely one tool in the Machine Learning toolbox. It is also one of the most complex and time-consuming. Instead of going straight for Deep Learning, spend some time looking into simpler alternatives.
Do you have enough varied data? Deep Learning projects generally perform better when they have access to large amounts of varied data. This gives the Hidden Layers enough information to perform automatic feature extraction. A lack of variation in your training data can give rise to biased models.
Do you need to explain your model's decisions? One of the weaknesses of Deep Learning is its lack of Interpretability. Deep Learning ANNs build their own feature hierarchies from the training data, based upon mathematical operations, and these feature hierarchies may not be easily interpretable by humans. This can be a problem if you have to explain to a customer why your ANN rejected their application for a credit extension.
Do you have the necessary resources? Deep Learning can be resource intensive. Not only do you need processing power for training the model, you also need data storage for all your training data. In the era of cloud computing these problems are easily solvable, but they can potentially lead to big costs.
Does your use case require automatic feature extraction? If the answer is "yes", then Deep Learning might be able to help you! Deep Learning ANNs perform automatic feature extraction without human intervention, which makes them very suitable for cases where manual feature identification is not possible.
How will you deal with problems such as Data Drift? Will you need to retrain your ANN often? How will you monitor its performance? Do you have a data pipeline ready? Many people forget to address these questions until it is too late.
When is your model good enough? 98% accuracy? 65%? What is your business's acceptable margin of error?
A Neural Network Playground is a browser-based application that allows you to configure and run an Artificial Neural Network (with 0 to 6 Hidden Layers) to solve simple classification problems.