In this article I show how to implement a simple Convolutional Neural Network for image classification using TensorFlow and Keras. A link to the source code is included!
In my previous article about the theories behind Convolutional Neural Networks I identified the following key attributes of Convolutional Neural Networks:
So how do we actually build a Convolutional Neural Network? In this article I will build a CNN using the following tools:
If you are just getting started with Deep Learning and Artificial Neural Networks (ANNs) then I strongly recommend that you read my Introduction to Deep Learning as this contains useful background information about Deep Learning and Artificial Neural Networks.
Before we delve into the architecture of our Convolutional Neural Network (CNN), let's first look at the dataset that we'll be working with!
The Fashion MNIST Dataset was created by Zalando and consists of a training set of 60,000 images and a test set of 10,000 images. Each image is a centered 28x28 pixel image of an item of clothing and is assigned to one of 10 classes.
All images in Fashion MNIST are greyscale. Each pixel is represented by a single value (between 0 and 255) reflecting the darkness of that pixel.
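To make the pixel representation concrete, here is a minimal sketch of how such images could be prepared for a layer that expects input of shape (28, 28, 1). Note the assumptions: the array below is random stand-in data rather than the real dataset, and scaling to the 0 to 1 range is a common preprocessing step that I am assuming here, not something the dataset itself requires.

```python
import numpy as np

# Stand-in for a batch of Fashion MNIST images: 4 greyscale images of 28x28
# pixels with integer values between 0 and 255 (random data, not the real set).
rng = np.random.default_rng(seed=0)
images = rng.integers(low=0, high=256, size=(4, 28, 28), dtype=np.uint8)

# Scale pixel values to the range 0.0 to 1.0 (an assumed preprocessing step).
scaled = images.astype("float32") / 255.0

# Add a trailing channel axis so that each image has the shape (28, 28, 1).
batch = scaled[..., np.newaxis]

print(batch.shape)  # (4, 28, 28, 1)
```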
Fashion MNIST is a well understood and simple dataset that provides good results with many ANN Architectures. It is used here so that we can focus on the CNN Architecture rather than getting bogged down in preprocessing our chosen dataset. Another reason for choosing this dataset is that training ANNs on Fashion MNIST takes a relatively short time, allowing for quick and easy experimentation.
The code for this CNN implementation (along with some examples of feature map visualisation) is available from GitHub, but I strongly recommend that you open and execute this code in Google Colaboratory (and use the GPU Runtime) in order to quickly train and test the CNN directly from your browser!
Convolutional Neural Networks come in many different variants, but my architecture for solving Fashion MNIST contains all of the key elements that can be found in most CNNs.
This architecture is a traditional Feed Forward Network trained via back-propagation. In the context of this image, Feed Forward means that incoming data flows downwards through the layers. During training, weights in each layer (including Convolutional Filters) are updated from the bottom up, using back-propagation.
Broadly speaking this CNN architecture performs two tasks:
The bridge between these two tasks is therefore the Flatten Layer.
Confused? All will become clear as we take a closer look at each of the 6 layers.
Convolutional Layers extract features by sliding convolutional filters over the input image and generating feature maps. This process is described further in Convolutional Neural Networks : The Theory.
The code for generating a Convolutional Layer looks like this:
    import tensorflow as tf

    # Create an empty Neural Network
    model = tf.keras.models.Sequential()

    # Add a Convolutional Layer to the Neural Network
    model.add(
        tf.keras.layers.Conv2D(
            filters=32,
            kernel_size=(3, 3),
            strides=(1, 1),
            padding='valid',
            activation='relu',
            input_shape=(28, 28, 1)
        )
    )
Key takeaways from the code:
Note that CNNs can contain multiple Convolutional Layers, so that subsequent Convolutional Layers extract new features from the feature maps produced by previous Convolutional Layers. This allows the network to identify a feature hierarchy in the input data. In the case of Fashion MNIST this was not required due to the simplicity of the dataset.
The above image shows an example of input to a trained Conv2D Layer and the resulting output. We see all 32 feature maps, each of which focuses on slightly different features of the original image.
It should also be noted that feature maps are not always intuitive for humans, especially if multiple levels of Convolutional Layers have been traversed.
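To illustrate what "sliding a filter over the input image" means, here is a minimal NumPy sketch of a single 3x3 filter applied with a stride of 1 and no padding, followed by ReLU. It is a simplification under stated assumptions: the helper name `conv2d_valid` is my own, the filter values are random rather than learned, and the bias term that Keras adds to each filter is left out.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over one 2D image (stride 1, no padding), then apply ReLU."""
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1
    feature_map = np.zeros((out_h, out_w), dtype=np.float32)
    for row in range(out_h):
        for col in range(out_w):
            # Multiply the patch under the kernel element-wise and sum the result.
            patch = image[row:row + kernel_h, col:col + kernel_w]
            feature_map[row, col] = np.sum(patch * kernel)
    # ReLU: negative responses are set to zero.
    return np.maximum(feature_map, 0.0)

rng = np.random.default_rng(seed=0)
image = rng.random((28, 28), dtype=np.float32)            # stand-in 28x28 image
kernel = rng.standard_normal((3, 3)).astype(np.float32)   # random, untrained filter

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)
```

With 'valid' padding the filter only visits positions where it fits entirely inside the image, which is why a 28x28 input shrinks to a 26x26 feature map.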
The MaxPooling Layer downsamples the feature maps, as described in Convolutional Neural Networks : The Theory. The goal here is to reduce the size of each feature map. This in turn reduces processing requirements while also scaling down the total number of parameters that the model needs to learn.
The code for generating a MaxPooling Layer looks like this:
    # Add a MaxPooling Layer
    model.add(
        tf.keras.layers.MaxPooling2D(
            pool_size=(2, 2),
            strides=(2, 2)
        )
    )
Key takeaways from the code:
The above image shows how the MaxPooling Layer reduces each of our feature maps from 26x26 (676) pixels to 13x13 (169) pixels. That is a reduction of 75%!
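The downsampling arithmetic can be checked with a small NumPy sketch of 2x2 max pooling with a stride of 2. The helper `max_pool_2x2` is a hypothetical name of my own, and it assumes the feature map has an even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    height, width = feature_map.shape
    # Split the map into (height/2) x (width/2) blocks of 2x2 values.
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))

# Stand-in feature map holding the values 0..675, so the result is easy to inspect.
feature_map = np.arange(26 * 26, dtype=np.float32).reshape(26, 26)
pooled = max_pool_2x2(feature_map)

print(pooled.shape)                         # (13, 13)
print(1 - pooled.size / feature_map.size)   # 0.75
```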
The Dropout Layer fights overfitting and forces the model to learn multiple representations of the same data by randomly disabling a given fraction of neurons during the learning phase. The code for this is pretty simple:
    model.add(
        tf.keras.layers.Dropout(
            rate=0.25
        )
    )
We can see that the Dropout Layer mimics the output shape of the previous layer (this is automatically set by Keras). The only thing that happens here is that, during training, this layer randomly disables 25% of its nodes at a given time.
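Here is a minimal sketch of what dropout does to a set of activations during training. The function `dropout` below is my own illustrative helper, not the Keras implementation. It follows the "inverted dropout" scheme that Keras uses, where the values that survive are scaled up by 1 / (1 - rate) so that the expected sum stays the same. At prediction time dropout is inactive and values pass through unchanged.

```python
import numpy as np

def dropout(values, rate, rng):
    """Zero out roughly `rate` of the values and rescale the survivors."""
    keep_mask = rng.random(values.shape) >= rate
    return values * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
activations = np.ones((13, 13, 32), dtype=np.float32)  # stand-in pooled feature maps

dropped = dropout(activations, rate=0.25, rng=rng)

print(dropped.shape)          # (13, 13, 32), the shape is unchanged
print(np.mean(dropped == 0))  # roughly 0.25
```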
At this point we still have 32 feature maps consisting of 13x13 pixels. This is represented as a 3D array with the shape of 13x13x32. These features must be converted to a 1D list of pixel values before being sent to the Dense Layers for classification. Once again the code here is simple:
    model.add(
        tf.keras.layers.Flatten()
    )
The result here is simply to rearrange our 5408 nodes into a 1D list, instead of having them in a 13x13x32 array. No further processing is done in this layer.
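The rearrangement is easy to verify with NumPy. This is only a sketch using an empty stand-in array in place of real feature maps:

```python
import numpy as np

# Stand-in for the 32 pooled feature maps of 13x13 pixels each.
feature_maps = np.zeros((13, 13, 32), dtype=np.float32)

# Flattening only changes the shape; no values are added, removed or altered.
flat = feature_maps.reshape(-1)

print(flat.shape)  # (5408,)
```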
The role of the first Dense Layer is to find the correlation between our feature maps (now a 1D array of filtered values) and the desired prediction we want the CNN to make.
    model.add(
        tf.keras.layers.Dense(
            units=128,
            activation='relu'
        )
    )
In the above code we create a Dense Layer of 128 nodes.
As this is a Dense Layer, each of its 128 nodes receives weighted input from all 5408 nodes in the previous Flatten Layer. These weighted inputs are summed and passed to the same type of Activation Function (ReLU) that we used in the Conv2D Layer.
During training the weights in both Dense Layers are updated using the same mechanism (back-propagation) that updates filters in the Conv2D Layer.
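As a rough sketch, the forward pass of this Dense Layer boils down to one matrix multiplication, a bias addition and ReLU. The helper name `dense_relu` and the random weights are assumptions for illustration; in the real model the weights are learned during training.

```python
import numpy as np

def dense_relu(inputs, weights, biases):
    """Weighted sum of all inputs for every node, followed by ReLU."""
    return np.maximum(inputs @ weights + biases, 0.0)

rng = np.random.default_rng(seed=0)
inputs = rng.random(5408, dtype=np.float32)  # stand-in for the flattened features
weights = rng.standard_normal((5408, 128)).astype(np.float32) * 0.01
biases = np.zeros(128, dtype=np.float32)

outputs = dense_relu(inputs, weights, biases)

print(outputs.shape)                # (128,)
print(weights.size + biases.size)   # 692352 trainable parameters
```

Almost all of the model's trainable parameters sit in this one layer, because every one of the 5408 inputs is connected to every one of the 128 nodes.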
The second Dense Layer represents the final layer in our CNN. Its role is to map the 128 nodes in the previous layer to just 10 nodes that each represent a unique class in the Fashion MNIST dataset.
    model.add(
        tf.keras.layers.Dense(
            units=10,
            activation='softmax'
        )
    )
As this layer is Dense, each of our 10 nodes sums up weighted inputs from all 128 nodes in the previous layer. A Softmax Activation Function then converts the summed values for the 10 nodes into probabilities that sum to 1 across all 10 nodes. The node with the highest probability wins!
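A minimal NumPy sketch of Softmax shows how 10 summed values become 10 probabilities. The input values below are made up for illustration and do not come from a trained model.

```python
import numpy as np

def softmax(summed_values):
    """Convert raw node values into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shifted = summed_values - np.max(summed_values)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Made-up summed values for the 10 output nodes (one per clothing class).
summed_values = np.array([2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 3.0, 0.2, -0.5, 1.5])

probabilities = softmax(summed_values)
predicted_class = int(np.argmax(probabilities))

print(predicted_class)  # 6
```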
In addition to specifying the layers of our Convolutional Neural Network, we also need to specify which loss and optimisation functions we want to use in the training of our model.
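    model.compile(
        loss=tf.keras.losses.sparse_categorical_crossentropy,
        optimizer=tf.keras.optimizers.Adam(),
        metrics=['accuracy']
    )
Three things are being specified in this code: the loss function (sparse categorical cross-entropy), the optimiser (Adam) and the metric that is reported during training and evaluation (accuracy).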
For more background about back-propagation, Loss and Optimiser Functions refer to my Introduction to Deep Learning.
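To give a feel for what the loss function measures, here is a simplified NumPy sketch of sparse categorical cross-entropy. "Sparse" means that the labels are plain class indices rather than one-hot vectors. The function below is my own illustration, not the Keras implementation (which, among other things, guards against taking the log of zero), and it uses 3 classes instead of 10 to keep the example short.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probabilities):
    """Negative log of the probability assigned to each sample's true class."""
    sample_indices = np.arange(len(labels))
    true_class_probabilities = probabilities[sample_indices, labels]
    return -np.log(true_class_probabilities)

# Made-up Softmax outputs for 2 samples and 3 classes.
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
])
labels = np.array([0, 2])  # the true class index of each sample

losses = sparse_categorical_crossentropy(labels, probabilities)
print(losses)         # one loss value per sample
print(losses.mean())  # the average loss over the batch
```

The more probability the model assigns to the correct class, the closer the loss gets to zero. The optimiser's job is to adjust the weights so that this average loss decreases.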
The code for this CNN implementation (along with some feature map visualisations) is available in Google Colaboratory for you to play around with in your browser! The code is also available in static form via GitHub.
In this article we have implemented a Convolutional Neural Network, using TensorFlow and Keras. This has given us additional insight into how CNNs process image data, and some of the possibilities that CNNs can offer.
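As a recap, the shapes and trainable parameter counts of the layers described in this article can be worked out with plain arithmetic. The sketch below assumes the layer settings shown earlier (32 filters of 3x3, 'valid' padding, 2x2 pooling, Dense Layers of 128 and 10 nodes). The helper `conv_output_size` is a name of my own.

```python
def conv_output_size(size, kernel, stride=1):
    """Output width or height of a convolution without padding."""
    return (size - kernel) // stride + 1

# Conv2D: 28x28x1 input, 32 filters of 3x3, stride 1, 'valid' padding.
conv_side = conv_output_size(28, 3)        # 26
conv_params = (3 * 3 * 1 + 1) * 32         # 9 weights + 1 bias per filter

# MaxPooling2D: 2x2 pooling with stride 2 halves the width and height.
pooled_side = conv_side // 2               # 13

# Dropout and Flatten have no trainable parameters.
flat_units = pooled_side * pooled_side * 32  # 5408

# Dense layers: one weight per input plus one bias, for every node.
dense_1_params = (flat_units + 1) * 128
dense_2_params = (128 + 1) * 10

total_params = conv_params + dense_1_params + dense_2_params

print(conv_side, pooled_side, flat_units)           # 26 13 5408
print(conv_params, dense_1_params, dense_2_params)  # 320 692352 1290
print(total_params)                                 # 693962
```

Note how few parameters the Convolutional Layer needs: its 32 filters are reused at every position in the image, whereas the first Dense Layer needs a separate weight for every connection.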
When compared to more traditional densely connected ANNs, CNNs are often much more efficient:
The use of filters also opens up a couple of possibilities:
In this series we have focused on using Convolutional Neural Networks for image processing, but it is important to state that the spatial awareness shown by CNNs also helps them perform well with Natural Language Processing and Time Series problems.
I do hope that you enjoyed this short series of articles about Convolutional Neural Networks! If you liked them then feel free to check out my article about Recurrent Neural Networks!
Mark West leads the Data Science team at Bouvet Oslo.