How to train neural networks for image classification — Part 1

Image classification is a hot topic in data science, with the past few years seeing huge improvements across many areas. It has applications everywhere, but how is it actually done?


A deep neural network is a network of artificial neurons organised into layers (in software). Each layer is connected to the next, and each connection has a weight that helps determine how much an artificial neuron fires. This firing helps determine how strong the connections between layers are, and in general, neurons that fire together have stronger connections, just like biological neurons.

How strong these connections are is determined by how the network is trained on the data you feed into it. These networks are trained via a process called backpropagation: data is fed into the network, and the network's error (how far its output is from the correct answer) is measured using a loss function.

Backpropagation computes the rate of change (the gradient) of the loss function with respect to the weight of each connection, and a gradient descent step then adjusts each weight in the direction that reduces the error. Repeated over many steps, the network should eventually converge on a solution where the overall error is minimised.

The size of each of these steps is called the learning rate, and this is one of the hyperparameters that can be tuned when training neural networks. If the learning rate is too small, the network can take too long to converge on a solution; conversely, if the learning rate is too large, the network will 'bounce around' and never really converge on an optimum solution.
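
To make that behaviour concrete, here is a minimal sketch of the gradient descent update rule on a single weight, using a made-up one-variable loss rather than a real network:

import numpy as np

# Toy loss L(w) = (w - 3)**2, whose gradient is dL/dw = 2 * (w - 3).
w = 0.0
learning_rate = 0.1  # try 1.1 to see the weight 'bounce around' and diverge
for step in range(50):
    gradient = 2 * (w - 3)
    w = w - learning_rate * gradient  # step downhill against the gradient
print(w)  # approaches 3.0, the value that minimises the loss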

There are different types of layers in neural networks, and each one transforms data differently. The most basic type is a dense (fully connected) layer, in which every neuron is connected to every neuron in the previous layer. Other types include convolutional layers, which are primarily used for image processing tasks, and recurrent layers, which are used to process sequential data such as time series. There are others, but these are the most common types.

In this article, I will focus on how to implement a simple image classifier using a series of dense layers in Python, using Keras as part of TensorFlow. As mentioned above, convolutional neural networks usually work better for image classification tasks, and I will cover these in part 2 of this series. As my primary area of interest is Search Engine Optimisation, I will tie all of this together in part 3 by looking at how neural networks are used in search.

Neural networks are fascinating, and if you have an interest in this topic, I would encourage you to check out this excellent playlist on the YouTube channel Deep Lizard. Their content on neural networks and deep learning is accessible and well produced.

If you are more interested in the implementation using Python with Keras, I would encourage you to look at Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron. It is an excellent book, written by a former Googler and reviewed by the author of Keras. I highly recommend it.

Getting started with Keras and TensorFlow

Keras is a high-level deep learning API in Python that allows you to easily create and train deep learning models. It was launched in 2015 and has gained traction within the data science community thanks to its ease of use, as its syntax is designed to be very similar to Scikit-Learn's.

TensorFlow is a library created by the Google Brain team for machine learning tasks. It has long competed for market share with PyTorch (created by Facebook), but was held back for a time because its documentation and API were less accessible. To remedy this, Google released TensorFlow 2, which contained many improvements, particularly around cross-compatibility of models, GPU support and graphing utilities.

With the release of TensorFlow 2, Keras is now merged into TensorFlow, and the standalone version is no longer maintained. Installing Keras and TensorFlow is straightforward. Just go to your Python environment (I recommend using a virtual environment/package manager like pipenv) and use whichever Python package manager you like:

python -m pip install tensorflow

Be sure to use a 64-bit version of Python for this to work. Getting GPU support on Windows is a bit of a faff, but this YouTube video will show you how to get it done. However, you probably won't need it for this tutorial. You could also look at Google Colab, which has GPUs available with no setup.

You can check the installation in a Jupyter notebook with the following:

import tensorflow as tf
from tensorflow import keras
print(tf.__version__)
print(keras.__version__)

If this has all worked, then at the time of writing you should see something like:

2.3.0
2.4.0

When you’ve got Keras and TensorFlow working, you should be good to go on building an image classifier with a neural network.

Importing the data set

For simple image classification tasks, it is popular to use the MNIST dataset, which consists of 70,000 images of handwritten digits. However, for this task, we are going to use the Fashion-MNIST dataset, which consists of 70,000 28 x 28 grayscale images of Zalando fashion articles, classified across 10 different classes. The reason for this is that image classifiers tend to find it more challenging than handwritten digits.

Keras has utility functions to help import this dataset, so it is fairly straightforward to use (similar to Sklearn). Working in a Jupyter notebook, begin by making sure we have all the imports we need:

import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We will be working with NumPy arrays and plotting this with Matplotlib, so you will need to ensure these are accessible within your environment. Once this is done, you can import the dataset:

fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

The dataset is automatically split into a training set and a test set (60,000 images for training, 10,000 for testing). The X arrays contain the images and the y arrays contain the labels. It is also a good idea to carve out a validation set so we can check the model isn't overfitting:

X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

The labels are just integers from 0 to 9, one per class, so we need to define the human-readable class names manually:

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

To get an idea of what the dataset actually looks like, we can use a simple loop and Matplotlib:

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train_full[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train_full[i]])
plt.show()

Running this, you will see something like the following:

[Figure: a 5 x 5 grid of sample images from the training set, each labelled with its class name]

As you can see, while there are only 10 classes (as with the MNIST dataset), the images within each class vary considerably, which is why it is a more challenging dataset to work with.

Normalizing the dataset

The first step when working with neural networks is to normalize the dataset; otherwise, it can take a lot longer for the network to converge on a solution.

A common way of normalizing a dataset is standardization, which is done by subtracting the mean of each feature and dividing by the standard deviation, giving every feature zero mean and unit variance. Another is min-max scaling, which puts all the features on the same scale, between 0 and 1.
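
As a quick sketch of standardization, assuming a made-up feature matrix X rather than our image data:

import numpy as np

# Standardization sketch: zero mean and unit variance per feature,
# assuming X is a hypothetical (samples, features) array.
X = np.random.rand(100, 5) * 50  # made-up feature matrix
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0).round(3), X_std.std(axis=0).round(3))  # ~0s and ~1s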

As we are working with 28 x 28 NumPy arrays representing each image, and each pixel in the array has an intensity somewhere between 0 and 255, a simpler way of getting all of these images onto a 0-1 scale is to divide each array by 255:

X_valid, X_train = X_valid / 255., X_train / 255.
X_test = X_test / 255.
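
A quick sanity check that the scaling worked (the labels don't need scaling, as they are just class indices):

# Pixel values should now lie in the range [0, 1].
print(X_train.min(), X_train.max())  # 0.0 1.0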

From there, we are ready to build a dense-layer neural network and train it on our dataset.

Building the neural network image classifier

In order to build the model, we have to specify its structure using Keras’ syntax. As mentioned above, it is very similar to Scikit-Learn and so it should be recognisable if you are familiar with that package. The code for building the model is as follows:

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])

To explain this code:

  • The first line creates a Sequential model, the simplest kind of Keras model: a single stack of layers, each connected to the next
  • The first layer in the model is a Flatten layer, which is there to pre-process the data and isn't trainable itself. It takes the 28 x 28 NumPy array for each image and flattens it into a 784-element 1D array that the dense layers can work with
  • Next, we add a Dense hidden layer with 300 neurons. It will use the ReLU activation function. Each Dense layer manages its own weight matrix, containing all the connection weights between the neurons and their inputs
  • Next, we add another 3 Dense layers with 100 neurons each. There are diminishing returns to adding new layers and this is something we need to test as we build and optimise the network
  • Finally, we add a Dense layer with 10 neurons as there are 10 classes to predict and as they are all exclusive, we use the softmax activation function
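
As an aside, Keras also lets you build exactly the same model incrementally with add(), which some people find easier to read:

# The same architecture built layer by layer; equivalent to the
# list-based constructor above.
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))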

To get a full understanding of the model’s structure we can use:

model.summary()

And this will give us an output of the full structure of the network:

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_2 (Flatten)          (None, 784)               0
_________________________________________________________________
dense_10 (Dense)             (None, 300)               235500
_________________________________________________________________
dense_11 (Dense)             (None, 100)               30100
_________________________________________________________________
dense_12 (Dense)             (None, 100)               10100
_________________________________________________________________
dense_13 (Dense)             (None, 100)               10100
_________________________________________________________________
dense_14 (Dense)             (None, 10)                1010
=================================================================
Total params: 286,810
Trainable params: 286,810
Non-trainable params: 0
_________________________________________________________________

As can be seen, this network has a total of 286,810 trainable parameters (consisting of the connection weights between neurons plus the bias terms). This gives the network a lot of flexibility, but it also means it will be very easy for it to overfit, so we need to be careful.
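
To see where that number comes from: each Dense layer has one weight per input-neuron connection, plus one bias per neuron.

# Parameter count per Dense layer: (inputs x neurons) weights + one bias per neuron.
layer_shapes = [(784, 300), (300, 100), (100, 100), (100, 100), (100, 10)]
params = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
print(params)       # [235500, 30100, 10100, 10100, 1010]
print(sum(params))  # 286810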

Before we can train the network we need to compile it, and this is done with the following code:

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

In this call, we are specifying three things:

i) The loss function to use. In this case we are using sparse categorical cross entropy, because our labels are exclusive integer class indices, which is what the 'sparse' refers to (see the short sketch after this list)

ii) The optimizer we are going to use to optimise the model against the loss function is stochastic gradient descent, which should move the model towards an optimum solution, i.e. Keras will use the backpropagation method described above

iii) Finally, we specify any metrics we want to track in addition to the loss, to give us an idea of how well our model is working. In this case, we are using accuracy: the percentage of predictions that match the actual class
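
On point (i), a short sketch of what 'sparse' means here: our labels are integer class indices, whereas plain categorical cross entropy expects one-hot vectors.

import numpy as np
from tensorflow import keras

y_sparse = np.array([3, 0, 9])                       # works with sparse_categorical_crossentropy
y_onehot = keras.utils.to_categorical(y_sparse, 10)  # what plain categorical_crossentropy expects
print(y_onehot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]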

Training the network

Training the network is easy once it has been compiled. All you need to do is call the model’s fit method (like Sklearn) as follows:

history = model.fit(X_train,
                    y_train,
                    epochs=10,
                    validation_data=(X_valid, y_valid))

We first pass in the data we want to train the network on: X_train contains the images and y_train is an array containing the labels. We also specify the number of epochs to train for (an epoch being one full pass of the training data through the network).

Keras also lets us specify an optional validation_data argument, where we pass in a validation data set. If we do this, then at the end of each epoch Keras will test the performance of the network on the validation data. This is a good way of checking the model isn't overfitting; however, it doesn't feed into the training itself.
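
As an aside, if you would rather not split out a validation set by hand, fit also accepts a validation_split argument. A sketch (note that Keras holds out the last fraction of whatever data you pass in):

# Alternative: let Keras hold out the last 10% of the training
# data as the validation set instead of splitting it manually.
history = model.fit(X_train_full / 255., y_train_full,
                    epochs=10,
                    validation_split=0.1)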

As training proceeds, you will see something like this:

Epoch 1/10 1719/1719 [==============================] - 5s 3ms/step - loss: 0.7698 - accuracy: 0.7385 - val_loss: 0.5738 - val_accuracy: 0.7962
Epoch 2/10 1719/1719 [==============================] - 5s 3ms/step - loss: 0.4830 - accuracy: 0.8283 - val_loss: 0.4570 - val_accuracy: 0.8404
Epoch 3/10 1719/1719 [==============================] - 5s 3ms/step - loss: 0.4261 - accuracy: 0.8480 - val_loss: 0.4121 - val_accuracy: 0.8522
Epoch 4/10 1719/1719 [==============================] - 5s 3ms/step - loss: 0.3932 - accuracy: 0.8582 - val_loss: 0.3951 - val_accuracy: 0.8566
Epoch 5/10 1719/1719 [==============================] - 5s 3ms/step - loss: 0.3708 - accuracy: 0.8660 - val_loss: 0.3597 - val_accuracy: 0.8682
Epoch 6/10 1719/1719 [==============================] - 5s 3ms/step - loss: 0.3518 - accuracy: 0.8728 - val_loss: 0.3397 - val_accuracy: 0.8756
Epoch 7/10 1719/1719 [==============================] - 5s 3ms/step - loss: 0.3369 - accuracy: 0.8779 - val_loss: 0.3506 - val_accuracy: 0.8738
Epoch 8/10 1719/1719 [==============================] - 5s 3ms/step - loss: 0.3243 - accuracy: 0.8814 - val_loss: 0.3343 - val_accuracy: 0.8774
Epoch 9/10 1719/1719 [==============================] - 4s 3ms/step - loss: 0.3128 - accuracy: 0.8861 - val_loss: 0.3415 - val_accuracy: 0.8794
Epoch 10/10 1719/1719 [==============================] - 4s 2ms/step - loss: 0.3019 - accuracy: 0.8888 - val_loss: 0.3265 - val_accuracy: 0.8822

This continues for as long as training runs, with accuracy and loss metrics reported for both the training and validation data sets. The accuracy value is a simple percentage measure of how many items the network got right; the loss value is the cross entropy loss.

Once the model is trained, the History object returned by fit has a history attribute containing a dictionary of the loss and any other metrics at every stage of the training. We can put these in a Pandas DataFrame and plot them as follows:

pd.DataFrame(history.history).plot(figsize = (16, 10))
plt.grid(True)
plt.gca().set_ylim(0, 1)
plt.show()

[Figure: training and validation loss and accuracy plotted against epoch]

As can be seen above, as the loss decreases, the accuracy increases. Two other things stand out from this plot:

  • We could probably train this model for longer, as it doesn't look like the loss has reached a minimum
  • The accuracy on the training data set is higher than on the validation set (which is normal), but not wildly different. This suggests the model isn't overfitting

Evaluating the performance of the neural network

Evaluating the performance of the network is straightforward and follows data science best practice. We call the model's evaluate method on the test data set to see how it performs. Remember that the test data set hasn't been used in training, so the network hasn't seen this data before. We should perform this step only once, so we get an accurate idea of the model's performance.

model.evaluate(X_test, y_test)

This will run the model on the test data set and the output should look something like this:

313/313 [==============================] - 0s 2ms/step - loss: 0.3802 - accuracy: 0.8858

You'll get an output of the loss and whatever other metrics were specified when the model was compiled. Here we can see that the model is correct around 88% of the time, which isn't bad for a simple network on such a difficult data set.
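
From here you can also use the trained model to classify individual images. A minimal sketch: predict returns one softmax probability per class, and argmax picks the most likely one.

probabilities = model.predict(X_test[:3])     # shape (3, 10), one row of class probabilities per image
predicted = np.argmax(probabilities, axis=1)  # index of the highest-probability class
print([class_names[i] for i in predicted])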

Next steps

In the next part of this series, I will talk about how to implement the above using a convolutional neural network and show how and why these perform better for image classification tasks.

You can get the code I've used for this work from my GitHub here. Thanks for reading.
