In this article, I will show you how to classify handwritten digits from the MNIST database using the Python programming language and a machine learning technique called Convolutional Neural Networks!
If you would rather watch than read, you can check out the video below. It goes through everything in this article in a little more detail and will make it easy for you to start programming your own Convolutional Neural Network (CNN) model, even if you don’t have Python installed on your computer. You can also use the video and this article together as complementary materials for learning about CNNs!
First, I will write a description of what this program does. That way, when I or someone else looks back at it later, it is clear exactly what it does.
# Description: This program uses Convolutional Neural Networks (CNN)
# to classify handwritten digits as numbers 0 - 9
Next, I need to install the dependencies / packages. If you don’t already have these packages installed, run the following command in your terminal, command prompt, or Google Colab notebook (depending on where you have Python installed).
pip install tensorflow keras numpy matplotlib
Now that I’m done installing all of the necessary packages, I want to import the packages into my program.
#import the libraries
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten
from keras.datasets import mnist
from keras.utils import to_categorical
import matplotlib.pyplot as plt
import numpy as np
Next, load the data set into the variables
X_train (the variable that contains the images to train on) ,
y_train (the variable that contains the labels of the images in the training set),
X_test (the variable that contains the images to test on), and the
y_test (the variable that contains the labels of the images in the test set).
#Load the data and split it into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
Get the image shape of the feature data sets. Notice that the
X_train shape contains 60,000 rows of 28 x 28 pixel images. The
X_test shape contains 10,000 rows of 28 x 28 pixel images.
#Get the image shape
print(X_train.shape)
print(X_test.shape)
Take a look at the first image in the training data set as a numpy array. This shows the image as a series of pixel values.
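The snippet for this step is not shown here, so the following is only a minimal sketch of what inspecting one image as an array looks like. It assumes a small random stand-in array (the hypothetical name `fake_X_train`) with the same dtype and per-image shape as the MNIST arrays, so it runs without downloading anything. With the real data loaded, the equivalent would be `print(X_train[0])`.

```python
import numpy as np

# Stand-in for X_train: random pixel values with the same dtype and per-image
# shape as MNIST (5 images are enough for illustration; the real set has 60,000).
rng = np.random.default_rng(0)
fake_X_train = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Indexing the first axis selects a single image.
first_image = fake_X_train[0]

print(first_image.shape)                     # (28, 28)
print(first_image.dtype)                     # uint8
print(first_image.min(), first_image.max())  # darkest and brightest pixel values
print(first_image)                           # the raw grid of pixel intensities
```

Each entry is a gray scale intensity between 0 (black) and 255 (white), which is why the printed array looks like a wall of numbers rather than a digit.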
Print the image label of the first image from the training data set. The label that was printed was the number 5.
#Print the image label
y_train[0]
Show the image not as a series of pixel values, but as an actual image.
Reshape the features (X_train and X_test) to fit the model. The training feature set will then have 60,000 rows of 28 x 28 pixel images with depth=1 (gray scale), and the test feature set will have 10,000 rows of 28 x 28 pixel images with depth=1 (gray scale).
#Reshape the data to fit the model
X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)
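To see what the reshape actually does, here is a small NumPy-only sketch. It assumes a stand-in array of six blank images (the hypothetical name `fake_images`) rather than the real MNIST data. It also uses `-1` for the first dimension, which is a variation on the code above: NumPy then infers the number of images instead of needing 60000 or 10000 hard-coded.

```python
import numpy as np

# Stand-in for X_train: 6 blank 28 x 28 images instead of the real 60,000.
fake_images = np.zeros((6, 28, 28), dtype=np.uint8)
fake_images[0, 14, 14] = 200  # mark one pixel so we can check it survives the reshape

# -1 lets NumPy infer the number of images, so the same line works for any set size.
reshaped = fake_images.reshape(-1, 28, 28, 1)

print(fake_images.shape)       # (6, 28, 28)
print(reshaped.shape)          # (6, 28, 28, 1)
print(reshaped[0, 14, 14, 0])  # 200
```

The pixel values are untouched. The reshape only adds a trailing axis of length 1 that tells the convolution layers there is a single color channel.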
One-Hot Encode the targets (y_train and y_test) to fit the model. Essentially, we will be converting each label into a set of 10 numbers (a 1 at the position of the digit and 0 everywhere else) for the neural network to train against. Once done, we will print the new label of the first image in the train set.
#One-Hot Encoding
y_train_one_hot = to_categorical(y_train)
y_test_one_hot = to_categorical(y_test)

#Print the new label of the first image
print(y_train_one_hot[0])
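If one-hot encoding is new to you, this NumPy-only sketch shows the idea behind it. It is not how `to_categorical` is implemented, just one common way to get the same kind of result, and the labels used here are example values chosen for illustration.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])  # example digit labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels encodes all of them at once.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot.shape)  # (5, 10)
print(one_hot[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

# argmax undoes the encoding and recovers the original labels.
decoded = np.argmax(one_hot, axis=1)
print(decoded)        # [5 0 4 1 9]
```

The label 5 becomes a vector with a single 1 at index 5, which is the shape of target the softmax output layer is trained to match.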
It’s time to build the model! The first two layers are convolution layers. The first has 64 filters, a 3 x 3 kernel, and the Rectified Linear Unit (ReLU) activation function, so it feeds 64 feature maps into the second layer. The second has 32 filters, a 3 x 3 kernel, and the ReLU activation function, and feeds 32 feature maps into the third layer. The third layer is a flatten layer, which transforms the feature maps into a 1-dimensional array to connect with the last layer, which contains 10 neurons (one per digit) and uses the softmax activation function.
model = Sequential()
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
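It can help to work out by hand what each layer produces. The sketch below is plain Python and assumes the Keras defaults for `Conv2D` that the model above relies on (stride 1 and no padding). The helper `conv2d_output` is a hypothetical name written for this illustration, not part of Keras.

```python
def conv2d_output(height, width, in_channels, filters, kernel_size):
    """Output shape and parameter count for a stride-1 convolution without padding."""
    out_h = height - kernel_size + 1
    out_w = width - kernel_size + 1
    # One kernel_size x kernel_size x in_channels weight block per filter, plus one bias each.
    params = kernel_size * kernel_size * in_channels * filters + filters
    return (out_h, out_w, filters), params

# First convolution: 28 x 28 x 1 input, 64 filters, 3 x 3 kernel.
shape1, params1 = conv2d_output(28, 28, 1, 64, 3)
# Second convolution takes the first layer's output as its input.
shape2, params2 = conv2d_output(shape1[0], shape1[1], shape1[2], 32, 3)

# Flatten turns the feature maps into one long vector.
flat_size = shape2[0] * shape2[1] * shape2[2]
# Dense layer: one weight per (input, neuron) pair plus one bias per neuron.
dense_params = flat_size * 10 + 10
total_params = params1 + params2 + dense_params

print(shape1, params1)  # (26, 26, 64) 640
print(shape2, params2)  # (24, 24, 32) 18464
print(flat_size)        # 18432
print(dense_params)     # 184330
print(total_params)     # 203434
```

You can compare these numbers with what `model.summary()` prints for the model above. Notice that most of the parameters sit in the final dense layer, not in the convolutions.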
Next we need to compile the model. We will use the adam optimizer, which adapts the learning rate during training; a loss function called categorical_crossentropy, which is used when there are more than 2 classes (like the 10 different labels in the target data set); and the accuracy metric, which reports the accuracy score on the training and validation sets when we train the model.
#Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
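To get a feel for what categorical cross-entropy measures, here is a minimal NumPy sketch of the formula (the negative log of the probability assigned to the correct class). This is an illustration of the math, not the Keras implementation, and the probability vectors are made up for the example.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: -sum(y_true * log(y_pred)) over the classes."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# One-hot target for the digit 5.
y_true = np.zeros((1, 10))
y_true[0, 5] = 1.0

# A confident, correct prediction: 91% on class 5, 1% on each other class.
confident = np.full((1, 10), 0.01)
confident[0, 5] = 0.91

# An unsure prediction: 10% on every class.
unsure = np.full((1, 10), 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(loss_confident)  # about 0.094
print(loss_unsure)     # about 2.303
```

The loss is small when the model puts most of its probability on the right digit and grows as that probability shrinks, which is what the optimizer tries to drive down during training.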
Train the model on the training data set (X_train and the one-hot encoded y_train). I will iterate 3 times over the entire training set, with 32 samples per gradient update. The training history that fit returns is then stored in the variable hist.
Note: I did not specify the number of samples per batch. If the batch size isn’t specified, it defaults to 32.
Batch: The number of training samples used per gradient update
Epoch: One complete pass of the ENTIRE training dataset forward and backward through the neural network.
Fit: Another word for train
hist = model.fit(X_train, y_train_one_hot, validation_data=(X_test, y_test_one_hot), epochs=3)
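As a quick sanity check on how batches and epochs relate, the arithmetic for this training run can be sketched in plain Python. The numbers come from the text above (60,000 training images, the default batch size of 32, and 3 epochs).

```python
import math

num_samples = 60000  # training images
batch_size = 32      # default when batch_size is not passed to fit
epochs = 3

# Each epoch is split into batches; the last batch may be smaller, hence ceil.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 5625
```

So each epoch performs 1,875 gradient updates, and the whole run performs 5,625.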
Looks like the model was 98.59% accurate on the training data and 97.68% accurate on the test data. Let’s visualize the model’s accuracy.
#Visualize the model's accuracy
#Note: older Keras releases name these keys 'acc' and 'val_acc'
plt.plot(hist.history['accuracy'])
plt.plot(hist.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()
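The accuracy values being plotted are simply the fraction of images whose predicted label matches the true label. Here is a minimal NumPy sketch of that calculation, using made-up labels for eight images (not real model output).

```python
import numpy as np

# Made-up labels for 8 images; the last prediction is deliberately wrong.
true_labels = np.array([7, 2, 1, 0, 4, 1, 4, 9])
predicted_labels = np.array([7, 2, 1, 0, 4, 1, 4, 8])

# Comparing element-wise gives True/False; the mean of that is the accuracy.
accuracy = np.mean(predicted_labels == true_labels)

print(accuracy)  # 0.875
```

Seven out of eight correct gives 0.875, which is the same kind of number Keras reports per epoch for the training and validation sets.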
The model returns only probabilities. So let’s show the probabilities of the first 4 images in the test set.
predictions = model.predict(X_test[:4])
predictions
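The reason the model returns probabilities is the softmax activation on the last layer. The sketch below is a NumPy illustration of the softmax formula, not the Keras implementation, and the raw scores are invented for the example.

```python
import numpy as np

def softmax(scores):
    """Turn a row of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Invented raw scores for one image across the 10 digit classes.
raw_scores = np.array([[1.0, 0.5, 0.2, 0.1, 0.0, -0.3, -1.0, 6.0, 0.4, 1.5]])
probabilities = softmax(raw_scores)

print(probabilities.round(4))
print(probabilities.sum())       # 1.0 (up to floating point rounding)
print(np.argmax(probabilities))  # 7, the class with the largest raw score
```

Softmax keeps the ranking of the scores but rescales them so each row sums to 1, which is why each row of predictions can be read as one probability per digit.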
The probabilities are pretty hard to read. To understand them, you must find the highest number in each set and then count its index, because the index is the label. For example, in the output for the 3rd image, the highest probability is 9.98755455e-01, which means 99.8755%, and that number is located at index 1, so the label is 1. So let’s print the predictions as labels for the first 4 images instead of probabilities, and let’s print the actual values / labels of each image to see how they match up.
#Print our predictions as number labels for the first 4 images
print(np.argmax(predictions, axis=1))

#Print the actual labels
print(y_test[:4])
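To make the index-counting step concrete, here is a small NumPy sketch of what `np.argmax` does with rows of probabilities. The array `fake_predictions` is a hypothetical stand-in with made-up values, not real model output.

```python
import numpy as np

# Made-up probabilities for 3 images: 1% on every class, then 91% on one class.
fake_predictions = np.full((3, 10), 0.01)
fake_predictions[0, 7] = 0.91
fake_predictions[1, 2] = 0.91
fake_predictions[2, 1] = 0.91

# axis=1 searches across the 10 classes of each row and returns the winning index.
predicted_labels = np.argmax(fake_predictions, axis=1)
# The value at that index is how confident the prediction is.
confidences = np.max(fake_predictions, axis=1)

print(predicted_labels)  # [7 2 1]
print(confidences)       # [0.91 0.91 0.91]
```

Because index i holds the probability of digit i, the position of the largest value is the predicted digit.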
Let’s show the first four images as pictures!
#Show the first 4 images as pictures
for i in range(0,4):
    image = X_test[i]
    image = np.array(image, dtype='float')
    pixels = image.reshape((28,28))
    plt.imshow(pixels, cmap='gray')
    plt.show()
We are done creating the program! You can watch the video above to see how I coded this program and code along with me with a few more detailed explanations, or you can just click the YouTube link here. If you just want to see the code all together and skip the video, you can find it on my GitHub.
If you are also interested in reading more about machine learning and want to get started right away with problems and examples, then I strongly recommend you check out Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. It is a great book for helping beginners learn how to write machine learning programs and understand machine learning concepts.
Thanks for reading this article. I hope it’s helpful to you all! If you enjoyed this article and found it helpful, please leave some claps to show your appreciation. Keep up the learning, and if you like machine learning, mathematics, computer science, programming, or algorithm analysis, please visit and subscribe to my YouTube channels (randerson112358 & compsci112358).