
Introduction to Machine Learning
Machine learning (ML) is the science of making computers learn from data without explicitly programming the rules. ML is mostly based on models that are trained with large amounts of data, such as images of digits or features of different objects, together with their corresponding labels, such as the value of those digits or the type of the object. This is called supervised learning. There are other types of learning, such as unsupervised learning and reinforcement learning, but we will be focusing on supervised learning. The main difference between supervised and unsupervised learning is that in unsupervised learning there are no labels: the model learns clusters from the data (depending on how many clusters you specify), and those clusters are translated into classes. Reinforcement learning, on the other hand, is concerned with how software agents should take actions in an environment in order to maximize a reward, which will be positive if the agent is performing the right actions and negative otherwise.
In this part of the chapter, we will gain an understanding of machine learning and examine a variety of models and algorithms, going from the most basic models up to artificial neural networks.
Decision Trees and Boosting Algorithms
In this section, we will look at some of the most basic machine learning algorithms: bagging (decision trees and random forests) and boosting (AdaBoost).
Bagging
Decision trees are perhaps the most basic machine learning algorithms. They are used for classification and regression and, because of their simplicity, they are also commonly used for teaching and for quick tests.
In a decision tree, every node represents a test on an attribute of the data the model is being trained on (is something true or false?), every branch (line between nodes) represents a decision (if the test is true, go one way; otherwise, go the other way), and every leaf represents a final outcome (if all conditions along the path are fulfilled, it's a sunflower or a daisy).
We are now going to use the Iris dataset. This dataset considers sepal width and length, along with petal width and length, in order to classify Iris flowers as setosa, versicolour, or virginica.
Note
The Iris dataset can be downloaded from scikit-learn using Python:
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html
Scikit-learn is a library that provides useful tools for data mining and data analysis.
The following flowchart shows the learning representation of a decision tree trained on this dataset. X represents features from the dataset, X0 being sepal length, X1 being sepal width, X2 being petal length, and X3 petal width. The 'value' tag is how many samples of each category fall into each node. We can see that, in the first step, the decision tree already distinguishes setosa from the other two by only considering the X2 feature, petal length:

Figure 2.26: Graph of a decision tree for the Iris dataset
Decision trees can be implemented in Python using only a couple of lines thanks to scikit-learn:
from sklearn.tree import DecisionTreeClassifier
dtree=DecisionTreeClassifier()
dtree.fit(x,y)
x and y are the features and the labels of the training set, respectively.
Apart from being columns of data representing those lengths and widths, x could also be every pixel of an image. In machine learning, when the input data is images, every pixel is treated as a feature.
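To see the whole pipeline in one place, here is a minimal, self-contained sketch that loads the Iris dataset from scikit-learn, trains a decision tree on it, and predicts the species of one flower (the sample measurements below are only for illustration and are not part of the original code):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x, y = iris.data, iris.target    # four measurements per flower and the species label (0, 1, or 2)

dtree = DecisionTreeClassifier()
dtree.fit(x, y)

# sepal length, sepal width, petal length, and petal width in centimeters
print(dtree.predict([[5.1, 3.5, 1.4, 0.2]]))    # should print [0], that is, setosa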
Decision trees are trained for one specific task or dataset and cannot be transferred to another similar problem. Nevertheless, several decision trees can be combined in order to create bigger models and learn how to generalize. These are called random forests.
The name forest refers to an ensemble of many decision tree algorithms, following the bagging method, which states that combining several algorithms usually achieves a better overall result than any single one of them. The word "random" refers to the randomness with which the algorithm selects the features to take into account when splitting a node.
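Scikit-learn also exposes the general bagging idea directly through its BaggingClassifier. The following is a minimal sketch, not part of the original text, assuming the x and y arrays from the Iris example above:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# ten decision trees, each trained on a random bootstrap sample of the data
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)
bagging.fit(x, y)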
Thanks again to scikit-learn, we can implement the random forest algorithm with only a couple of lines, fairly similar to the previous lines:
from sklearn.ensemble import RandomForestClassifier
rndForest=RandomForestClassifier(n_estimators=10)
rndForest.fit(x,y)
n_estimators stands for the number of underlying decision trees. If you test this method, the results will typically improve over those of a single decision tree.
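One way to check this improvement is with cross-validation. Here is a hedged sketch, again assuming the Iris x and y from before (the exact scores will vary from run to run):
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# 5-fold cross-validation: train on four folds, test on the remaining one, five times
tree_scores = cross_val_score(DecisionTreeClassifier(), x, y, cv=5)
forest_scores = cross_val_score(RandomForestClassifier(n_estimators=10), x, y, cv=5)
print('decision tree:', tree_scores.mean())
print('random forest:', forest_scores.mean())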
There are other methods that follow the boosting methodology as well. Boosting combines algorithms called weak learners into a weighted sum to produce a strong learner, which gives the final output. These weak learners are trained sequentially, meaning each one of them tries to correct the mistakes made by its predecessor.
There are many algorithms that use this approach. The most famous ones are AdaBoost, gradient boosting, and XGBoost. We are only going to look at AdaBoost as it is the most well known and easy to understand.
Boosting
AdaBoost puts together weak learners in order to form a strong learner. The name AdaBoost stands for adaptive boosting, which means that the strategy weighs the training examples differently at each iteration: examples that are incorrectly classified in one iteration receive a higher weight in the next iteration, while correctly classified examples receive a lower weight.
The code for this method is as follows:
from sklearn.ensemble import AdaBoostClassifier
adaboost=AdaBoostClassifier(n_estimators=100)
adaboost.fit(x_train, y_train)
n_estimators is the maximum number of estimators at which boosting is terminated.
By default, this method uses a decision tree as its underlying base estimator, so the performance might not be as good as that of the random forest. To build a stronger classifier, the random forest algorithm can be used as the base estimator instead:
AdaBoostClassifier(RandomForestClassifier(n_jobs=-1,n_estimators=500,max_features='auto'),n_estimators=100)
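As a rough illustration (not part of the original text), this combination could be trained and evaluated on a held-out split as follows, assuming the x and y arrays from the Iris example earlier; on larger datasets it will be noticeably slower:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.2, random_state=42)
boosted_forest = AdaBoostClassifier(
    RandomForestClassifier(n_jobs=-1, n_estimators=500, max_features='auto'),
    n_estimators=100)
boosted_forest.fit(x_tr, y_tr)
print(boosted_forest.score(x_te, y_te))    # accuracy on the held-out 20%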
Exercise 8: Predicting Numbers Using the Decision Tree, Random Forest, and AdaBoost Algorithms
In this exercise, we are going to use the digits obtained from the last exercise and the models that we have learned about in this topic to correctly predict every number. To do that, we are going to extract several digits from some samples inside the Dataset/numbers folder and combine them with the MNIST dataset so that we have enough data for the models to learn properly. The MNIST dataset is a collection of 28 x 28 grayscale images of handwritten digits from 0 to 9, and it is mostly used by researchers to test their methods or to play around with. Nevertheless, it can help our models predict digits even though they are not exactly of the same style as ours. You can check out this dataset at http://yann.lecun.com/exdb/mnist/.
As the installation of Keras requires TensorFlow, we propose using Google Colab, which works just like a Jupyter notebook except that the code runs on a remote virtual machine rather than your own system, with everything needed for Python and machine learning already installed.
Let's begin the exercise:
Note
We will be continuing the code from Exercise 7, here in the same notebook.
- Head to the interface on Google Colab, where you executed the code for Exercise 7, Loading an Image and Applying the Learned Methods.
- Import the libraries:
import numpy as np
import random
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import shuffle
from matplotlib import pyplot as plt
import cv2
import os
import re
random.seed(42)
Note
We are setting the seed of the random module to 42 for reproducibility: all random steps have the same randomness and always give the same output. It can be set to any number, as long as it does not vary between runs.
- Now we are going to import the MNIST dataset:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
In the last line of the code, we are loading the data in x_train, which is the training set (60,000 examples of digits), y_train, which are the labels of those digits, x_test, which is the testing set, and y_test, which are the corresponding labels. These are in NumPy format.
- Let's show some of those digits using Matplotlib:
for idx in range(5):
    rnd_index = random.randint(0, 59999)    # pick a random training example
    plt.subplot(1,5,idx+1),plt.imshow(x_train[rnd_index],'gray')
    plt.xticks([]),plt.yticks([])
plt.show()
Figure 2.27: MNIST dataset
Note
These digits do not look like the ones that we extracted in the previous exercise. In order to make the models properly predict the digits from the image processed in the first exercise, we will need to add some of those digits to this dataset.
Here's the process for adding new digits that look like the ones we want to predict:
Add a Dataset folder with subfolders numbered from 0 to 9 (already done).
Get the code from the previous exercise.
Use the code to extract all the digits from the images that are stored in 'Dataset/numbers/' (already done).
Paste the generated digits to the corresponding folders with the name that corresponds to the digit generated (already done).
Add those images to the original dataset (step 5 in this exercise).
- To add those images to your training set, these two methods should be declared:
# ---------------------------------------------------------
def list_files(directory, ext=None):
    # list all the files in a directory, optionally filtering by extension
    return [os.path.join(directory, f) for f in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, f)) and
            (ext == None or re.match('([\w_-]+\.(?:' + ext + '))', f))]
# ---------------------------------------------------------
def load_images(path, label, maximum=-1):
    X = []
    Y = []
    label = str(label)
    for fname in list_files(path, ext='jpg'):
        img = cv2.imread(fname, 0)          # load the digit image in grayscale
        img = cv2.resize(img, (28, 28))     # resize it to the 28 x 28 MNIST format
        X.append(img)
        Y.append(label)
    if maximum != -1:                       # optionally limit the number of samples per digit
        X = X[:maximum]
        Y = Y[:maximum]
    X = np.asarray(X)
    Y = np.asarray(Y)
    return X, Y
The first method, list_files(), lists all the files within a folder with the specified extension, which in this case is jpg.
In the main method, load_images(), we load the images from the corresponding digit folder, together with its label. If maximum is different from -1, we limit the number of images loaded for every digit; we do this because there should be a similar number of samples for every digit. Finally, we convert the lists to NumPy arrays.
- Now we need to add these arrays to the training set so that our models can learn how to recognize the extracted digits:
print(x_train.shape)
print(x_test.shape)
X, Y = load_images('Dataset/%d'%(0), 0, 9)
for digit in range(1,10):
    X_aux, Y_aux = load_images('Dataset/%d'%(digit), digit, 9)
    print(X_aux.shape)
    X = np.concatenate((X, X_aux), axis=0)
    Y = np.concatenate((Y, Y_aux), axis=0)
Inside the loop, we load the digits of each class using the method declared in the preceding code and concatenate the resulting arrays to the X and Y arrays created before the loop. We then split this new data:
from sklearn.model_selection import train_test_split
x_tr, x_te, y_tr, y_te = train_test_split(X, Y, test_size=0.2)
After this, the train_test_split method from sklearn is used in order to separate those digits – 20% for testing and the rest for training:
x_train = np.concatenate((x_train, x_tr), axis=0)
y_train = np.concatenate((y_train, y_tr), axis=0)
x_test = np.concatenate((x_test, x_te), axis=0)
y_test = np.concatenate((y_test, y_te), axis=0)
print(x_train.shape)
print(x_test.shape)
Once done, we concatenate these to the original training and testing sets. We have printed the shape of x_train and x_test before and after so that the extra digits can be seen: the shapes go from (60000, 28, 28) and (10000, 28, 28) to (60072, 28, 28) and (10018, 28, 28).
- For the models imported from sklearn that we are going to use in this exercise, we need to reshape the arrays to the shape (n_samples, n_features), whereas they currently have the shape (n_samples, height, width):
x_train = x_train.reshape(x_train.shape[0],x_train.shape[1]*x_train.shape[2])
x_test = x_test.reshape(x_test.shape[0],x_test.shape[1]*x_test.shape[2])
print(x_train.shape)
print(x_test.shape)
We multiply the height and the width of the array in order to flatten each image into a single dimension of total length 28*28 = 784.
- Now we are ready to feed the data into the models. We will start training a decision tree:
print ("Applying Decision Tree...")
dtc = DecisionTreeClassifier()
dtc.fit(x_train, y_train)
In order to see how well this model performs, the accuracy metric is used. It represents the fraction of samples from x_test that have been predicted correctly. We have already imported the metrics module from sklearn, and we will be using accuracy_score() from that module to calculate the accuracy of the model. We need to predict the results from x_test using the model's predict() function and see whether the output matches the y_test labels:
y_pred = dtc.predict(x_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print(accuracy*100)
After that, the accuracy is calculated and printed. The resulting accuracy percentage is 87.92%, which is not a bad result for a decision tree. It can be improved though.
- Let's try the random forest algorithm:
print ("Applying RandomForest...")
rfc = RandomForestClassifier(n_estimators=100)
rfc.fit(x_train, y_train)
Following the same methodology to calculate the accuracy, the accuracy obtained is 94.75%, which is way better and could be classified as a good model.
- Now, we will try AdaBoost initialized with random forest:
print ("Applying Adaboost...")
adaboost = AdaBoostClassifier(rfc,n_estimators=10)
adaboost.fit(x_train, y_train)
The accuracy obtained using AdaBoost is 95.67%. This algorithm takes much more time than the previous ones but gets better results.
- We are now going to apply random forest to the digits that were obtained in the last exercise. We apply this algorithm because it takes much less time than AdaBoost and still gives good results. Before checking the following code, you need to run the digit-extraction code from Exercise 7 for the image stored in Dataset/number.jpg, which is the one used in that exercise, and for the other two images for testing stored in the Dataset/testing/ folder. Once you have done that, you should have five digit images in your directory for every image, ready to be loaded. Here's the code:
for number in range(5):
    imgLoaded = cv2.imread('number%d.jpg'%(number),0)    # load each extracted digit in grayscale
    img = cv2.resize(imgLoaded, (28, 28))
    img = img.flatten()
    img = img.reshape(1,-1)                              # shape (1, 784), as expected by the model
    plt.subplot(1,5,number+1),plt.imshow(imgLoaded,'gray')
    plt.title(rfc.predict(img)[0])
    plt.xticks([]),plt.yticks([])
plt.show()

Figure 2.28: Random forest prediction for the digits 1, 6, 2, 1, and 6
Here, we are applying the predict() function of the random forest model, passing every image to it. Random forest seems to perform pretty well, as it has predicted all of the numbers correctly. Let's try another number that has not been used (there is a folder with some images for testing inside the Dataset folder):

Figure 2.29: Random forest prediction for the digits 1, 5, 8, 3, and 4
It is still performing well with the rest of the digits. Let's try another number:

Figure 2.30: Random forest prediction for the digits 1, 9, 4, 7, and 9
With the number 7, it seems to be having problems, probably because we have not introduced enough samples of this style of digit and because of the simplicity of the model.
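If you want to see which digits the model confuses most often, a confusion matrix is a useful diagnostic. Here is a minimal sketch, not part of the original exercise, assuming the rfc model and the test split from the steps above:
from sklearn import metrics

y_pred = rfc.predict(x_test)                        # random forest predictions on the test set
print(metrics.confusion_matrix(y_test, y_pred))     # rows are true digits, columns are predicted digits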
Note
The entire code for this exercise is available on GitHub in the Lesson02 | Exercise07-09 folder.
Now, in the next topic, we are going to explore the world of artificial neural networks, which are far more capable of achieving these tasks.
Artificial Neural Networks (ANNs)
Artificial neural networks (ANNs) are information processing systems modeled on and inspired by the human brain, which they try to mimic by learning how to recognize patterns in data. They accomplish tasks through a well-structured architecture composed of several small processing units called neurons, which are interconnected in order to solve major problems.
ANNs learn by being shown enough examples of the data they are processing, and enough usually means thousands or even millions of examples. This requirement can be a disadvantage: if you do not already have the data, you will have to gather or create it yourself, which will probably cost a lot of time and money.
Another disadvantage is that these algorithms train best on specific hardware and software. They train well on high-performance GPUs, which are expensive. You can still do certain things with a GPU that does not cost that much, but training will take much longer. You also need specific software, such as TensorFlow, Keras, PyTorch, or Fast.AI. For this book, we will be using TensorFlow and Keras, which runs on top of TensorFlow.
These algorithms take all of the data as input, with the first layer of neurons acting as the input layer. After that, every value is passed to the next layer of neurons, where it is multiplied by some weight and processed by an activation function, which makes "decisions" and passes the resulting values on to the next layer. The layers in the middle of the network are called hidden layers. This process continues until the last layer, where the output is given. When the MNIST images are introduced as input to the neural network, the end of the network should have 10 neurons, each representing one digit; if the neural network guesses that an image is a specific digit, the corresponding neuron is activated. The ANN then checks whether its decision was correct and, if not, performs a correction process called backpropagation, in which the error is propagated backward through the network and the weights of the neurons are adjusted. In Figure 2.31, backpropagation is shown:

Figure 2.31: Backpropagation process
Here is a graphical representation of an ANN:

Figure 2.32: ANN architecture
In the preceding diagram, we can see the neurons, where all the processing occurs, and the connections between them, which carry the weights of the network.
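To make the idea of weighted connections and activations concrete, here is a minimal NumPy sketch of a single forward pass through such a network; the weights here are random placeholders rather than trained values, and the activation used is ReLU, which is explained later in this section:
import numpy as np

np.random.seed(42)

x = np.random.rand(784)                  # a flattened 28x28 input with values between 0 and 1
W1 = np.random.randn(784, 16) * 0.01     # weights of the hidden layer (random, untrained)
b1 = np.zeros(16)                        # biases of the hidden layer
W2 = np.random.randn(16, 10) * 0.01      # weights of the output layer
b2 = np.zeros(10)                        # biases of the output layer

hidden = np.maximum(0, x @ W1 + b1)      # weighted sum plus bias, then the ReLU activation
output = hidden @ W2 + b2                # raw scores, one per digit class
print(output.shape)                      # (10,)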
We are going to gain an understanding of how to create one of these neural networks, but first, we need to take a look at the data that we have.
In the previous exercise, we had arrays of shape (60072, 784) for training and (10018, 784) for testing, stored as integers with pixel values from 0 to 255. ANNs perform better and faster with normalized data, but what does that mean?
Normalizing the data means converting the 0-255 range of values to a range of 0-1. The values must fit between 0 and 1, which means they will be floating-point numbers, because there is no other way to fit a wider range of numbers into a narrower one. So, first we need to convert the data to float and then normalize it. Here's the code for doing so:
x_train = (x_train.astype(np.float32))/255.0 #Converts to float and then normalize
x_test = (x_test.astype(np.float32))/255.0 #Same for the test set
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
For the labels, we also need to change the format to one-hot encoding.
In order to do that, we use a function from the Keras utils package (which we import as np_utils), called to_categorical(), which transforms the digit of every label into a one-hot vector. Here's the code:
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)
If we print the first label of y_train, which is 5, and then print the first value of y_train after the conversion, the output is [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]. This format puts a 1 in the sixth position of an array of 10 positions (because there are 10 digits) for the number 5 (the sixth position because the first one corresponds to 0, not 1). Now we are ready to move on to the architecture of the neural network.
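If you want to verify this conversion yourself, a quick check (assuming Keras is installed and imported as it is later in the exercise) is the following:
from keras import utils as np_utils

print(np_utils.to_categorical([5], 10))
# [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]] -- the 1 sits in the sixth position, which encodes the digit 5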
For a basic neural network, dense layers (or fully connected layers) are employed; networks built from them are also called fully connected neural networks. These layers contain a series of neurons that represent the neurons of the human brain, and they need an activation function to be specified. Each neuron calculates a weighted sum of its inputs and adds a bias, and the activation function then decides, based on that value, whether and how strongly the neuron activates.
The two most used activation functions are sigmoid and ReLU, but ReLU has demonstrated better performance overall. They are represented on the following chart:

Figure 2.33: The sigmoid and ReLU functions
Both functions are applied to the weighted sum plus bias computed by each neuron. The sigmoid function squashes that value into the range 0 to 1, whereas ReLU outputs 0 for negative values and returns the value itself for positive values.
Toward the end of a neural network for multi-class problems, the softmax activation function is normally used. It outputs a score for every class, with all the scores summing to 1, and the score is highest for the class that most likely corresponds to the input image. There are other activation functions, but softmax is the best choice for the output layer of a multi-classification network.
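For reference, these three functions can be written in a few lines of NumPy; this is a minimal sketch for illustration, not code from the exercise:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes any value into the range (0, 1)

def relu(z):
    return np.maximum(0, z)              # 0 for negative inputs, the input itself otherwise

def softmax(z):
    e = np.exp(z - np.max(z))            # subtract the maximum for numerical stability
    return e / e.sum()                   # positive scores that sum to 1

print(sigmoid(np.array([-2.0, 0.0, 2.0])))
print(relu(np.array([-2.0, 0.0, 2.0])))
print(softmax(np.array([1.0, 2.0, 3.0])))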
In Keras, a neural network could be coded as follows:
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten

model = Sequential()
model.add(Dense(16, input_shape=input_shape))
model.add(Activation('relu'))
model.add(Dense(8))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(10, activation="softmax"))
The model is created as Sequential() because the layers are added one after another. First, we add a dense layer with 16 neurons and pass the shape of the input so that the neural network knows what to expect; we then apply the ReLU activation function, which generally gives good results. We stack another layer with eight neurons and the same activation function.
At the end, we use the Flatten function to convert the array to one dimension, and then the last dense layer is stacked, where the number of neurons should equal the number of classes (in this case, 10 classes for the MNIST dataset). The softmax function is applied in order to get the output as a one-hot-like vector, as mentioned before.
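At this point, you can inspect the architecture you have just defined with Keras's summary() call, which prints each layer along with its output shape and parameter count:
model.summary()    # prints a table with every layer, its output shape, and its number of parameters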
Now we have to compile the model. In order to do that, we use the compile method as follows:
from keras.optimizers import Adadelta

model.compile(loss='categorical_crossentropy', optimizer=Adadelta(), metrics=['accuracy'])
We pass the loss function, which is used to calculate the error during the backpropagation process. For this problem, we will be using categorical cross-entropy as the loss function, as this is a categorical problem. The optimizer used is Adadelta, which performs very well in most situations. We establish accuracy as the main metric to be considered for the model.
We are also going to use what Keras calls a callback. Callbacks are invoked at every epoch during training. We will be using the ModelCheckpoint callback in order to save the model whenever it achieves its best validation result so far:
from keras.callbacks import ModelCheckpoint

ckpt = ModelCheckpoint('model.h5', save_best_only=True, monitor='val_loss', mode='min', save_weights_only=False)
The function to train this model is called fit() and is implemented as follows:
model.fit(x_train, y_train, batch_size=64, epochs=10, verbose=1, validation_data=(x_test, y_test),callbacks=[ckpt])
We pass the training set with its labels and establish a batch size of 64 (the number of images processed at every step of every epoch), and we choose to train for 10 epochs (each epoch is a full pass over the training data). The validation set is also passed in order to see how the model performs on unseen data, and at the end, we set the callback that we created before.
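Once training has finished, the checkpoint saved by the callback can be loaded back and evaluated. This is a minimal sketch, assuming the model.h5 file created above and the normalized x_test and y_test arrays:
from keras.models import load_model

best_model = load_model('model.h5')                 # the best checkpoint saved by the callback
loss, acc = best_model.evaluate(x_test, y_test)     # evaluate on the held-out test data
print('test loss: %.4f, test accuracy: %.4f' % (loss, acc))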
All these parameters have to be adjusted according to the problem that we are facing. In order to put all of this into practice, we are going to perform an exercise – the same exercise that we did with decision trees, but with neural networks.
Exercise 9: Building Your First Neural Network
Note
We will be continuing the code from Exercise 8 here.
The entire code for this exercise can be found on GitHub in the Lesson02 | Exercise07-09 folder.
- Head to the interface on Google Colab where you executed the code for Exercise 8, Predicting Numbers Using the Decision Tree, Random Forest, and AdaBoost Algorithms.
- Now import the packages from the Keras library:
from keras.callbacks import ModelCheckpoint
from keras.layers import Dense, Flatten, Activation, BatchNormalization, Dropout
from keras.models import Sequential
from keras.optimizers import Adadelta
from keras import utils as np_utils
- We normalize the data as we explained in this part of the chapter. We also declare the input_shape instance that will be passed to the neural network, and we print it:
x_train = (x_train.astype(np.float32))/255.0
x_test = (x_test.astype(np.float32))/255.0
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)
input_shape = x_train.shape[1:]
print(input_shape)
print(x_train.shape)
The output is as follows:
Figure 2.34: Data output when passed for normalization using neural networks
- Now we are going to declare the model. The small model that we built before would not perform well enough on this problem, so we have created a deeper model with more neurons and a couple of new methods:
def DenseNN(input_shape):
    model = Sequential()
    model.add(Dense(512, input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(10, activation="softmax"))
    return model

model = DenseNN(input_shape)
We have added a BatchNormalization() method, which helps the network converge faster and may give better results overall.
We have also added the Dropout() method, which helps the network to avoid overfitting (the accuracy of the training set is much higher than the accuracy of the validation set). It does that by disconnecting some neurons during training (0.2 -> 20% of neurons), which allows better generalization of the problem (better classification of unseen data).
Furthermore, the number of neurons and the number of layers have increased drastically. The more layers and neurons we add, the deeper the network becomes and the more complex the features it can learn.
- Now we compile the model using categorical cross-entropy, as there are several classes, and we use Adadelta, which is great overall for these kinds of tasks. Also, we use accuracy as the main metric:
model.compile(loss='categorical_crossentropy', optimizer=Adadelta(), metrics=['accuracy'])
- Let's create the Checkpoint callback, where the model will be stored in the Models folder with the name model.h5. We will be using validation loss as the main method to be tracked and the model will be saved in its entirety:
ckpt = ModelCheckpoint('Models/model.h5', save_best_only=True,monitor='val_loss', mode='min', save_weights_only=False)
- Start training the network with the fit() function, just as we explained before. We use a batch size of 64 and 10 epochs (which is enough, as every epoch takes a long time and the model does not improve much beyond that), and we introduce the Checkpoint callback:
model.fit(x_train, y_train,
          batch_size=64,
          epochs=10,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[ckpt])
This is going to take a while.
The output should look like this:
Figure 2.35: Neural network output
The final accuracy of the model corresponds to the last val_acc, which is 97.83%. This is a better result than we got using AdaBoost or random forest.
- Now let's make some predictions:
for number in range(5):
    imgLoaded = cv2.imread('number%d.jpg'%(number),0)
    img = cv2.resize(imgLoaded, (28, 28))
    img = (img.astype(np.float32))/255.0
    img = img.reshape(1, 28, 28, 1)
    plt.subplot(1,5,number+1),plt.imshow(imgLoaded,'gray')
    plt.title(np.argmax(model.predict(img)[0]))
    plt.xticks([]),plt.yticks([])
plt.show()
The code looks similar to the code used in the last exercise, with some minor differences. One is that, as we changed the input format, we have to change the format of the input image too (convert it to float and normalize it). The other is that the prediction comes out in one-hot form, so we use NumPy's argmax() function to get the position of the maximum value of the output vector, which corresponds to the predicted digit.
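As a tiny illustration of that last step (not part of the exercise code), argmax simply returns the index of the largest value in the vector:
import numpy as np

scores = np.array([0.01, 0.0, 0.02, 0.0, 0.0, 0.9, 0.03, 0.0, 0.04, 0.0])
print(np.argmax(scores))    # prints 5, the index of the largest score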
Let's see the output of the last number that we tried using random forest:

Figure 2.36: Prediction of numbers using neural networks
The output has been successful – even the 7 that the random forest model struggled with.
Note
The entire code can be found on GitHub in the Lesson02 | Exercise07-09 folder.
If you try the other numbers, it will classify them all very well – it has learned how to.
Congratulations! You have built your first neural network and you have applied it to a real-world problem! Now you are ready to go through the activity for this chapter.
Activity 2: Classify 10 Types of Clothes from the Fashion-MNIST Database
Now you are going to face a similar problem to the previous one, but with types of clothes. This database is very similar to the original MNIST: it has 60,000 training images and 10,000 testing images, each 28x28 in grayscale. You will have to follow steps similar to those of the previous exercises, as this activity is not focused on real-world images, and put into practice the skills learned in the last exercise by building a neural network on your own. For this, you will have to open a Google Colab notebook. The following steps will guide you in the right direction:
- Load the dataset from Keras:
from keras.datasets import fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
Note
The data is preprocessed like MNIST, so the next steps should be similar to Exercise 5, Applying the Various Morphological Transformations to an Image.
- Import random and set the seed to 42. Import matplotlib and plot five random samples of the dataset, just as we did in the last exercise.
- Now normalize the data, reshape it so that it fits properly into the neural network, and convert the labels to one-hot encoding.
- Start to build the architecture of the neural network by using dense layers. You have to build it inside a method that will return the model.
Note
We recommend starting off by building a very small, easy architecture and improving it by testing it with the given dataset.
- Compile the model with the appropriate parameters and start training the neural network.
- Once trained, we should make some predictions in order to test the model. We have uploaded some images into the same testing folder inside the Dataset folder of the last exercise. Make predictions using those images, just as we did in the last exercise.
Note
You have to consider that the images that were fed into the neural network had a black background and the clothes were white, so you should make the corresponding adjustments so that your test images look like those. If needed, invert white and black; NumPy has a method that does that: image = np.invert(image). A minimal preprocessing sketch is shown after this note.
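The following is a hedged sketch of that preprocessing, assuming a hypothetical image path and a trained model called model; adapt the names to your own notebook:
import numpy as np
import cv2

img = cv2.imread('Dataset/testing/image.jpg', 0)    # hypothetical path: load a test image in grayscale
img = np.invert(img)                                # make it white clothing on a black background
img = cv2.resize(img, (28, 28))
img = (img.astype(np.float32)) / 255.0
img = img.reshape(1, 28, 28, 1)
print(np.argmax(model.predict(img)[0]))             # index of the predicted class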
- Check the results:

Figure 2.37: The output of the prediction is the index of the position in this list
Note
The solution for this activity is available on page 302.