The goal of this project was to build and train a deep neural network in TensorFlow that classifies traffic signs from the German Traffic Sign Dataset. The project consists of the following steps:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
I used the pandas library and Python to calculate summary statistics of the traffic sign data set:
- The size of the training set is 34799.
- The size of the validation set is 4410.
- The size of the test set is 12630.
- The shape of a traffic sign image is (32, 32, 3).
- The number of unique classes/labels in the data set is 43.
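
As a minimal sketch of how such statistics can be read from the loaded arrays: the original numbers were computed with pandas, whereas this example uses NumPy, and the names `X_train`, `y_train`, `X_valid`, `X_test` and `summarize` are assumptions. The placeholder arrays only stand in for the real data.

```python
import numpy as np

def summarize(X_train, y_train, X_valid, X_test):
    """Collect the basic statistics listed above from the loaded arrays."""
    return {
        "n_train": X_train.shape[0],
        "n_valid": X_valid.shape[0],
        "n_test": X_test.shape[0],
        "image_shape": tuple(X_train.shape[1:]),
        "n_classes": int(np.unique(y_train).size),
    }

# Tiny placeholder arrays (NOT the real data) so the sketch runs on its own.
X_train = np.zeros((6, 32, 32, 3), dtype=np.uint8)
y_train = np.array([0, 0, 1, 2, 2, 2])
X_valid = np.zeros((2, 32, 32, 3), dtype=np.uint8)
X_test = np.zeros((3, 32, 32, 3), dtype=np.uint8)

stats = summarize(X_train, y_train, X_valid, X_test)
print(stats)
```
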
Here is an exploratory visualization of the data set: a bar chart showing how the training, validation and test data are distributed across the classes. As the illustration below shows, the training data is not uniformly distributed. This could mean that classes with few samples are less likely to be predicted than classes with many samples.
To get an idea of what the images in the German Traffic Sign Dataset look like, I visualized one sign from each class.
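
The imbalance can also be quantified without a plot. The following is a small sketch, assuming the labels are integers from 0 to 42. The helper names `class_counts` and `imbalance_ratio` are hypothetical, and the label list is a toy example rather than the real data.

```python
from collections import Counter

def class_counts(labels, n_classes):
    """Number of samples per class, indexed by class id."""
    counts = Counter(labels)
    return [counts.get(c, 0) for c in range(n_classes)]

def imbalance_ratio(counts):
    """Ratio between the largest and the smallest non-empty class."""
    nonzero = [c for c in counts if c > 0]
    return max(nonzero) / min(nonzero)

# Toy labels (NOT the real data): class 0 is three times as frequent as class 1.
toy_labels = [0, 0, 0, 1, 2, 2]
toy_counts = class_counts(toy_labels, n_classes=3)
print(toy_counts, imbalance_ratio(toy_counts))
```
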
As a first step, I decided to generate additional data because of the imbalance in the training set. To add more data, I randomly rotate the images and randomly adjust their contrast and sharpness. I apply this data augmentation to the classes with fewer samples, so that the final training set is more balanced, as the following illustration shows.
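
One way to decide how many augmented images each class needs is sketched below. The target count per class is not stated in this report, so topping every class up to the size of the largest one is purely an assumption, as are the helper names `augmentation_plan` and `pick_sources`. The image transforms themselves (rotation, contrast, sharpening) are left to an image library and are not shown.

```python
import random

def augmentation_plan(counts, target=None):
    """Number of augmented samples each class needs to reach `target`.

    Assumption: if no target is given, every class is topped up to the
    size of the largest class.
    """
    if target is None:
        target = max(counts)
    return [max(0, target - c) for c in counts]

def pick_sources(indices, n_extra, rng):
    """Choose (with replacement) which original images to augment."""
    return [rng.choice(indices) for _ in range(n_extra)]

# Toy class counts (NOT the real data).
plan = augmentation_plan([3, 1, 2])
sources = pick_sources([4, 7], n_extra=plan[1], rng=random.Random(0))
print(plan, sources)
```
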
Here is an example of an original image and an augmented image:
Next, I decided to convert the images to grayscale to reduce the dimensions and complexity of the neural network. Here is an example of a traffic sign image before and after grayscaling.
As a last step, I normalized the image data to the range [0, 1] by dividing each pixel value by 255, so that all input features share the same scale.
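
The two preprocessing steps can be sketched as follows. The report does not say how the grayscale conversion was done, so averaging the three color channels is an assumption. The function names are hypothetical too.

```python
import numpy as np

def to_grayscale(images):
    """Convert (N, H, W, 3) RGB images to (N, H, W, 1).

    Assumption: a plain channel average; a weighted luminance formula
    would work as well.
    """
    return images.astype(np.float32).mean(axis=3, keepdims=True)

def normalize(images):
    """Scale pixel values from [0, 255] to [0, 1]."""
    return images.astype(np.float32) / 255.0

def preprocess(images):
    return normalize(to_grayscale(images))

# Placeholder batch (NOT the real data): two all-white 32x32 RGB images.
batch = np.full((2, 32, 32, 3), 255, dtype=np.uint8)
out = preprocess(batch)
print(out.shape, out.min(), out.max())
```
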
My final model consisted of the following layers:
Layer | Description |
---|---|
Input | 32x32x1 grayscale image |
Convolution 5x5 | 1x1 stride, valid padding, activation: relu, outputs 28x28x32 |
Convolution 5x5 | 1x1 stride, valid padding, activation: relu, outputs 24x24x32 |
Max pooling | 2x2 stride, outputs 12x12x32 |
Convolution 5x5 | 1x1 stride, valid padding, activation: relu, outputs 8x8x64 |
Convolution 5x5 | 1x1 stride, valid padding, activation: relu, outputs 4x4x64 |
Max pooling | 2x2 stride, outputs 2x2x64 |
Flatten | 256 |
Fully connected | 128, activation: relu, keep prob: 0.5 |
Fully connected | 64, activation: relu, keep prob: 0.75 |
Output | 43, activation: softmax |
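
The output sizes in the table follow from the usual formula for valid padding, `(size - kernel) // stride + 1`. The short sketch below traces them layer by layer. It assumes a 2x2 pooling window, since the table lists only the stride, and the helper names are hypothetical.

```python
def conv_valid(size, kernel, stride=1):
    """Spatial output size of a convolution with valid padding."""
    return (size - kernel) // stride + 1

def max_pool(size, window=2, stride=2):
    """Spatial output size of max pooling (assumed 2x2 window)."""
    return (size - window) // stride + 1

def trace_shapes(size=32):
    """Return the (height, width, channels) after each conv/pool layer."""
    layers = [("conv", 5, 32), ("conv", 5, 32), ("pool", 2, None),
              ("conv", 5, 64), ("conv", 5, 64), ("pool", 2, None)]
    channels = 1  # grayscale input
    shapes = []
    for kind, k, filters in layers:
        if kind == "conv":
            size = conv_valid(size, k)
            channels = filters
        else:
            size = max_pool(size, k, k)
        shapes.append((size, size, channels))
    return shapes

shapes = trace_shapes()
h, w, c = shapes[-1]
print(shapes, "flatten:", h * w * c)
```
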
To train the model, I used the Adam optimizer with a fixed learning rate of 1e-3. The batch size was set to 32 images. The weights were initialized from a truncated normal distribution. The network was trained for 10 epochs on a notebook, which took about 1 hour.
My final model results were:
- training set accuracy of 100%
- validation set accuracy of 98.5%
- test set accuracy of 96.5%
I chose an iterative approach:
As a first approach, I tried the LeNet architecture from the LeNet lab on normalized data. This architecture was sufficient to reach a validation accuracy > 93%.
The problem with the initial architecture was that the model did not generalize well enough, as its poor performance on the test set showed.
First, I added dropout layers between the fully connected layers to prevent overfitting. Then I added more convolutional layers before each pooling step, as the well-known VGG16 architecture does.
It was largely a trial-and-error process. I adjusted the kernel size and the number of filters of the convolutional layers to extract more features. Furthermore, I adjusted the keep probability whenever the validation accuracy was much lower than the training accuracy. The learning rate and the number of epochs were not adjusted because they were already suitable. A more systematic way to tune these parameters is to perform a grid search.
An important design choice is the use of dropout layers, which randomly ignore neurons during training. The effect is that the network becomes less sensitive to specific weights, so it generalizes better and is less likely to overfit the training data. In general, convolutional layers work well for image classification because they extract features from the images. In addition, I think that doubling the number of filters of the convolutional layers after each max pooling step, as done here from 32 to 64, is good practice.
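
To illustrate what a dropout layer does, here is a NumPy sketch of the common "inverted dropout" formulation, in which the kept activations are rescaled by `1 / keep_prob` so that their expected value is unchanged. This is an illustration under that assumption and not the code used in the project. The function name and arguments are hypothetical.

```python
import numpy as np

def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout: zero units at random, rescale the survivors."""
    if not training:
        # At evaluation time every unit is kept and nothing is rescaled.
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 128))
dropped = dropout(x, keep_prob=0.5, rng=rng)
print(dropped.mean())  # close to 1.0 on average
```

With `keep_prob=0.5`, about half of the 128 units of the first fully connected layer are ignored in each training step, which is what makes the network less dependent on any single weight.
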
Here are five German traffic signs that I found on the web:
All images show good brightness conditions and look close to ideal for the classifier. Possible difficulties could be caused by the copyright labels in the pictures.
Here are the results of the prediction:
Image | Prediction |
---|---|
Priority road | Priority road |
Roundabout mandatory | Roundabout mandatory |
Speed limit (30km/h) | Speed limit (30km/h) |
Keep right | Keep right |
Stop sign | End of no passing by vehicles over 3.5 metric tons |
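The model correctly classified 4 of the 5 traffic signs, which gives an accuracy of 80%. This figure is not directly comparable with the 96.5% accuracy on the test set: with only five images, a single misclassification changes the accuracy by 20 percentage points, whereas the test set contains 12630 samples.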
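
The effect of the sample size can be made concrete with a confidence interval for the accuracy. The sketch below uses the Wilson score interval at roughly 95% confidence (`z = 1.96`). This analysis is not part of the original project, and the number of correct test predictions is only approximated from the reported 96.5%.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (about 95% for z = 1.96)."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Five web images: 4 of 5 correct.
web_lo, web_hi = wilson_interval(4, 5)

# Test set: about 96.5% of 12630 images (count approximated by rounding).
test_lo, test_hi = wilson_interval(round(0.965 * 12630), 12630)

print(f"web images: [{web_lo:.2f}, {web_hi:.2f}]")
print(f"test set:   [{test_lo:.3f}, {test_hi:.3f}]")
```

The interval for the five web images spans a large part of the range between 0 and 1, while the interval for the test set is less than one percentage point wide.
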
For the first image, the model is certain that this is a priority road sign (probability of 100%). The remaining four of the top five softmax probabilities round to zero.
For the second image, the model is only moderately sure that this is a roundabout mandatory sign: it assigns a probability of 55% to the "Roundabout mandatory" class. The remaining top five softmax probabilities are shown in the illustration below.
For the third image, the model is sure that this is a speed limit (30km/h) sign with a probability of 100%.
For the fourth image, the model is again sure that this is a keep right sign with a probability of 100%.
For the fifth image, the model misclassified the stop sign: it predicts an end of no passing by vehicles over 3.5 metric tons sign with a probability of 60%. Furthermore, the stop sign is not listed among the top five softmax probabilities, which suggests that the model has trouble predicting stop signs, at least for images like this one.
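
For reference, the top five softmax probabilities discussed above can be obtained from the raw network outputs (logits) as sketched below. This is a plain-Python illustration with made-up logits, not the project's code or its actual outputs. The helper names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k=5):
    """Return the k most probable (class_id, probability) pairs."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]

# Made-up logits for six classes (NOT real model outputs).
probs = softmax([2.0, 1.0, 0.1, -1.0, 0.0, 3.0])
top = top_k(probs, k=5)
for class_id, p in top:
    print(class_id, f"{p:.3f}")
```
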