Which kind of dog do you most look like?

6 min read · May 26, 2020

Udacity Data Scientist Nanodegree Capstone — Dog App

https://www.buzzfeed.com/annakopsky/does-your-dog-have-a-celebrity-doppelganger

Introduction

This blog post walks through every step of a dog breed classification project built with a CNN (Convolutional Neural Network) and transfer learning. By the end of the project, the code accepts any user-supplied image as input. If a dog is detected in the image, it provides an estimate of the dog’s breed. If a human is detected, it provides an estimate of the dog breed that the person most resembles. So if I upload a picture of myself, I will find out which dog breed I look like.

Problem statement:

There are three problems I will solve in this project:

(1) human detection, (2) dog detection, and (3) creating a CNN to classify dog breeds using transfer learning.

Let us start to investigate the data and build the classifier.

Data

The datasets consist of dog images and human images. The dog dataset contains 8351 images across 133 breed categories. We split them into three parts: 6680 training images, 835 validation images, and 836 test images. The human dataset contains 13233 images in total.
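As a quick sanity check on the numbers above, the three subsets add up to the full dog dataset and correspond to roughly an 80/10/10 split. The variable names below are only illustrative.

```python
# Split sizes quoted in the text above.
split_sizes = {"train": 6680, "validation": 835, "test": 836}

# Should match the 8351 dog images in the dataset.
total = sum(split_sizes.values())

# Fraction of the dataset that falls into each subset.
split_fractions = {name: count / total for name, count in split_sizes.items()}

for name, fraction in split_fractions.items():
    print(f"{name}: {split_sizes[name]} images ({fraction:.1%})")
```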

Problem Solutions

(1) Human detection

OpenCV provides many pre-trained face detectors. The Haar feature-based cascade classifier is one of them, and I use it to detect human faces in images. The cascade is trained on positive images that contain faces and negative images that do not. Before applying the classifier, it is standard procedure to convert the image to grayscale. The detectMultiScale function runs the classifier stored in face_cascade and takes the grayscale image as a parameter.

We therefore write a function that takes a string-valued file path to an image as input and returns True if a face is detected. It appears in the code block below.

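import cv2

# face_cascade is assumed to be a cv2.CascadeClassifier loaded beforehand
# from one of OpenCV's pre-trained Haar cascade XML files

# returns "True" if face is detected in image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0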

We take the file paths of the first 100 images in the human dataset and the first 100 images in the dog dataset and run the detector on both. The detector’s performance is not ideal but acceptable: 100% of the human images are detected as containing a human face, but 11% of the dog images are also detected as containing a human face, which is an 11% false-positive rate.
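The percentages above can be computed with a small helper like the one below. This is a minimal sketch: `detection_rate` is a name introduced here for illustration, and the `human_files` and `dog_files` lists mentioned in the usage note are assumed to hold the image paths.

```python
def detection_rate(detector, img_paths):
    """Return the percentage of images in img_paths for which detector is True."""
    img_paths = list(img_paths)
    if not img_paths:
        raise ValueError("img_paths must contain at least one path")
    # Count the images on which the detector fires.
    hits = sum(1 for path in img_paths if detector(path))
    return 100.0 * hits / len(img_paths)
```

For example, `detection_rate(face_detector, human_files[:100])` would give the share of human images in which a face is found, and `detection_rate(face_detector, dog_files[:100])` would give the false-positive rate on dog images.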

(2) Dog detection

To detect a dog, I use a pre-trained ResNet-50 model with weights that have been trained on ImageNet. ImageNet contains over 10 million images, and the pre-trained model predicts one of 1000 categories. The code for the dog detector is as follows:

# The categories corresponding to dogs are dictionary keys 151-268.
# Returns "True" if a dog is detected in the image stored at img_path.
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return (prediction <= 268) & (prediction >= 151)
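
The detector above relies on `ResNet50_predict_labels`, which is not shown in this post. A plausible sketch of it, assuming TensorFlow's bundled Keras is installed, looks like the following. The helper names `_load_model`, `path_to_tensor`, and `is_dog_label` are introduced here for illustration, and the 224x224 size is the standard ImageNet input size for ResNet-50.

```python
import functools


@functools.lru_cache(maxsize=1)
def _load_model():
    # Imported lazily so the module loads even without TensorFlow installed.
    from tensorflow.keras.applications.resnet50 import ResNet50
    return ResNet50(weights="imagenet")  # downloads the weights on first use


def path_to_tensor(img_path):
    """Load an image as a (1, 224, 224, 3) float array, the shape Keras expects."""
    import numpy as np
    from tensorflow.keras.utils import img_to_array, load_img
    img = load_img(img_path, target_size=(224, 224))
    return np.expand_dims(img_to_array(img), axis=0)


def ResNet50_predict_labels(img_path):
    """Return the index (0-999) of the most likely ImageNet class."""
    import numpy as np
    from tensorflow.keras.applications.resnet50 import preprocess_input
    batch = preprocess_input(path_to_tensor(img_path))
    return int(np.argmax(_load_model().predict(batch)))


def is_dog_label(label):
    """ImageNet class indices 151 through 268 (inclusive) are dog breeds."""
    return 151 <= label <= 268
```

The range check is split out into `is_dog_label` only so that it can be tried without loading the model.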

The dog detector finds a dog in 0% of the human images and in 100% of the dog images. It performs very well: no human is mistaken for a dog. Identifying a dog’s breed is much harder, because the differences between some breeds are subtle. In the following part, I use a CNN to classify dog breeds.

(3) Create a CNN to classify dog breeds using transfer learning

Step 1: Create a CNN to Classify Dog Breeds (from Scratch)

I first created a CNN from scratch to classify dog breeds, using an architecture that is a modification of the hinted architecture. In addition to the layer with 64 filters, I added convolutional and max-pooling layers with 128 and 256 filters to increase the depth, expecting better accuracy. I also used BatchNormalization after each convolutional and max-pooling block.

With batch normalization and the additional layers on top of the provided architecture, I expected the model to work well on this image classification task, because several convolutional layers should be enough to detect breed-specific features. The GAP (global average pooling) layer before the last layer also significantly reduces the number of parameters to be trained. The layers are as follows:
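To make the effect of the GAP layer concrete, the sketch below counts the parameters of a final 133-way fully connected layer with and without global average pooling. The 28x28x256 feature-map shape is a hypothetical example chosen for illustration, not the exact shape in my network.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# Hypothetical shape of the last convolutional feature map (assumption).
height, width, channels = 28, 28, 256
n_breeds = 133  # number of dog categories in the dataset

# Flatten keeps every spatial position as a separate input.
params_with_flatten = dense_params(height * width * channels, n_breeds)

# Global average pooling keeps a single value per channel.
params_with_gap = dense_params(channels, n_breeds)

print(f"Flatten + Dense: {params_with_flatten:,} parameters")
print(f"GAP + Dense:     {params_with_gap:,} parameters")
```

Under these assumptions the pooled version needs several hundred times fewer parameters in the last layer.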

This model is trained for only 5 epochs with a batch size of 30. After 5 epochs, the test accuracy is 19.61%. The result is not very good. To reduce training time without sacrificing accuracy, it is time to use transfer learning and build the breed classifier on top of pre-trained networks. These networks are trained on large amounts of data, so classifiers built on them can even reach 90% accuracy. For example, the ImageNet dataset contains over 10 million images.

Step 2: Create a CNN to Classify Dog Breeds (using Transfer Learning)

I implemented two pre-trained models in Keras, VGG19 and ResNet50, and compared their accuracy.

VGG19 model:

This model uses the pre-trained VGG19 model as a fixed feature extractor: the last convolutional output of VGG19 is fed as input to our model. Therefore, only a global average pooling layer and a fully connected layer are added. The layers are as follows, and the test accuracy is only 37.32%.

ResNet50

In this model, the first layer is a global average pooling layer, which reduces the dimensionality and speeds up training. The second layer is a fully connected layer, where I reduce the number of nodes to 512. The dropout layer prevents the model from overfitting. The final softmax layer predicts the probability of each breed, and the breed with the highest probability is returned as the prediction.

This architecture leverages transfer learning from ResNet50, a network trained on a similar task. A good result is obtained because the two datasets have similar features and our dataset is the smaller one. For transfer learning, I only need to train the new layers on our 133 target classes. To avoid overfitting, I added the Dropout layer before the last layer.
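A minimal Keras sketch of the head described above is shown here. The dropout rate of 0.3, the ReLU activation, the Adam optimizer, the `(7, 7, 2048)` feature shape, and the function names are assumptions made for illustration. The post does not state them.

```python
def build_resnet50_head(feature_shape=(7, 7, 2048), n_breeds=133,
                        hidden_units=512, dropout_rate=0.3):
    """Build a small classifier that sits on top of ResNet50 bottleneck features."""
    # Imported lazily so the rest of this snippet works without TensorFlow.
    from tensorflow.keras import Input, Sequential
    from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D

    model = Sequential([
        Input(shape=feature_shape),
        GlobalAveragePooling2D(),                # one value per channel
        Dense(hidden_units, activation="relu"),  # 512-node layer
        Dropout(dropout_rate),                   # guards against overfitting
        Dense(n_breeds, activation="softmax"),   # probability per breed
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def head_param_count(channels=2048, hidden_units=512, n_breeds=133):
    """Trainable parameters of the head: two dense layers, weights plus biases."""
    first = channels * hidden_units + hidden_units
    second = hidden_units * n_breeds + n_breeds
    return first + second
```

Under these assumptions, `head_param_count()` gives about 1.1 million trainable parameters. The ResNet50 convolutional layers themselves stay fixed and only serve as a feature extractor.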

The ResNet50 model outperforms the VGG19 model: its test accuracy is 80.38%.

Result

We now have the human detector, the dog detector, and the dog breed classifier. We put them together to check whether an image contains a dog. If it does, the code reports the predicted breed. If it contains a human instead, the code reports which dog breed the person looks like. With all three pieces combined, I upload 3 dog images and 3 human images to test the result. The outputs are as follows:
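For reference, the way the three pieces fit together can be sketched as below. This is an illustration under assumptions, not the exact code from my notebook: `describe_image` and `predict_breed` are hypothetical names, and the detectors and classifier are passed in as plain functions.

```python
def describe_image(img_path, dog_detector, face_detector, predict_breed):
    """Combine the two detectors and the breed classifier into one message."""
    # Check for a dog first: the dog detector produced no false positives
    # on human images, while the face detector fired on some dog images.
    if dog_detector(img_path):
        return f"This dog looks like a {predict_breed(img_path)}."
    if face_detector(img_path):
        return f"This human most resembles a {predict_breed(img_path)}."
    return "Neither a dog nor a human face was detected."
```

Running the dog check before the face check means that a dog image on which the face detector also fires is still reported as a dog.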

Potential improvements

I think the output is a bit better than I expected for the dogs. However, I need many more samples to measure the accuracy and get a reasonably precise answer.

Possible points of improvement:
- Add more training data for every dog breed.
- Augment the training images, for example by rotating them through different angles, to improve the model’s accuracy.
- When a human is matched to a dog breed, show a random image of a dog of that breed.
- Explore more network architectures with transfer learning.

Conclusion

In this project, a Haar cascade classifier and a pre-trained ResNet50 model are used to detect humans and dogs. I applied the VGG19 and ResNet50 models as fixed feature extractors and added a global average pooling layer plus new fully connected layers on top. In the end, ResNet50 produced the highest accuracy, around 80%. Getting high accuracy was a challenge for me because the differences among some breeds are subtle.

My repo: Data-Scientist-Nanodegree-Capstone-Dog-app