Computer Vision learning projects
featuring VAEs, GANs, image segmentation with U-Net, object detection, etc.
These are projects from my self-directed learning. Most of them come from two DeepLearning.AI courses by Andrew Ng: Advanced Computer Vision and Deep Generative Modelling. Each course consists of 4 modules, with one project per module. Project details for the relevant topics can be found below. The topics and models covered in each module include:
- Object Detection: model architectures of R-CNN, Fast R-CNN, and Faster R-CNN; fine-tuning RetinaNet
- Image Segmentation: U-Net model architecture; segmentation implementation with FCN
- Visualization and Interpretability: visualize model predictions and understand CNNs
- Deep Generative Modelling: style transfer, VAEs, and GANs
Project Details
These projects explore various computer vision techniques and generative deep learning models. Each project focuses on applying a specific model or technique to a practical dataset.
1. Style Transfer:
- Main Objective: Implement neural style transfer using the Inception model as the feature extractor.
- Results: A new image of the original dog (left) rendered in the style of the right image.
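Neural style transfer optimizes an image so its features match the content image while its feature correlations (Gram matrices) match the style image. A minimal numpy sketch of those two losses, where the `features` arrays stand in for activations from an Inception layer (function names are illustrative, not the project's exact code):

```python
import numpy as np

def gram_matrix(features):
    """Channel-wise feature correlations; these capture the 'style' of a layer.

    features: array of shape (H, W, C), standing in for one layer's activations.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)      # (H*W, C)
    return flat.T @ flat / (h * w)         # (C, C)

def style_loss(style_feats, generated_feats):
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix(style_feats) - gram_matrix(generated_feats)
    return float(np.mean(diff ** 2))

def content_loss(content_feats, generated_feats):
    """Mean squared difference between raw feature maps."""
    return float(np.mean((content_feats - generated_feats) ** 2))
```

In the actual optimization loop, a weighted sum of these two losses is minimized with respect to the generated image's pixels.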
2. VAEs:
- Main Objective: Train a Variational Autoencoder (VAE) on the anime faces dataset by MckInsey666, then use the trained model to generate a gallery of anime faces.
- Results:
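Two pieces make a VAE's latent space usable for generation: the reparameterization trick (so sampling stays differentiable during training) and the KL term that regularizes the encoder toward a standard normal. A numpy sketch of both, assuming a diagonal-Gaussian encoder (this mirrors the standard formulation, not this project's exact code):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps it differentiable w.r.t. mu and
    log_var in a real (autodiff) implementation.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, averaged over the batch."""
    per_example = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)
    return float(np.mean(per_example))
```

The training loss is this KL term plus a reconstruction loss; generating new faces then amounts to decoding samples drawn from N(0, I).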
3. GANs:
- Main Objective: Build a Generative Adversarial Network (GAN) that generates pictures of hands, trained on a dataset of sign-language hand images.
- Results: 62.50% of generated images were classified as hands with a confidence greater than 60% by a hand classifier.
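The acceptance metric above (fraction of generated images a classifier calls a hand with confidence over 60%) can be computed as a one-liner over the classifier's probabilities. A sketch with hypothetical classifier outputs (the probability values below are made up for illustration):

```python
import numpy as np

def hand_acceptance_rate(hand_probs, threshold=0.6):
    """Fraction of generated images classified as 'hand' with
    confidence above `threshold` (0.6 per the result above)."""
    probs = np.asarray(hand_probs, dtype=float)
    return float(np.mean(probs > threshold))

# Hypothetical classifier confidences for 8 generated images:
probs = [0.95, 0.72, 0.40, 0.81, 0.55, 0.66, 0.30, 0.90]
rate = hand_acceptance_rate(probs)  # 5 of 8 pass -> 0.625
```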
4. Object Detection:
- Main Objective: Retrain RetinaNet to detect zombies using just 5 training images.
- Specific Tasks: Set up the model to restore pretrained weights and fine-tune the classification layers.
- Results: The boxes my model generated match 99.58% of the ground-truth boxes with a relative tolerance of 0.3.
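The relative-tolerance check behind the box-matching result can be sketched with `np.isclose`. This simplifies the real evaluation by assuming each predicted box is already paired with its ground-truth box:

```python
import numpy as np

def boxes_match(pred_boxes, true_boxes, rtol=0.3):
    """Per-box check: all 4 coordinates within a relative tolerance.

    Boxes are (N, 4) arrays of [ymin, xmin, ymax, xmax]; rtol=0.3
    mirrors the tolerance quoted in the result above.
    """
    pred = np.asarray(pred_boxes, dtype=float)
    true = np.asarray(true_boxes, dtype=float)
    close = np.isclose(pred, true, rtol=rtol, atol=0.0)
    return close.all(axis=1)              # one boolean per box pair

def match_rate(pred_boxes, true_boxes, rtol=0.3):
    """Fraction of ground-truth boxes matched by a prediction."""
    return float(np.mean(boxes_match(pred_boxes, true_boxes, rtol)))
```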
5. Image Segmentation 1:
- Main Objective: Build a model that predicts the segmentation masks (pixel-wise label maps) of handwritten digits, trained on the M2NIST dataset, a multi-digit version of MNIST.
- Specific Tasks: Build a Convolutional Neural Network (CNN) from scratch for the downsampling path and use a Fully Convolutional Network, FCN-8, to upsample and produce the pixel-wise label map. Evaluate the model using intersection over union (IOU) and the Dice score.
- Results: Average IOU score of 75% when compared against the true segments.
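The two evaluation metrics named above are standard for segmentation and straightforward to compute on binary masks. A numpy sketch (per-class masks; the project's multi-class evaluation would average these over classes):

```python
import numpy as np

def iou_score(pred_mask, true_mask):
    """Intersection over union for a pair of binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:                 # both masks empty: perfect agreement
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)

def dice_score(pred_mask, true_mask):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        return 1.0
    return float(2 * np.logical_and(pred, true).sum() / total)
```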
6. Image Segmentation 2:
I did this project as an extension of the previous one, with the goal of learning more about U-Net.
- Main Objective: Load and augment the dataset. Create a U-Net model using the Segmentation Models library with pretrained ImageNet weights, then train and evaluate the model on an image segmentation task.
- Results:
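One detail of the augmentation step worth calling out: for segmentation, any geometric transform must be applied to the image and its mask in lockstep, or the pixel labels drift out of alignment. A minimal numpy sketch of a paired random flip (the actual project pipeline may differ; this is illustrative):

```python
import numpy as np

def random_horizontal_flip(image, mask, rng, p=0.5):
    """Flip image and mask together so pixel-wise labels stay aligned.

    image: (H, W, C) array; mask: (H, W) label map; p: flip probability.
    """
    if rng.random() < p:
        return image[:, ::-1].copy(), mask[:, ::-1].copy()
    return image, mask
```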
Certificates