Computer Vision learning projects

These are the projects from my self-directed learning. Most of them come from two DeepLearning.AI courses by Andrew Ng: Advanced Computer Vision and Deep Generative Modelling. Each course consists of 4 modules, with one project per module, and details of the relevant projects are given below. The topics and models covered in the modules include:

Object Detection: model architectures of R-CNN, Fast R-CNN and Faster R-CNN; fine-tuning RetinaNet.
Image Segmentation: model architecture of U-Net; segmentation implementation with FCN.
Visualization and Interpretability: visualize model predictions and understand CNNs
Deep Generative Modelling: style transfer, VAEs and GANs

Project Details

These projects explore various computer vision techniques and generative deep learning models. Each project focuses on applying a specific model or technique to a practical dataset.

1. Style Transfer:

Main Objective: Implement neural style transfer using the Inception model as the feature extractor.
Results: A new image of the original dog (left) rendered in the style of the right image.
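Neural style transfer typically compares the generated image with the style image through Gram matrices of intermediate feature maps, and with the content image through the raw feature maps. The sketch below writes out those two losses in NumPy. It assumes the feature maps have already been extracted (for example from chosen Inception layers) as `(height, width, channels)` arrays; the function names and the `style_weight`/`content_weight` values are illustrative assumptions, not the settings used in the project.

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-correlation (Gram) matrix of a (height, width, channels) feature map."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)   # one row per spatial position
    return flat.T @ flat / (h * w)      # (channels, channels), averaged over positions

def style_loss(style_feats: np.ndarray, generated_feats: np.ndarray) -> float:
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix(style_feats) - gram_matrix(generated_feats)
    return float(np.mean(diff ** 2))

def content_loss(content_feats: np.ndarray, generated_feats: np.ndarray) -> float:
    """Mean squared difference between the raw feature maps."""
    return float(np.mean((content_feats - generated_feats) ** 2))

def total_loss(style_feats, content_feats, generated_feats,
               style_weight=1e-2, content_weight=1e4):
    # Hypothetical weights: they trade off style fidelity against content fidelity.
    return (style_weight * style_loss(style_feats, generated_feats)
            + content_weight * content_loss(content_feats, generated_feats))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((8, 8, 4))    # stand-in for one layer's activations
    print(gram_matrix(feats).shape)  # (4, 4)
    print(style_loss(feats, feats))  # 0.0
```

Gram matrices discard spatial layout and keep only which channels fire together, which is why they capture texture and colour rather than where objects sit in the image.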

2. VAEs:

Main Objective: Train a Variational Autoencoder (VAE) on the anime faces dataset by MckInsey666, then use the model to generate a gallery of anime faces.
Results:
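A VAE's encoder outputs a mean and a log-variance per latent dimension; the reparameterization trick samples from that distribution in a way an autodiff framework can differentiate, and a KL term keeps it close to a standard normal so that decoding random samples yields plausible faces. The NumPy sketch below shows those pieces. The `decoder` passed to `generate_gallery` is a hypothetical stand-in for the trained decoder network, and the function names are illustrative assumptions.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    In an autodiff framework this form lets gradients flow to mu and log_var.
    """
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims and averaged over the batch."""
    per_example = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
    return float(np.mean(per_example))

def generate_gallery(decoder, n_images, latent_dim, rng):
    """Decode random latent vectors drawn from the prior N(0, I).

    `decoder` is a hypothetical callable mapping (n_images, latent_dim) to images.
    """
    z = rng.standard_normal((n_images, latent_dim))
    return decoder(z)
```

During training the KL term is added to a reconstruction loss; at generation time only the decoder is needed, which is why the gallery samples directly from the prior.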

3. GANs:

Main Objective: Build a Generative Adversarial Network (GAN) that generates pictures of hands, trained on a dataset of images of hands doing sign language.
Results: 62.50% of the generated images were classified as hands with a confidence above 60% by a hand classifier.
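A GAN alternates between training the discriminator to separate real hand images from generated ones and training the generator to fool it, commonly with binary cross-entropy. The NumPy sketch below writes those two losses on discriminator output probabilities, plus the evaluation described in the results (the share of generated images a hand classifier scores above 0.6). The function names are hypothetical, and whether the project summed or averaged the real and fake discriminator terms is not stated in the source.

```python
import numpy as np

def bce(labels, probs, eps=1e-7):
    """Binary cross-entropy averaged over the batch; probs are clipped to avoid log(0)."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    labels = np.asarray(labels, dtype=float)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

def discriminator_loss(real_probs, fake_probs):
    # Real images should be scored 1 and generated images 0.
    return (bce(np.ones(np.shape(real_probs)), real_probs)
            + bce(np.zeros(np.shape(fake_probs)), fake_probs))

def generator_loss(fake_probs):
    # The generator is rewarded when its images are scored as real (label 1).
    return bce(np.ones(np.shape(fake_probs)), fake_probs)

def fraction_confident(hand_probs, threshold=0.6):
    """Share of generated images that a hand classifier scores above the threshold."""
    return float(np.mean(np.asarray(hand_probs, dtype=float) > threshold))
```

For example, `fraction_confident([0.9, 0.55, 0.75, 0.2])` returns `0.5`, since two of these four made-up scores exceed 0.6.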

4. Object Detection:

Main Objective: Retrain RetinaNet to detect zombies using just 5 training images.
Specific Tasks: Set up the model to restore the pretrained weights and fine-tune the classification layers.
Results: The boxes generated by my model match 99.58% of the ground-truth boxes within a relative tolerance of 0.3.
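The source does not spell out how a predicted box "matches" a ground-truth box within a relative tolerance of 0.3. One plausible reading, sketched below in NumPy, is an elementwise `np.isclose` check with `rtol=0.3` on all four coordinates of each box. The `[ymin, xmin, ymax, xmax]` normalized box layout and the function name `box_match_rate` are assumptions for illustration.

```python
import numpy as np

def box_match_rate(pred_boxes, true_boxes, rtol=0.3):
    """Fraction of ground-truth boxes whose four coordinates all fall within rtol of the prediction."""
    pred = np.asarray(pred_boxes, dtype=float)
    true = np.asarray(true_boxes, dtype=float)
    # np.isclose tests |pred - true| <= atol + rtol * |true| elementwise;
    # atol=0 keeps the comparison purely relative to the ground truth.
    close = np.isclose(pred, true, rtol=rtol, atol=0.0)
    return float(np.mean(np.all(close, axis=-1)))

if __name__ == "__main__":
    true = [[0.1, 0.2, 0.5, 0.6], [0.3, 0.3, 0.8, 0.9]]    # made-up boxes
    pred = [[0.11, 0.21, 0.52, 0.58], [0.1, 0.3, 0.8, 0.9]]
    print(box_match_rate(pred, true))  # 0.5: the second box's ymin is off by more than 30%
```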

5. Image Segmentation 1:

Main Objective: Build a model that predicts the segmentation masks (pixel-wise label maps) of handwritten digits. The model is trained on the M2NIST dataset, a multi-digit version of MNIST.
Specific Tasks: Build a Convolutional Neural Network (CNN) from scratch for the downsampling path and use a Fully Convolutional Network (FCN-8) to upsample and produce the pixel-wise label map. Evaluate the model with intersection over union (IOU) and the Dice score.
Results: An average IOU score of 75% against the ground-truth segments.
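IOU and the Dice score are usually computed per class from the predicted and true label maps. The NumPy sketch below assumes both are integer arrays of class indices with the same shape (if the masks are stored one-hot, `np.argmax(mask, axis=-1)` would convert them first) and adds a small smoothing term so that a class absent from both maps does not divide by zero. The function name and smoothing value are illustrative choices; averaging the per-class IOU values is one way to get a single score like the one reported above.

```python
import numpy as np

def class_wise_iou_dice(y_true, y_pred, num_classes, smooth=1e-5):
    """Return per-class IOU and Dice lists for two integer label maps of equal shape."""
    ious, dices = [], []
    for c in range(num_classes):
        t = (y_true == c)                      # pixels truly in class c
        p = (y_pred == c)                      # pixels predicted as class c
        intersection = np.logical_and(t, p).sum()
        union = np.logical_or(t, p).sum()
        area_sum = t.sum() + p.sum()
        ious.append(float((intersection + smooth) / (union + smooth)))
        dices.append(float((2 * intersection + smooth) / (area_sum + smooth)))
    return ious, dices

if __name__ == "__main__":
    y_true = np.array([[0, 0, 1], [1, 2, 2]])  # toy 2x3 label maps
    y_pred = np.array([[0, 1, 1], [1, 2, 0]])
    ious, dices = class_wise_iou_dice(y_true, y_pred, num_classes=3)
    print([round(x, 3) for x in ious])   # [0.333, 0.667, 0.5]
    print([round(x, 3) for x in dices])  # [0.5, 0.8, 0.667]
```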

6. Image Segmentation 2:

I did this project as an extension of the previous one, to learn more about U-Net.

Main Objective: Load and augment the dataset, create a U-Net model with the Segmentation Models library using pretrained ImageNet weights, then train and evaluate the model on an image segmentation task.
Results:
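For segmentation, any geometric augmentation has to be applied identically to an image and its mask, or the labels stop lining up with the pixels, while photometric changes such as brightness should touch the image only. The source does not list the augmentations used, so the NumPy sketch below is an assumed example with a random horizontal flip and a brightness shift on images scaled to [0, 1]; `augment_pair` and `max_delta` are hypothetical names.

```python
import numpy as np

def augment_pair(image, mask, rng, max_delta=0.1):
    """Randomly flip image and mask together; shift brightness of the image only.

    image: (height, width, channels) floats in [0, 1]; mask: (height, width) labels.
    """
    if rng.random() < 0.5:
        # Geometric change: apply the same horizontal flip to both arrays.
        image = image[:, ::-1, ...]
        mask = mask[:, ::-1, ...]
    # Photometric change: labels are unaffected, so the mask is left alone.
    delta = rng.uniform(-max_delta, max_delta)
    image = np.clip(image + delta, 0.0, 1.0)
    return image, mask

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    img = rng.random((64, 64, 3))                 # stand-in image
    msk = rng.integers(0, 2, size=(64, 64))       # stand-in binary mask
    aug_img, aug_msk = augment_pair(img, msk, rng)
    print(aug_img.shape, aug_msk.shape)
```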

Certificates