The primary objective of this project is to enhance low-resolution images into high-resolution images. This main goal can be broken down into the following specific tasks:
- Image Pre-Processing
  - Standardize the proportions of the images by making them square, with black bands on both sides.
  - Estimate the head pose (Head Pose Estimation; an illustrative sketch follows this overview).
  - Perform facial segmentation (Face Segmentation).
- Pix2Pix Model Training
  - Train the Pix2Pix model on the dataset, both with and without segmentation.
- Critical Analysis
  - Conduct a critical analysis of the results obtained from the model training.
This project aims to leverage advanced image processing techniques and machine learning models to improve image quality, providing a comprehensive approach from pre-processing to model training and result analysis.
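As an illustration of the head pose step, one common approach is OpenCV's `solvePnP`, which recovers rotation and translation from a handful of 2D facial landmarks matched against generic 3D face-model points. The sketch below is an assumption-laden example, not the project's actual pipeline: the 3D reference points, the uncalibrated-camera approximation, and the external landmark detector are all placeholders.

```python
import cv2
import numpy as np

# Generic 3D reference points of a face model (nose tip, chin, eye corners,
# mouth corners), in an arbitrary metric; values are illustrative only.
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye, outer corner
    (225.0, 170.0, -135.0),    # right eye, outer corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
], dtype=np.float64)

def estimate_head_pose(image_points: np.ndarray, frame_size: tuple):
    """Return rotation/translation vectors from six 2D landmarks.

    image_points: (6, 2) array in the same landmark order as MODEL_POINTS,
                  produced by any landmark detector (not included here).
    frame_size:   (height, width) of the frame the landmarks came from.
    """
    h, w = frame_size
    focal_length = w  # common approximation when the camera is uncalibrated
    camera_matrix = np.array([
        [focal_length, 0, w / 2],
        [0, focal_length, h / 2],
        [0, 0, 1],
    ], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(
        MODEL_POINTS, image_points.astype(np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    return rvec, tvec
```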
The database consists of a collection of videos in .mp4 format from 159 users, each contributing four videos. The videos are categorized based on their quality:
- Good Quality Videos (Code "E"): Two videos per user, where users look directly at the camera in a relaxed pose.
- Poor Quality Videos (Code "U"): Two videos per user, with variations in distance, angle, perspective, environment, lighting, and occlusions.
Each user contributes a total of two minutes of video content, divided into four videos of thirty seconds each. The videos are recorded at ten frames per second in horizontal (landscape) orientation.
The videos have been converted into frames and organized into subfolders within a main folder named "face_square" (not included in this repository). This folder structure allows easy access to and management of the frames corresponding to each video.
- Main folder: `face_square`
- Subfolders: each subfolder contains the frames from a specific video.
This organization ensures that frames from good quality and poor quality videos are easily distinguishable and accessible for further processing or analysis.
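A minimal sketch of how the frames could be grouped by quality code is shown below; the subfolder naming convention and the frame file extension are assumptions, not details confirmed by this README.

```python
from pathlib import Path

ROOT = Path("face_square")  # main folder; not shipped with the repository

# Group frame paths by video, splitting on the quality code assumed to
# appear in each subfolder's name ("E" = good quality, "U" = poor quality).
good, poor = {}, {}
for video_dir in sorted(p for p in ROOT.iterdir() if p.is_dir()):
    frames = sorted(video_dir.glob("*.png"))  # frame extension is an assumption
    if "E" in video_dir.name:
        good[video_dir.name] = frames
    elif "U" in video_dir.name:
        poor[video_dir.name] = frames
```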
The face images were padded to a square so that all frames share a uniform size.
*Example squared frames: Poor Quality Videos (Code "U") vs. Good Quality Videos (Code "E").*
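A minimal sketch of the squaring step, assuming OpenCV and a 256×256 target size (the actual resolution used is not stated in this README):

```python
import cv2
import numpy as np

def make_square(frame: np.ndarray, size: int = 256) -> np.ndarray:
    """Pad the shorter dimension with black bands, then resize to size x size."""
    h, w = frame.shape[:2]
    side = max(h, w)
    pad_v = (side - h) // 2   # bands above/below for wide frames
    pad_h = (side - w) // 2   # bands left/right for tall frames
    squared = cv2.copyMakeBorder(
        frame,
        pad_v, side - h - pad_v,
        pad_h, side - w - pad_h,
        cv2.BORDER_CONSTANT, value=(0, 0, 0))
    return cv2.resize(squared, (size, size), interpolation=cv2.INTER_AREA)
```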
The faces were segmented to isolate the region of interest (ROI).
*Example segmented frames: Poor Quality Videos (Code "U") vs. Good Quality Videos (Code "E").*
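The README does not name the segmentation tool that was used; as one possible approach, the sketch below applies MediaPipe's selfie-segmentation model to keep only the person region and black out the background.

```python
import cv2
import mediapipe as mp
import numpy as np

# MediaPipe's selfie-segmentation model is one possible segmenter; the
# project's actual choice of tool is not specified in this README.
segmenter = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=0)

def segment_face(frame_bgr: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Zero out everything outside the segmented region of interest."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    mask = segmenter.process(rgb).segmentation_mask  # float confidences in [0, 1]
    keep = (mask > threshold)[..., None]             # broadcast over color channels
    return np.where(keep, frame_bgr, 0).astype(frame_bgr.dtype)
```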
The Pix2Pix model is a conditional Generative Adversarial Network (GAN) that learns a mapping from an input image to an output image.
In this project, the inputs are frames from poor quality videos and the targets are the corresponding frames from good quality videos; the model is trained to generate high-quality images from low-quality ones.
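For illustration, a compact U-Net-style generator in Keras is sketched below. It follows the spirit of Pix2Pix (strided-convolution encoder, transposed-convolution decoder, skip connections) but is not the project's exact architecture, which totals roughly 54.4M parameters; the layer counts and filter sizes here are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(size: int = 256) -> tf.keras.Model:
    """A minimal U-Net-style generator in the spirit of Pix2Pix."""
    inputs = tf.keras.Input(shape=(size, size, 3))
    # Encoder: downsample with strided convolutions, keeping skip tensors.
    skips, x = [], inputs
    for filters in (64, 128, 256, 512):
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        skips.append(x)
    # Decoder: upsample and concatenate the matching skip connection.
    for filters, skip in zip((256, 128, 64), reversed(skips[:-1])):
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.Concatenate()([x, skip])
    outputs = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                     activation="tanh")(x)
    return tf.keras.Model(inputs, outputs)
```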
- Total Parameters: 54,425,859 (207.62 MB)
- Trainable Parameters: 54,414,979 (207.58 MB)
- Non-trainable Parameters: 10,880 (42.50 KB)
- Batch Size: 32
- Early Stopping: patience of 2 epochs without improvement
- Loss Function: Mean Squared Error (MSE), which penalizes large errors more heavily than small ones
- Number of Epochs: initially planned as 4, but reduced due to computational cost
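Under these settings, a minimal Keras training call might look like the sketch below; the optimizer, learning rate, validation split, and the paired arrays `x_poor` / `y_good` are illustrative assumptions rather than details taken from the project.

```python
import tensorflow as tf

generator = build_generator()  # from the sketch above

generator.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-4),  # learning rate assumed
    loss="mse")  # Mean Squared Error, as listed above

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True)

# x_poor / y_good: paired low- and high-quality frames; how these arrays
# are built from the dataset is assumed here.
history = generator.fit(
    x_poor, y_good,
    validation_split=0.1,  # illustrative split
    batch_size=32,
    epochs=4,              # the originally planned number of epochs
    callbacks=[early_stop])
```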
Training Conditions (first run):
- Utilization of 25% of the available dataset (34 users)
- One epoch
- Batch Size: 32
Training Conditions (second run):
- Utilization of ~32% of the available dataset (50 users)
- One epoch
- Batch Size: 32
The results obtained from the model training fell short of expectations for the following reasons:
- The model has too many parameters, so a single epoch takes approximately 26 hours to complete. Reducing the number of layers would decrease the parameter count and allow more epochs to be run in the same compute budget.
- Only one epoch was performed, on a reduced fraction (roughly 25-32%) of the dataset, which significantly limits the final results.
- Presentation (Portuguese): Presentation-Template.pptx
- Report (Portuguese): Relatory - Human_Recogn (6).pdf