A General Approach for Using 2D Object Detection for Facial ID

November 16, 2018

There are several ways to apply computer vision to facial recognition. In this scenario, we demonstrate an approach for identifying persons of interest. Two broad routes lead to a positive ID: deep learning can be applied to map and identify the three-dimensional geometry of human faces, or 2D object detection techniques can be focused on human faces.

The 3D route requires a large-scale dataset of scanned 3D faces, which is prohibitively expensive to acquire. It also relies on techniques that are still being researched and developed, and it achieves accuracy levels far below 80%. Success with this approach would be difficult, uncertain, and expensive, so we believe it is the less practical option.

The 2D route requires a dataset of high-resolution photographs, which is achievable with a high-quality camera. 2D deep learning techniques, namely convolutional neural nets (CNNs), are already applied to many kinds of identification today. Using 2D facial recognition to identify a person of interest is practical, can achieve a high degree of accuracy (above 80%), and reduces total technical debt. We recommend developing a 2D facial recognition model prototype as a pragmatic approach to positive facial identification of a person of interest.

A General Approach:

Data is key. Initially, you need to assess and map the data acquisition process and data structures (e.g. which camera is used, what the lighting is like, how many people generally appear in a photo). Based on that assessment, gather speed and accuracy requirements and establish what is deemed “acceptable”. Then assess and document hardware constraints.

Once a proper assessment is complete, begin by establishing UI/UX flows for using the system and defining API endpoints. Establish a system for collecting videos/photos and their labels. Then split the datasets into training, validation, and testing buckets. A proper data preprocessing pipeline will need to be developed to reduce blur and noise in the image and video data.
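As a minimal sketch of the bucketing step, the snippet below splits labeled samples so that no identity appears in more than one bucket. Splitting by identity rather than by photo is our assumption here (it is a common choice when evaluating embedding models on unseen people); the function name, the 10%/10% fractions, and the `(image_path, identity)` sample layout are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_identity(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Split (image_path, identity) pairs into train/val/test buckets.

    All photos of one identity land in the same bucket, so validation and
    test scores reflect people the model has not seen during training.
    """
    by_identity = defaultdict(list)
    for path, identity in samples:
        by_identity[identity].append(path)

    # Shuffle identities deterministically so the split is reproducible.
    identities = sorted(by_identity)
    random.Random(seed).shuffle(identities)

    n_test = int(len(identities) * test_frac)
    n_val = int(len(identities) * val_frac)
    buckets = {
        "test": identities[:n_test],
        "val": identities[n_test:n_test + n_val],
        "train": identities[n_test + n_val:],
    }
    return {
        name: [(path, identity) for identity in ids for path in by_identity[identity]]
        for name, ids in buckets.items()
    }


if __name__ == "__main__":
    demo = [(f"img_{i}_{k}.jpg", f"person_{i}") for i in range(20) for k in range(3)]
    for name, rows in split_by_identity(demo).items():
        print(name, len(rows))
```

If the deployed system only ever needs to recognize a fixed list of known people, a per-photo split within each identity may be the better fit; the right choice depends on the requirements gathered earlier.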

Architect a deep learning model (a CNN) from scratch, or apply transfer learning to an existing model, and train it for the specific tasks of face detection, embedding, and classification. After the core face detection and recognition models are developed, you will need to develop a tracking model to track multiple objects throughout the video.

Finally, develop a database and backend that integrate and serve the models. A human-in-the-loop process to continuously improve the model will also need to be defined and implemented.

Model Selection/Development:

We recommend starting from open source solutions for face detection, keypoint extraction, alignment, and embeddings. This allows a quick evaluation of existing techniques on the application of interest and shows where to focus next: on improving the modeling techniques or on improving data acquisition. Next, collect labeled images or videos, with face bounding boxes and identities annotated.

Tune the selected model with further feature engineering and hyperparameter adjustments. Then test the system and identify where it fails. Once the model has been selected, you will need to create plans for re-training, testing, deployment, and monitoring.

Technique Recommendation:

For face detection and recognition models, we recommend using a convolutional neural net (e.g. ResNet) as a backbone. For the face detection loss function, we recommend the SSD loss (a cross-entropy loss plus a regression loss). There are various choices of loss function for the face embedding model:

Cross entropy loss
Triplet loss
Center loss
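To make one of these options concrete, here is a minimal sketch of triplet loss on plain Python lists. It assumes L2-normalized embeddings and squared Euclidean distance; the margin of 0.2 and all function names are illustrative choices, not values taken from this article. In a real training pipeline the same formula would be computed on batched tensors inside the deep learning framework.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that pushes the anchor closer to the positive (same person)
    than to the negative (different person) by at least `margin`."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)


if __name__ == "__main__":
    # A well-separated triplet contributes no loss; a violated one does.
    print(triplet_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]))
    print(triplet_loss([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]))
```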

For keypoint extraction, we recommend using regression loss.

And, of course, you need training data. This may be a challenge. To achieve high levels of accuracy, you will need labeled training data (1 to 10 million faces).

Face Detection

Given an image, we need to detect the pixel region in which the face is present. Below is an example of our work where face detection is displayed.

[Figure: face detection example]

Input: image

Output: 1) bounding boxes (top, left, bottom, right) that contain frontal or slightly side-facing faces; 2) keypoints on faces that correspond to eyes, nose and mouth
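A small sketch of how these bounding boxes might be represented and compared. The `Box` type follows the (top, left, bottom, right) ordering above; intersection-over-union (IoU) is not mentioned in this article, but it is a standard way to decide whether a predicted box matches an annotated one, so it is shown here as an assumption about how detections would be scored.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates."""
    top: float
    left: float
    bottom: float
    right: float


def area(box):
    """Area of a box; degenerate (inverted) boxes count as zero."""
    return max(0.0, box.bottom - box.top) * max(0.0, box.right - box.left)


def iou(a, b):
    """Intersection-over-union of two boxes, in [0, 1]."""
    overlap = Box(
        top=max(a.top, b.top),
        left=max(a.left, b.left),
        bottom=min(a.bottom, b.bottom),
        right=min(a.right, b.right),
    )
    overlap_area = area(overlap)
    union_area = area(a) + area(b) - overlap_area
    return overlap_area / union_area if union_area > 0 else 0.0


if __name__ == "__main__":
    predicted = Box(top=10, left=10, bottom=110, right=90)
    annotated = Box(top=20, left=15, bottom=115, right=95)
    print(f"IoU = {iou(predicted, annotated):.3f}")
```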

Adopt multi-path state-of-the-art convolutional neural nets for computation.

Face Alignment

Use facial keypoints to perform 2D transformations on the image, so that the eyes and mouth are in roughly normalized positions.

Input: a cropped image patch with detected face

Output: image patch with face aligned
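The 2D transformation can be sketched as a similarity transform (rotation, uniform scale, and translation) computed from the two eye keypoints alone. This is a simplified illustration: the target eye positions assume a hypothetical 100x100 output patch, the function names are ours, and a production system would typically fit the transform to all keypoints (including the mouth) and then resample the pixels with an image library's affine warp.

```python
import math


def eye_alignment_transform(left_eye, right_eye,
                            target_left=(30.0, 40.0),
                            target_right=(70.0, 40.0)):
    """Return (a, b, tx, ty) such that
        x' = a * x - b * y + tx
        y' = b * x + a * y + ty
    maps the detected eye positions (x, y) onto the target positions."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    tdx, tdy = target_right[0] - target_left[0], target_right[1] - target_left[1]

    # Scale and rotation that turn the detected eye-to-eye vector
    # into the target eye-to-eye vector.
    scale = math.hypot(tdx, tdy) / math.hypot(dx, dy)
    angle = math.atan2(tdy, tdx) - math.atan2(dy, dx)
    a, b = scale * math.cos(angle), scale * math.sin(angle)

    # Translation that places the left eye exactly on its target.
    tx = target_left[0] - (a * left_eye[0] - b * left_eye[1])
    ty = target_left[1] - (b * left_eye[0] + a * left_eye[1])
    return a, b, tx, ty


def apply_transform(transform, point):
    """Apply the similarity transform to a single (x, y) point."""
    a, b, tx, ty = transform
    x, y = point
    return (a * x - b * y + tx, b * x + a * y + ty)


if __name__ == "__main__":
    transform = eye_alignment_transform((100.0, 120.0), (140.0, 150.0))
    print(apply_transform(transform, (100.0, 120.0)))
    print(apply_transform(transform, (140.0, 150.0)))
```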

Face Embedding (Descriptor)

Input: cropped face image patch

Output: a vector that describes the face

The face embedding model will be a convolutional neural net. It can be based on a pre-trained model (for example, one originally trained to classify 1000 object categories such as cats and dogs). Perform transfer learning and fine-tune the model on our face matching dataset, using either cross-entropy loss or triplet loss (newer losses continue to be proposed in the recent literature).

Face Clustering on the Graph

The embedded face descriptors live in a vector space. The distance between two faces in this vector space indicates how different they are. Clustering algorithms are run on the face graph defined by the pairwise distances between face vectors. Each cluster contains faces that very likely belong to the same person.

Input: descriptors of faces

Output: face clusters, where each cluster is a set of faces that very likely belong to the same person.
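One simple way to cluster on such a graph, shown here only as a sketch, is to connect any two faces whose descriptors are closer than a threshold and take the connected components. The article does not name a specific clustering algorithm, so this choice, the Euclidean metric, and the threshold value are all assumptions; the toy 2D descriptors stand in for real embedding vectors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_faces(descriptors, threshold):
    """Group descriptor indices into connected components of the graph
    whose edges join descriptors closer than `threshold`."""
    parent = list(range(len(descriptors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(descriptors)):
        for j in range(i + 1, len(descriptors)):
            if euclidean(descriptors[i], descriptors[j]) < threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(descriptors)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    faces = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
    print(cluster_faces(faces, threshold=0.5))
```

The pairwise loop is quadratic in the number of faces, which is fine for a prototype; at larger scale an approximate nearest-neighbor index would usually replace it.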

Project Flow Chart:

Pre-processing and Standardization

The labeled training data must provide photos, bounding boxes for faces, and an identity (a person’s name or ID) for each bounding box. The labels need to be organized into an easy-to-consume format, such as the PASCAL VOC format. For the scope of the prototype, we will assume data availability and structure.
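As an illustration of what consuming such labels could look like, the sketch below parses a PASCAL VOC style XML annotation with Python's standard library. Storing the person's identity in the `name` element of each `object` is our assumption about how the labels would be organized; the example file name and identities are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation for one photo containing two labeled faces.
EXAMPLE_XML = """
<annotation>
  <filename>lobby_cam_0001.jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>person_017</name>
    <bndbox><xmin>412</xmin><ymin>220</ymin><xmax>530</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person_042</name>
    <bndbox><xmin>1001</xmin><ymin>198</ymin><xmax>1110</xmax><ymax>340</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return the file name and a list of labeled face boxes."""
    root = ET.fromstring(xml_text)
    faces = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        faces.append({
            "identity": obj.findtext("name"),
            "top": int(box.findtext("ymin")),
            "left": int(box.findtext("xmin")),
            "bottom": int(box.findtext("ymax")),
            "right": int(box.findtext("xmax")),
        })
    return {"filename": root.findtext("filename"), "faces": faces}


if __name__ == "__main__":
    record = parse_voc_annotation(EXAMPLE_XML)
    print(record["filename"], len(record["faces"]))
```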

[Figure: computer vision process flow chart]

Improving the System After Deployment

Construct a human-in-the-loop component that audits the model in production and annotates the correct labels of bounding boxes and person identities. This includes developing a batch training process that keeps ingesting new training data.

The purpose of this human-in-the-loop component is to enlarge the database of known faces and embedding vectors. Reviewers should try to add variation to each person's photos (different lighting, different face angles, with and without glasses, etc.) to increase the recall rate of recognizing that person in a new photo.

Accuracy depends on the distractor ratio (the number of distractors per query face) of the 1:N face comparison. On MegaFace (N=1 million), the most accurate systems achieve around 30% accuracy. On 1:1 face comparison, 99%+ accuracy can be achieved.
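The effect of distractors on 1:N comparison can be seen in a toy sketch: each query is matched to its nearest gallery descriptor, and every additional unrelated face in the gallery is another chance for a wrong nearest neighbor. The function names, the 2D descriptors, and the distance cutoff are hypothetical and chosen only to keep the example readable.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, gallery, max_distance):
    """Return the identity of the nearest gallery descriptor,
    or None if even the nearest one is farther than `max_distance`."""
    best_identity, best_distance = None, float("inf")
    for identity, descriptor in gallery:
        distance = euclidean(query, descriptor)
        if distance < best_distance:
            best_identity, best_distance = identity, distance
    return best_identity if best_distance <= max_distance else None


def rank1_accuracy(queries, gallery, max_distance):
    """Fraction of (true_identity, descriptor) queries identified correctly."""
    hits = sum(
        identify(descriptor, gallery, max_distance) == true_identity
        for true_identity, descriptor in queries
    )
    return hits / len(queries)


if __name__ == "__main__":
    known = [("alice", [0.0, 0.0]), ("bob", [1.0, 0.0])]
    distractors = [("stranger", [0.4, 0.0])]
    queries = [("alice", [0.1, 0.0]), ("bob", [0.6, 0.0])]
    print(rank1_accuracy(queries, known, max_distance=0.5))
    print(rank1_accuracy(queries, known + distractors, max_distance=0.5))
```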

Potential Challenges:

Poor quality in the images used for training and validation will impede project success:

Lighting too weak or too strong.
Low resolution.
Motion blur.
Fisheye camera.

Complicated scenarios add noise, which reduces the ability to identify faces accurately. For example, faces in crowded scenes, or faces occluded by other objects, must be isolated.

Additionally, the data size can be a burden on technology infrastructure and resources. Tens of millions of photos create a huge amount of data. This is problematic when compute resources are limited: training cycles become long, and iteration becomes slow.

Hardware imposes constraints. If the model is deployed “on the edge” (e.g. on a Raspberry Pi or mobile phone) instead of on a cloud instance, only a small subset of models can be used. High usage can also be problematic: requests to the face recognition API can swamp the compute capacity if many camera streams consume the API simultaneously.

There will be speed constraints. If the model is required to generate real-time predictions, this restricts the size and type of models that can be used.

Overcoming Challenges:

Image Quality

Use a high-quality, high-resolution camera, and install it so as to achieve the optimal angle, distance, and lighting.

Complicated Scenario

Set expectations about the model’s limitations. Focus on narrower use cases that identify the highest value targets.

Data Size

More compute resources may be required. We will develop a better distributed model training algorithm to lighten the load on resources.

High Amount of Usage

Provision enough machine resources during model deployment.
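A back-of-the-envelope estimate can help size that provisioning. The sketch below assumes each inference worker handles one frame at a time and that you want to keep workers below a target utilization to absorb bursts; the function name, the 70% headroom figure, and the example numbers are hypothetical and should be replaced with measured values.

```python
import math


def required_workers(num_streams, frames_per_second, seconds_per_frame,
                     target_utilization=0.7):
    """Estimate how many single-frame inference workers are needed.

    num_streams * frames_per_second is the incoming frame rate;
    multiplying by seconds_per_frame gives the number of workers that
    would be busy 100% of the time. Dividing by the target utilization
    leaves headroom for bursts.
    """
    fully_busy_workers = num_streams * frames_per_second * seconds_per_frame
    return math.ceil(fully_busy_workers / target_utilization)


if __name__ == "__main__":
    # 20 camera streams sampled at 5 frames/s, 50 ms of model time per frame.
    print(required_workers(num_streams=20, frames_per_second=5, seconds_per_frame=0.05))
```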

Training Process:

First, data needs to be gathered into a bucket in object storage (e.g. AWS S3). Next, develop a training pipeline using a framework such as TensorFlow, PyTorch, or Caffe. Then GPU resources need to be provisioned. Once the models are trained, model artifacts and metrics are persisted to object storage.

During model training, we recommend performing continuous hyperparameter tuning. The knobs we tune include, but are not limited to:

Backbone architecture: VGG, ResNet, DenseNet, MobileNet, NASNet.
Learning rate scheduling.
Batch size.
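As one concrete example of learning rate scheduling, here is a sketch of linear warmup followed by cosine decay, written as a plain function of the training step. This particular schedule and its default values are our illustrative choice, not a recommendation from this article; the deep learning frameworks mentioned above ship their own scheduler utilities that would normally be used instead.

```python
import math


def learning_rate(step, total_steps, base_lr=0.1, warmup_steps=500, min_lr=0.0):
    """Linear warmup to `base_lr`, then cosine decay down to `min_lr`."""
    if step < warmup_steps:
        # Ramp up linearly so early updates with noisy gradients stay small.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 250, 500, 5000, 10000):
        print(step, round(learning_rate(step, total_steps=10000), 5))
```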

Distributed training may be needed: once the number of images reaches the millions, training across multiple GPU instances is often required. We recommend Horovod (from Uber).

While 3D facial recognition is a viable solution, it’s not without challenges. It’s computationally expensive and current accuracy benchmarks may be too low for many applications. 2D object detection for face ID should be good enough to handle most use cases. The above provides an exploration of one approach of many. We encourage others to share theirs!
