FindMe Facial Recognition

Michael duPont

Project Overview

I am making a robot that can recognize and track me and only me. I'm building it using a Raspberry Pi running Raspbian, the official RPi camera module, a robot chassis and motor PHat from Adafruit, and Python to integrate the motors, camera, and facial recognition models. For the purposes of the capstone project, I will limit the scope of this report to the image processing and facial recognition portion of the overall project.

Project Components

Problem Statement

Focusing just on the facial recognition component, we will need to train a model to find and recognize my face in an image. Those two verbs actually describe two different models we will need to have:

  • Find - Locate all arbitrary faces in an image and return their locations
  • Recognize - Determine if a given face is or is not the target face

There will need to be some handling between the two models, and we'll want all of this functionality wrapped into a single function. Here's what the image-model pipeline would look like:

  1. Function takes an image
  2. The first model finds all faces in the image and returns their bounding boxes
  3. Crop each face from its bounding box and scale to a standard size
  4. Feed each preprocessed face into the second model
  5. If the target face is found, return True and its bounding box
  6. If no faces are found or none of them belong to the target, return False and None

That should make the models easy to work with in the larger project code.
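
The steps above can be sketched as a single function. This is only a minimal outline, assuming the two models are passed in as plain callables; the names `find_target`, `is_target`, and `resize_nearest`, and the 64-pixel face size, are placeholders for illustration and not part of the project code.

```python
from typing import Callable, Optional, Sequence, Tuple

import numpy as np

BBox = Tuple[int, int, int, int]  # (x, y, width, height)


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Scale an image to size x size with nearest-neighbour sampling.
    Hypothetical stand-in for a real resize such as cv2.resize."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows][:, cols]


def find_target(
    img: np.ndarray,
    find_faces: Callable[[np.ndarray], Sequence[BBox]],
    is_target: Callable[[np.ndarray], bool],
    size: int = 64,  # assumed standard face size
) -> Tuple[bool, Optional[BBox]]:
    """Return (True, bbox) for the first face accepted by is_target,
    or (False, None) when no face is found or none is accepted."""
    for x, y, w, h in find_faces(img):        # step 2: bounding boxes
        face = img[y:y + h, x:x + w]          # step 3: crop
        face = resize_nearest(face, size)     # step 3: scale
        if is_target(face):                   # step 4: second model
            return True, (int(x), int(y), int(w), int(h))  # step 5
    return False, None                        # step 6
```

Passing the models in as arguments keeps the wrapper easy to test with stand-in functions before either model exists.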

The first model will need to scan the image pixels looking for the best face candidates. Fortunately, this particular model is not project-specific, and I know where I can find and implement an existing one to save a lot of time. The second model will need to be trained to distinguish my face from any other face given to it. I've implemented facial recognition using Scikit-Learn and SVMs before, but I want to use Keras (with TensorFlow) for this particular project. I've used it once before for object detection, and this seems like the perfect next step.


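Metrics

Since we'll be making two separate models, we'll need two sets of metrics, but first here are some contextual definitions that we'll use below. In each entry, item 1 applies to the first (find) model and item 2 applies to the second (recognize) model: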

  • True Positive
    1. A bounding box containing a face
    2. A target face identified as the target
  • False Positive
    1. A bounding box not containing a face
    2. A non-target face identified as the target
  • True Negative
    1. A non-face not given a bounding box
    2. A non-target face not identified as the target
  • False Negative
    1. A face not given a bounding box
    2. A target face not identified as the target
  • Precision
    1. How many returned bounding boxes contain faces?
    2. How many of the faces identified as the target were actually the target?
  • Recall
    1. How many faces in the image were given a bounding box?
    2. How many of the target faces were identified as the target?
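
Written in terms of the counts defined above (TP, FP, and FN for true positives, false positives, and false negatives), the last two entries are the standard formulas:

```latex
\[
\text{Precision} = \frac{TP}{TP + FP},
\qquad
\text{Recall} = \frac{TP}{TP + FN}
\]
```

Neither formula uses the true negative count.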

Our first model needs to be able to identify all faces in an image while limiting the number of non-faces. We'll prioritize recall (how many of the faces in the image were given a bounding box) for the first model since it's more important that every face be identified. Even if a few false positives make it to the second model, they should have little chance of passing a model trained to identify a specific face.

Because we're working with facial recognition, our second model needs to limit false positives more than false negatives. We'll prioritize precision for this second model because the model will constantly be fed images, and it is preferable for the model to be overly strict and miss a few valid frames than to accept the wrong face.

Find Faces

The first thing we need to do is pick out faces from a larger image. Because the model for this is not user or case specific, we can use an existing model. Since we're going to be using OpenCV to do some image preprocessing, the easiest choice would be to use a Haar Cascade for finding each face. Briefly, a Haar Cascade is a series of hierarchical classifiers where the lowest levels look for specific orientations of points, edges, and lines. Check out OpenCV's documentation page on Haar Cascades for more info. There's a webpage I've used before which lists a few different Haar Cascades for body and face detection. This specific version will identify a full face when viewed from the front. We still have to tune a couple of hyperparameters, but it's better than having to build and train an entire model (which we will have to do later).
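
To make the "edges and lines" idea concrete, here is a small sketch of one Haar-like feature computed with an integral image, which is the basic building block these cascades are made of. It is an illustration only, not OpenCV's implementation, and the function names are made up for this example.

```python
import numpy as np


def integral_image(gray: np.ndarray) -> np.ndarray:
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in a rectangle with top-left (x, y), using four lookups."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])


def vertical_edge_feature(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle feature: right half of the window minus the left half.
    Large absolute values indicate a vertical edge inside the window."""
    half = w // 2
    return rect_sum(ii, x + half, y, half, h) - rect_sum(ii, x, y, half, h)


# Toy 4x4 image: dark left half, bright right half
gray = np.zeros((4, 4), dtype=np.uint8)
gray[:, 2:] = 10
ii = integral_image(gray)
print(vertical_edge_feature(ii, 0, 0, 4, 4))  # 80
```

Because every rectangle sum costs four lookups regardless of its size, a cascade can evaluate many such features per window cheaply.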

In [98]:
import cv2
import numpy as np

CASCADE = cv2.CascadeClassifier('findme/haar_cc_front_face.xml')

def find_faces(img: np.ndarray, sf=1.1282, mn=5) -> np.ndarray:
    """Returns an array of bounding boxes (x, y, w, h) for every face found in an image"""
    return CASCADE.detectMultiScale(
        cv2.cvtColor(img, cv2.COLOR_RGB2GRAY),
        minSize=(30, 30),
        scaleFactor=sf,
        minNeighbors=mn,
    )

That's really all we need. OpenCV has native support for cascade classifiers, which makes our model code short and simple. We have three tunable hyperparameters:

  • Scale Factor: "How much the image size is reduced at each image scale." detectMultiScale will feed the given image into the cascade at different sizes to find the optimal bounding box. This affects the rate at which the image scales between runs.
  • Minimum Neighbors: "How many neighbors each candidate rectangle should have to retain it." The detector usually fires several times around a real face, at slightly different positions and scales. These overlapping detections are grouped, and a group is kept only if it contains at least this many rectangles. Higher values reduce false positives at the cost of missing some faces.
  • Minimum Size: "Minimum size of candidate rectangle" aka potential face. This will make the model a little faster and limit distant faces from being recognized.
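
As a rough illustration of how the scale factor and minimum size interact, the sketch below lists the search-window sizes a detector would step through if it started at the minimum size and grew by the scale factor each pass. This is a simplification: OpenCV's actual loop works relative to the cascade's trained window size, and `window_sizes` is a made-up helper, not an OpenCV function.

```python
def window_sizes(img_w: int, img_h: int,
                 scale_factor: float = 1.1282, min_size: int = 30) -> list:
    """Approximate square window sizes tried across scales (simplified model)."""
    sizes = []
    size = float(min_size)
    # Stop once the window no longer fits inside the image
    while size <= min(img_w, img_h):
        sizes.append(int(round(size)))
        size *= scale_factor
    return sizes


# A smaller scale factor means more passes over the image (slower, more thorough)
print(len(window_sizes(640, 480, scale_factor=1.1282)))
print(len(window_sizes(640, 480, scale_factor=1.5)))
```

The 640x480 frame size here is an arbitrary example, not a measurement from the project.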

Now let's test it by drawing rectangles around a few images of groups. Each individual is unique across every photo, which will be important in our next step. Here's one example:

In [99]:
import matplotlib.pyplot as plt
from matplotlib.image import imread, imsave
%matplotlib inline

[Figure: one of the group test images]
In [100]:
from glob import glob

def draw_boxes(bboxes: [[int]], img: np.ndarray, line_width: int=2) -> np.ndarray:
    """Returns an image array with the bounding boxes drawn around potential faces"""
    for x, y, w, h in bboxes:
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), line_width)
    return img

#Find faces for each test image
for fname in glob('test_imgs/initial/group*.jpg'):
    img = imread(fname)
    bboxes = find_faces(img)
    print('Bounding Boxes for file:', fname)
    print(bboxes, '\n')
    imsave(fname.replace('initial', 'find_faces'), draw_boxes(bboxes, img))

Bounding Boxes for file: test_imgs/initial/group0.jpg
[[151 118  68  68]
 [363 112  61  61]
 [231 106  65  65]
 [ 73 131  71  71]
 [298 101  74  74]] 

Bounding Boxes for file: test_imgs/initial/group1.jpg
[[467  48  60  60]
 [304  60  56  56]
 [ 68  61  69  69]
 [168  71  60  60]
 [369 100  65  65]
 [235 120  56  56]
 [303 138  67  67]] 

Bounding Boxes for file: test_imgs/initial/group2.jpg
[[417  48  73  73]
 [125  27  78  78]
 [767  34  89  89]
 [253  34  81  81]] 

Bounding Boxes for file: test_imgs/initial/group3.jpg
[[289  50  44  44]
 [153  31  40  40]
 [ 83  88  49  49]
 [154  98  39  39]
 [232  96  48  48]] 

Bounding Boxes for file: test_imgs/initial/group4.jpg
[[450  73  67  67]
 [633  30  83  83]
 [113  46  73  73]
 [244  83  72  72]
 [358  97  80  80]
 [524  88  87  87]
 [177 122  74  74]] 


After tuning the hyperparameters, we're getting good face detection over our test images. The model found all but two of the 29 faces in the images and had only one false positive, pictured below. The two faces that it missed were tilted (one as much as 45°), and the model is only trained to find faces that are mostly upright (< 20°).

[Figure: the one false-positive detection]

Now for this model's metrics. We said we'd focus on recall, and while that is still true, we can't really measure it according to the definition. Here's the problem with applying pure metrics in this case: there's no good way to get a meaningful value given that the number of true negatives vastly outnumbers the number of true positives. There might only be a few faces in an image, but every attempted rectangle which is correctly identified as not a face is a true negative. Because of the way this classifier works, this leads to ratios like 100k true negatives to 2 true positives.

With that in mind, tuning the hyperparameters was more subjective than scientific, but I am confident that it is tuned well enough to compare to the benchmark.
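
One way to still get a rough number is to count labeled faces instead of candidate rectangles, since neither precision nor recall uses the true negative count. The sketch below plugs in the counts reported above (29 faces, 2 missed, 1 false positive). With only five test images, these values are an indication and not a rigorous evaluation.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of returned bounding boxes that contain a face."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of faces in the images that were given a bounding box."""
    return tp / (tp + fn)


# Counts taken from the test run described above
total_faces, missed, false_pos = 29, 2, 1
true_pos = total_faces - missed  # 27 faces correctly boxed

print(f'precision = {precision(true_pos, false_pos):.3f}')  # precision = 0.964
print(f'recall    = {recall(true_pos, missed):.3f}')        # recall    = 0.931
```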


Benchmark

Microsoft offers a similar service via its Azure Cognitive Services Face API. We'll use this as a benchmark tool for this part of the project. We'll use a client library to send our sample image to the service and draw the bounding boxes that it returns.

In [23]:
import cognitive_face as faceapi


def bboxes_benchmark(img_path: str, display_path: str=None):
    """Displays an image with bounding boxes sourced by AzureCS Face API
    Can map the results onto a separate image if a display_path is given"""
    faces = faceapi.face.detect(img_path)
    bboxes = []
    for face in faces:
        rect = face['faceRectangle']
        bboxes.append([rect['left'], rect['top'], rect['width'], rect['height']])
    plt.imshow(draw_boxes(bboxes, imread(display_path if display_path else img_path)))

bboxes_benchmark('test_imgs/initial/group0.jpg', 'test_imgs/find_faces/group0.jpg')
[[79, 141, 59, 59], [305, 113, 59, 59], [236, 115, 57, 57], [156, 129, 57, 57], [366, 120, 56, 56]]

We're replicating what we just did, except we are now using a commercial service to return the bounding boxes instead of our model. In the test image, the green boxes were returned by our model, and the white boxes were returned by Face API. The commercial service was able to give a tighter bounding box for each face, but both versions were able to give roughly the same results and identify each face.
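
The visual comparison can also be put into numbers with intersection over union (IoU), which is 1.0 for identical boxes and 0.0 for boxes that don't overlap. This is a sketch using the two sets of boxes printed above for group0.jpg; matching each of our boxes to its best-overlapping benchmark box is an assumption made for this example, not something the benchmark service provides.

```python
def iou(a, b) -> float:
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (non-positive if disjoint)
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


# Boxes for group0.jpg as printed earlier in this report
ours = [[151, 118, 68, 68], [363, 112, 61, 61], [231, 106, 65, 65],
        [73, 131, 71, 71], [298, 101, 74, 74]]
benchmark = [[79, 141, 59, 59], [305, 113, 59, 59], [236, 115, 57, 57],
             [156, 129, 57, 57], [366, 120, 56, 56]]

# For each of our boxes, take the best overlap with any benchmark box
best = [max(iou(a, b) for b in benchmark) for a in ours]
for box, score in zip(ours, best):
    print(box, round(score, 2))
```

A score well below 1.0 with every face still matched is consistent with the benchmark boxes being tighter while covering the same faces.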

Build Dataset

Base Corpus

Now let's use this to build a base corpus of "these faces are not mine" so we can augment it later with the face we want to target. Because our first model feeds data into our second, it makes sense to use it to create our second model's training data. We'll use the faces from the test images in the previous section because all of them are unique and cover a decently wide demographic. The code below will use the bounding boxes to save cropped images of each found face.

In [101]:
#Creates cropped faces for imgs matching a glob such as 'test_imgs/initial/group*.jpg'

def crop(img: np.ndarray, x: int, y: int, width: int, height: int) -> np.ndarray:
    """Returns an image cropped to a given bounding box of top-left coords, width, and height"""
    return img[y:y+height, x:x+width]

def pull_faces(glob_in: str, path_out: str) -> int:
    """Pulls faces out of images found in glob_in and saves them as path_out
    Returns the total number of faces found"""
    i = 0
    for fname in glob(glob_in):
        img = imread(fname)
        bboxes = find_faces(img)
        for bbox in bboxes:
            cropped = crop(img, *bbox)
            imsave(path_out.format(i), cropped)
            i += 1
    return i

found = pull_faces('test_imgs/initial/group*.jpg', 'test_imgs/corpus/face{}.jpg')

print('Total number of base corpus faces found:', found)
Total number of base corpus faces found: 28

29 actual faces + 1 false positive - 2 false negatives = 28 corpus images

That verifies what we found earlier. I've manually removed the false positive pictured earlier from the corpus, so the final count comes to twenty-seven. Now that we have some faces to work with, let's save them to a pickle file for use later on.

In [115]:
from pickle import dump

#Creates base_corpus.pkl from face imgs in test_imgs/corpus
imgs = [imread(fname) for fname in glob('test_imgs/corpus/face*.jpg')]
with open('findme/base_corpus.pkl', 'wb') as fout:
    dump(imgs, fout)
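
For reference, reading the corpus back later is the mirror image of the dump. The sketch below round-trips two dummy arrays through a temporary file so it runs on its own; the arrays and the temporary path are stand-ins, not the real corpus.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-ins for cropped faces; real crops also differ in size from face to face
faces = [np.zeros((68, 68, 3), dtype=np.uint8),
         np.ones((61, 61, 3), dtype=np.uint8)]

path = os.path.join(tempfile.mkdtemp(), 'base_corpus.pkl')
with open(path, 'wb') as fout:
    pickle.dump(faces, fout)

with open(path, 'rb') as fin:
    loaded = pickle.load(fin)

print(len(loaded), [img.shape for img in loaded])
```

Because the crops have different shapes, the corpus is stored as a list of arrays and would need resizing to a common size before being stacked into one array.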

Target Corpus

Now we need to add our target data. Since this is going to power a personal project, I'm going to train it to recognize my face. Other than adding some new images, we can reuse the code from before by just supplying a different glob string.

In [105]:
found = pull_faces('test_imgs/initial/me*.jpg', 'test_imgs/corpus/me{}.jpg')

print('Total number of target faces found:', found)
Total number of target faces found: 59