Michael duPont
2017-04-18
I am making a robot that can recognize and track me and only me. I'm building it using a Raspberry Pi running Raspbian, the official RPi camera module, a robot chassis and motor pHAT from Adafruit, and Python to integrate the motors, camera, and facial recognition models. For the purposes of the capstone project, I will limit the scope of this report to the image processing and facial recognition portion of the overall project.
Focusing on just the facial recognition component, we will need to train a model to find and recognize my face in an image. Those two verbs actually describe two different models we will need to have:
- A detection model that finds every face candidate in an image
- A recognition model that determines whether a given face is mine
There will need to be some handling between the two models, and we'll want all of this functionality wrapped into a single function. Here's what the image-model pipeline would look like:
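Here is a minimal sketch of that wrapper, not the final project code: find_faces and crop are the helpers built later in this report, while is_my_face stands in for the second (Keras) model, which isn't covered until later.

def find_my_face(img):
    """Returns the bounding box of my face in an image, or None if it isn't found"""
    for bbox in find_faces(img):  # model 1: propose every face candidate
        face = crop(img, *bbox)   # cut the candidate region out of the frame
        if is_my_face(face):      # model 2: is this particular face mine?
            return bbox
    return None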
That should make the models easy to work with in the larger project code.
The first model will need to scan the image pixels looking for the best face candidates. Fortunately, this particular model is not project-specific, and I know where I can find and implement an existing one to save a lot of time. The second model will need to be trained to distinguish my face from any other face it is given. I've implemented facial recognition using Scikit-Learn and SVMs before, but I want to use Keras (with TensorFlow) for this particular project. I've used it once before for object detection, and this seems like the perfect next step.
Since we'll be making two separate models, we'll need two sets of metrics, but first here are some contextual definitions for each model that we'll use below:
- For the face finder, a positive is a region of the image classified as a face; a negative is a region classified as not a face.
- For the face recognizer, a positive is a face classified as mine; a negative is a face classified as someone else's.
Our first model needs to be able to identify all faces in an image while limiting the number of non-faces. We'll prioritize recall (how many of the actual faces are found) for the first model since it's more important that every face be identified. Even if a few false positives make it to the second model, they should have little chance of passing a model trained to identify one specific face.
Because we're working with facial recognition, our second model needs to limit false positives more than false negatives. We'll prioritize precision for this second model because it will constantly be fed new frames, and it is preferable for the model to be overly strict and miss a few valid frames than to accept a face that isn't mine.
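As a quick reference for those two priorities, here is a minimal sketch of both metrics in terms of raw counts. The numbers in the example call are made up purely for illustration.

def recall(tp: int, fn: int) -> float:
    """Of all the actual positives, how many did the model catch?"""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Of everything the model flagged as positive, how many really were?"""
    return tp / (tp + fp)

# Illustrative counts only: a detector that misses 2 faces and flags 1 non-face
print('Recall:', recall(tp=27, fn=2))        # ~0.93
print('Precision:', precision(tp=27, fp=1))  # ~0.96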
The first thing we need to do is pick out faces from a larger image. Because the model for this is not user or case specific, we can use an existing model. Since we're going to be using OpenCV to do some image preprocessing, the easiest choice would be to use a Haar Cascade for finding each face. Briefly, a Haar Cascade is a series of hierarchical classifiers where the lowest levels look for specific orientations of points, edges, and lines. Check out OpenCV's documentation page on Haar Cascades for more info. There's a webpage I've used before which lists a few different Haar Cascades for body and face detection. This specific version will identify a full face when viewed from the front. We still have to tune a couple of hyperparameters, but it's better than having to build and train an entire model (which we will have to do later).
import cv2
import numpy as np
CASCADE = cv2.CascadeClassifier('findme/haar_cc_front_face.xml')
def find_faces(img: np.ndarray, sf=1.1282, mn=5) -> np.ndarray:
    """Returns a list of bounding boxes for every face found in an image"""
    return CASCADE.detectMultiScale(
        cv2.cvtColor(img, cv2.COLOR_RGB2GRAY),
        scaleFactor=sf,
        minNeighbors=mn,
        minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE
    )
That's really all we need. OpenCV has native support for cascade classifiers, which makes our model code short and simple. We have three tunable hyperparameters:
- scaleFactor (sf): how much the image is shrunk at each detection pass; values closer to 1 catch more faces but take longer and admit more false positives
- minNeighbors (mn): how many overlapping detections a candidate region needs before it is kept as a face; higher values cut down on false positives
- minSize: the smallest bounding box, in pixels, that counts as a face (kept at 30x30 here)
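Since tuning these ends up being more eyeballing than exact science (more on that below), one quick way to explore values is to sweep a few combinations over a single test photo and compare how many boxes come back. This is only a sketch, using one of the test images from the next step.

from itertools import product
from matplotlib.image import imread

test_img = imread('test_imgs/initial/group0.jpg')

# Compare face counts across a few scaleFactor / minNeighbors combinations
for sf, mn in product((1.05, 1.1282, 1.3), (3, 5, 8)):
    bboxes = find_faces(test_img, sf=sf, mn=mn)
    print('scaleFactor:', sf, 'minNeighbors:', mn, 'faces found:', len(bboxes))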
Now let's test it by drawing rectangles around the faces it finds in a few group photos. Each individual is unique across every photo, which will be important in our next step. Here's one example:
import matplotlib.pyplot as plt
from matplotlib.image import imread, imsave
%matplotlib inline
plt.imshow(imread('test_imgs/initial/group0.jpg'))
from glob import glob
def draw_boxes(bboxes: [[int]], img: np.ndarray, line_width: int = 2) -> np.ndarray:
    """Returns an image array with the bounding boxes drawn around potential faces"""
    for x, y, w, h in bboxes:
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), line_width)
    return img

# Find faces for each test image
for fname in glob('test_imgs/initial/group*.jpg'):
    img = imread(fname)
    bboxes = find_faces(img)
    print('Bounding Boxes for file:', fname)
    print(bboxes, '\n')
    imsave(fname.replace('initial', 'find_faces'), draw_boxes(bboxes, img))
plt.imshow(imread('test_imgs/find_faces/group0.jpg'))
After tuning the hyperparameters, we're getting good face identification over our test images. We recognized all but two of the 29 faces in the images and had only one false positive, pictured below. The two faces that it missed were tilted (one as much as 45°), and the cascade is only trained to find faces that are mostly upright (< 20°).
plt.imshow(imread('report_imgs/falsepositive.jpg'))
Now for this model's metrics. We said we'd focus on recall, and while that is still true, we can't really measure it according to the strict definition. Here's the problem with applying pure metrics in this case: there's no good way to get a meaningful value when the number of true negatives vastly outnumbers the true positives. There might only be a few faces in an image, but every candidate rectangle correctly identified as not a face is a true negative. Because of the way this classifier works, that leads to ratios like 100k true negatives to 2 true positives.
With that in mind, tuning the hyperparameters was more subjective than scientific, but I am confident that it is tuned well enough to compare to the benchmark.
Microsoft offers a similar service via its Azure Cognitive Services Face API. We'll use this as a benchmark tool for this part of the project. We'll use a client library to send our sample image to the service and draw the bounding boxes that it returns.
import cognitive_face as faceapi
faceapi.Key.set('ad4edd8c666e41638b199a3bb20c6216')
def bboxes_benchmark(img_path: str, display_path: str = None):
    """Displays an image with bounding boxes sourced by AzureCS Face API

    Can map the results onto a separate image if a display_path is given"""
    faces = faceapi.face.detect(img_path)
    bboxes = []
    for face in faces:
        rect = face['faceRectangle']
        bboxes.append([rect['left'], rect['top'], rect['width'], rect['height']])
    print(bboxes)
    plt.imshow(draw_boxes(bboxes, imread(display_path if display_path else img_path)))
bboxes_benchmark('test_imgs/initial/group0.jpg', 'test_imgs/find_faces/group0.jpg')
We're replicating what we just did, except we are now using a commercial service to return the bounding boxes instead of our model. In the test image, the green boxes were returned by our model, and the white boxes were returned by Face API. The commercial service was able to give a tighter bounding box for each face, but both versions were able to give roughly the same results and identify each face.
Now let's use this to build a base corpus of "these faces are not mine" so we can augment it later with the face we want to target. Because our first model feeds data into our second, it makes sense to use it to create our second model's training data. We'll use the faces from the test images in the previous section because all of them are unique and cover a decently wide demographic. The code below will use the bounding boxes to save cropped images of each found face.
# Creates cropped faces for imgs matching 'test_imgs/group*.jpg'
def crop(img: np.ndarray, x: int, y: int, width: int, height: int) -> np.ndarray:
    """Returns an image cropped to a given bounding box of top-left coords, width, and height"""
    return img[y:y+height, x:x+width]

def pull_faces(glob_in: str, path_out: str) -> int:
    """Pulls faces out of images found in glob_in and saves them as path_out

    Returns the total number of faces found
    """
    i = 0
    for fname in glob(glob_in):
        print(fname)
        img = imread(fname)
        bboxes = find_faces(img)
        for bbox in bboxes:
            cropped = crop(img, *bbox)
            imsave(path_out.format(i), cropped)
            i += 1
    return i
found = pull_faces('test_imgs/initial/group*.jpg', 'test_imgs/corpus/face{}.jpg')
print('Total number of base corpus faces found:', found)
plt.imshow(imread('test_imgs/corpus/face0.jpg'))
29 actual faces + 1 false positive - 2 false negatives = 28 corpus images
That verifies what we found earlier. I've manually removed the false positive pictured earlier from the corpus, so the final count comes to twenty-seven. Now that we have some faces to work with, let's save them to a pickle file for use later on.
from pickle import dump
#Creates base_corpus.pkl from face imgs in test_imgs/corpus
imgs = [imread(fname) for fname in glob('test_imgs/corpus/face*.jpg')]
dump(imgs, open('findme/base_corpus.pkl', 'wb'))
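When it's time to train the second model, the pickled corpus can be read back in the same way; a quick usage sketch:

from pickle import load

# Reload the base corpus of "not my face" images saved above
with open('findme/base_corpus.pkl', 'rb') as fin:
    base_corpus = load(fin)

print('Faces in base corpus:', len(base_corpus))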
Now we need to add our target data. Since this is going to power a personal project, I'm going to train it to recognize my face. Other than adding some new images, we can reuse the code from before, just supplying a different glob string.
found = pull_faces('test_imgs/initial/me*.jpg', 'test_imgs/corpus/me{}.jpg')
print('Total number of target faces found:', found)
plt.imshow(imread('test_imgs/corpus/me0.jpg'))