The Michael Jordan in me
Ever heard of Michael B. Jordan? (Hint: he is not the basketball player.)
He is the actor who played “that guy on fire” (the Human Torch) in Fantastic Four. Guess what? He looks 22% similar to me, and here’s how I found out.
So I was reading about PCA (Principal Component Analysis) and wanted to find out which Hollywood actor looks most like me. Well, who wouldn’t want to?
Eigenfaces is an approach that captures the variations (AKA Principal Components) across a set of face images and uses this information to compare a new image against the set. How Eigenfaces works in detail is out of the scope of this post.
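Still, the core idea fits in a few lines. Below is the generic textbook formulation, not a description of pyfaces' internals. Each training image is flattened into a vector $\mathbf{x}_i \in \mathbb{R}^N$ (for a 25×25 grayscale face, $N = 625$), there are $M$ training images, and $k$ is the number of eigenfaces kept:

```latex
\begin{aligned}
\boldsymbol{\mu} &= \frac{1}{M}\sum_{i=1}^{M} \mathbf{x}_i
  && \text{mean face} \\
A &= \begin{bmatrix} \mathbf{x}_1-\boldsymbol{\mu} & \cdots & \mathbf{x}_M-\boldsymbol{\mu} \end{bmatrix} \in \mathbb{R}^{N\times M}
  && \text{centered faces as columns} \\
C &= \tfrac{1}{M} A A^{\top}, \quad C\,\mathbf{u}_j = \lambda_j \mathbf{u}_j, \quad \lambda_1 \ge \lambda_2 \ge \cdots
  && \text{eigenfaces } \mathbf{u}_1,\dots,\mathbf{u}_k \\
\mathbf{w}(\mathbf{x}) &= \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_k \end{bmatrix}^{\top} (\mathbf{x}-\boldsymbol{\mu}) \in \mathbb{R}^{k}
  && \text{weights of a face} \\
i^{*} &= \arg\min_{i} \bigl\lVert \mathbf{w}(\mathbf{x}) - \mathbf{w}(\mathbf{x}_i) \bigr\rVert_2
  && \text{closest training face}
\end{aligned}
```

In words: the eigenfaces are the directions in which the training faces vary most, every face is summarised by its $k$ weights along those directions, and a new face is matched to the training face whose weights are closest.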
There’s a distinction between face detection and face recognition that many places on the interweb seem to get wrong. Face detection is finding a face in an image. This is usually done with a Haar classifier; training one is detailed in one of my previous posts. Face recognition, on the other hand, is measuring how similar a given face is to each face in a set of known faces and picking the closest one. This is much harder than face detection.
Back to the topic.
The process involves three steps:
- Collecting the faces of actors
- Training the classifier
- Recognizing a new image
I ran a Google image search for ‘Hollywood actors’, saved the results page, and then copied all the images from the saved page's resources directory. That’s a pretty old-school way of doing it. Initially I wrote a little script to scrape the images, but it returned only 20 images because Google lazy-loads the rest as you scroll.
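Files in a saved page's resources folder often have odd names or no extension at all, and they sit next to scripts and stylesheets. A minimal stdlib sketch for pulling out just the images, assuming the folder path is whatever your browser created (`collect_images` and `sniff_extension` are hypothetical helpers, not part of the original workflow): it checks each file's leading "magic" bytes and copies recognised images into `raw/` with a proper extension.

```python
import shutil
from pathlib import Path

# Leading "magic" bytes that identify common web image formats
SIGNATURES = {
    b"\xff\xd8\xff": ".jpg",
    b"\x89PNG\r\n\x1a\n": ".png",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}


def sniff_extension(path):
    """Return the extension implied by the file's first bytes, or None if unknown."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, ext in SIGNATURES.items():
        if head.startswith(magic):
            return ext
    return None


def collect_images(resources_dir, out_dir="raw"):
    """Copy every recognisable image in resources_dir into out_dir as <index><ext>."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for i, src in enumerate(sorted(Path(resources_dir).iterdir())):
        if not src.is_file():
            continue
        ext = sniff_extension(src)
        if ext is None:
            continue  # scripts, stylesheets and other page assets
        dst = out / "{}{}".format(i, ext)
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied
```

The detection script globs only `raw/*.jpg`, so PNGs and GIFs copied this way would need either a wider glob pattern or conversion to JPEG first.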
Here’s the code.
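```python
import glob
import os
import sys

import cv2


def detect(path, cascade):
    img = cv2.imread(path)
    if img is None:  # unreadable or corrupt file
        return [], None
    # scaleFactor=1.3, minNeighbors=4, minimum face size 20x20 pixels
    rects = cascade.detectMultiScale(img, 1.3, 4, cv2.CASCADE_SCALE_IMAGE, (20, 20))
    if len(rects) == 0:
        return [], img
    rects[:, 2:] += rects[:, :2]  # (x, y, w, h) -> (x1, y1, x2, y2)
    return rects, img


def box(rects, img, file_name):
    # One output file per face, so several faces in one image don't clash
    for i, (x1, y1, x2, y2) in enumerate(rects):
        cut = img[y1:y2, x1:x2]  # the rectangle containing a face
        out_path = "cropped/{}_{}.jpg".format(file_name, i)
        print("Writing " + out_path)
        cv2.imwrite(out_path, cut)


def main():
    cascade = cv2.CascadeClassifier("haar/haarcascade_frontalface_alt.xml")
    if cascade.empty():
        sys.exit("Could not load the Haar cascade XML file")
    os.makedirs("cropped", exist_ok=True)
    # The counter is used as the output file name
    for counter, image in enumerate(glob.glob("raw/*.jpg"), start=1):
        rects, img = detect(image, cascade)
        box(rects, img, counter)


if __name__ == "__main__":
    main()
```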
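The one non-obvious line is `rects[:, 2:] += rects[:, :2]`. `detectMultiScale` returns each face as a row `(x, y, w, h)`; adding the first two columns to the last two turns that into corner coordinates `(x1, y1, x2, y2)`, which is what `box()` slices with. A small NumPy check with a made-up rectangle (`to_corners` and `crop` are hypothetical helpers that mirror the two steps):

```python
import numpy as np


def to_corners(rects):
    """Convert (x, y, w, h) rows into (x1, y1, x2, y2) rows without touching the input."""
    corners = np.array(rects, dtype=np.int32)  # np.array copies by default
    corners[:, 2:] += corners[:, :2]  # x2 = x + w, y2 = y + h
    return corners


def crop(img, corners):
    """Slice each face out of img; NumPy indexes rows (y) before columns (x)."""
    return [img[y1:y2, x1:x2] for x1, y1, x2, y2 in corners]


rects = np.array([[10, 20, 30, 40]])  # a face at x=10, y=20, 30 wide, 40 tall
corners = to_corners(rects)
print(corners.tolist())  # [[10, 20, 40, 60]]
img = np.zeros((100, 100, 3), dtype=np.uint8)
print(crop(img, corners)[0].shape)  # (40, 30, 3): 40 rows tall, 30 columns wide
```

The row-before-column order is why the original crop reads `img[y1:y2, x1:x2]` rather than `img[x1:x2, y1:y2]`.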
Then I realized that the cropped images had different dimensions, and the classifier needs all training images to have the same dimensions. So I wrote another script to resize the cropped faces to 25×25 pixels. (In hindsight, I should have just edited the previous script to save the resized images instead of writing a new one :/)
```python
import glob
import os
import sys

import cv2


def resize(img, size):
    # INTER_AREA is the usual choice when shrinking images
    return cv2.resize(img, size, interpolation=cv2.INTER_AREA)


def store(img, file_name):
    cv2.imwrite("cropped2/" + str(file_name) + ".jpg", img)


def main():
    os.makedirs("cropped2", exist_ok=True)
    imglist = [cv2.imread(p) for p in glob.glob(os.path.join(sys.argv[1], "*.jpg"))]
    # Skip files OpenCV could not read, then resize everything to 25x25
    resizedlist = [resize(img, (25, 25)) for img in imglist if img is not None]
    for counter, img in enumerate(resizedlist):
        store(img, counter)


if __name__ == "__main__":
    # sys.argv[0] is the script name, so one path argument means len(sys.argv) == 2
    if len(sys.argv) < 2:
        print("USAGE: resizer.py </path/to/images>")
        sys.exit(1)
    main()
```
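Why the fixed size matters: eigenfaces flattens each image into one long vector and stacks those vectors into a single matrix, which only works if every vector has the same length. A minimal NumPy sketch of that step (`build_data_matrix` is a hypothetical name, not a pyfaces function):

```python
import numpy as np


def build_data_matrix(faces):
    """Flatten equally sized face images into the rows of one (M, N) float matrix.

    Eigenfaces treats each image as a point in N-dimensional pixel space,
    so every image must have the same height, width and channel count.
    """
    if not faces:
        raise ValueError("need at least one face")
    shapes = {face.shape for face in faces}
    if len(shapes) != 1:
        raise ValueError("faces have mixed shapes: {}".format(sorted(shapes)))
    return np.stack([face.reshape(-1) for face in faces]).astype(np.float64)


# Three 25x25 grayscale faces become a 3 x 625 matrix
faces = [np.zeros((25, 25), dtype=np.uint8) for _ in range(3)]
print(build_data_matrix(faces).shape)  # (3, 625)
```

Note that the resize script keeps OpenCV's three colour channels (`cv2.imread` loads BGR by default), so each 25×25 crop it writes holds 1,875 values rather than 625 unless it is converted to grayscale.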
I used pyfaces, a pretty old implementation of eigenfaces, as the classifier. It works on top of OpenCV and numpy (as always), and the code looks neat: https://code.google.com/archive/p/pyface
Feed pyfaces the directory of training faces along with the image you want to compare. It builds eigenfaces from that set, compares your image against them, and returns the closest image:
```shell
python pyfacesdemo yourimage dirname numofeigenfaces threshold
```
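To make the two numeric arguments concrete, here is a hypothetical NumPy sketch of the same idea. It is not pyfaces' code, and the class name and behaviour are my own assumptions: `num_eigenfaces` plays the role of `numofeigenfaces` (how many principal components to keep), and `threshold` is assumed to be a cutoff on the distance in eigenface space beyond which no match is reported. Check pyfaces' source for how it actually uses its threshold.

```python
import numpy as np


class Eigenfaces:
    """Minimal eigenfaces matcher: PCA on flattened faces plus nearest neighbour."""

    def __init__(self, num_eigenfaces):
        self.k = num_eigenfaces

    def fit(self, faces):
        # faces: (M, N) array, one flattened face per row
        X = np.asarray(faces, dtype=np.float64)
        self.mean = X.mean(axis=0)
        A = X - self.mean
        # Rows of Vt are eigenvectors of A^T A (the unnormalised covariance),
        # sorted by decreasing singular value; keep the top k as eigenfaces.
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        self.eigenfaces = Vt[: self.k]
        self.weights = A @ self.eigenfaces.T  # (M, k): each training face's weights
        return self

    def project(self, face):
        """Weights of one face along each eigenface."""
        return (np.asarray(face, dtype=np.float64).ravel() - self.mean) @ self.eigenfaces.T

    def match(self, face, threshold=None):
        """Return (index of closest training face, distance); index is None above threshold."""
        dists = np.linalg.norm(self.weights - self.project(face), axis=1)
        best = int(np.argmin(dists))
        if threshold is not None and dists[best] > threshold:
            return None, float(dists[best])
        return best, float(dists[best])


# Usage with random stand-in "faces": 5 images of 20 pixels each
rng = np.random.default_rng(0)
train = rng.random((5, 20))
model = Eigenfaces(num_eigenfaces=4).fit(train)
print(model.match(train[3] + 1e-3 * rng.random(20))[0])  # 3
```

Using the SVD of the centred data avoids forming the N×N covariance matrix explicitly. The distance returned is a raw Euclidean distance in eigenface space; turning it into a similarity percentage needs some normalisation that this sketch does not attempt.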
And there you have it: I look 22% similar to Michael B. Jordan, closer than to any of the other actors in the set.