By Carly Etlinger
Artificial intelligence and facial recognition are best known for their use in authentication and authorization, such as unlocking a smartphone or accessing confidential and sensitive data, but the technology has also made its way into law enforcement, where agencies use it to identify suspects in various crimes. For example, in 2019 the New York Police Department used facial recognition to identify an alleged rapist in under 24 hours. The department has also used a photo of actor Woody Harrelson to catch a beer thief. Despite these past successes in police departments, however, the technology has become very problematic because of its inaccuracy in identifying people in minority groups.
In January 2020, Robert Williams was arrested by the Detroit Police Department for a jewelry store robbery he did not commit. The department had run grainy video surveillance footage through facial recognition software, which falsely identified him as the suspect. The police relied too heavily on the software to recognize its inaccuracy in identifying people in minority groups; Robert Williams is an African American man. The incident caused shockwaves around the country and contributed to multiple police departments banning the use of facial recognition. Even before this incident, San Francisco had become the first city to ban facial recognition in its law enforcement agencies, in May 2019, because of the technology's proven inaccuracies.
I wanted to see if I could replicate this facial recognition problem on a smaller scale, since law enforcement databases contain hundreds of thousands of images of people. After researching numerous facial recognition tools, I decided on a combination of Python, OpenCV, and deep metric learning on an Ubuntu virtual machine. Deep metric learning is a type of artificial intelligence that involves training a neural network to accept a single input image and output a real-valued feature vector, a list of numbers that quantifies the face. OpenCV is a computer vision library with Python bindings, and dlib is the library whose neural network I used for this project. Dlib contains the implementation of deep metric learning that constructs the face embeddings used in the actual recognition process.
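To make the embedding idea concrete, here is a minimal sketch (plain Python, not my project code) of the comparison that underlies the recognition step: each face becomes a 128-dimensional vector, and two faces are treated as the same person when the Euclidean distance between their vectors falls below a tolerance. The 0.6 tolerance is the value conventionally used with dlib's pretrained model.

```python
import math

def euclidean_distance(a, b):
    """Distance between two equal-length face embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(embedding_a, embedding_b, tolerance=0.6):
    """True when the two embeddings are close enough to be the same face."""
    return euclidean_distance(embedding_a, embedding_b) < tolerance
```

Everything else in the pipeline exists to produce these vectors and to search a database of them efficiently.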
My project structure, containing the dataset, example images, Python files, and the face encodings.
My dataset was composed of 16 actors, actresses, and singers of different genders and races, each with more than 60 images. I used the Bing Search API in a separate Python file called bing_image_api.py to quickly create my dataset, with a usage such as python3 bing_image_api.py --query "lady gaga" --output dataset/lady_gaga. Lucy Liu stood out in the dataset because the API returned several photos of her in the same purple dress but at different angles. James Martinez also stood out because the API returned images of other people named "James Martinez," since it is a common name, so I needed to delete the images that were not of the actor James Martinez.
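My bing_image_api.py follows this general shape. The sketch below is a simplified, hypothetical version written against the Bing Image Search v7 REST endpoint; the function names and the exact error handling are illustrative, not my actual script.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Bing Image Search v7 REST endpoint (requires a subscription key).
API_URL = "https://api.bing.microsoft.com/v7.0/images/search"

def output_path(output_dir, index, ext=".jpg"):
    """Zero-padded filenames keep the downloaded dataset sorted on disk."""
    return os.path.join(output_dir, f"{index:06d}{ext}")

def download_images(query, output_dir, api_key, count=60):
    """Search Bing for `query` and save the result images into `output_dir`."""
    os.makedirs(output_dir, exist_ok=True)
    request = Request(
        f"{API_URL}?{urlencode({'q': query, 'count': count})}",
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    results = json.load(urlopen(request))
    for i, item in enumerate(results.get("value", [])):
        try:
            with urlopen(item["contentUrl"], timeout=10) as response:
                data = response.read()
        except OSError:
            continue  # skip unreachable or broken image URLs
        with open(output_path(output_dir, i), "wb") as f:
            f.write(data)
```

The downloaded images still need manual review, as the James Martinez mix-up showed: a search API matches the query string, not the person.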
Images 0-5, 7, and 13 show Lucy Liu in the same purple dress but at different angles.
The "examples" directory contains test images that are not part of the dataset, used for the actual face recognition step. The encode_faces.py file creates the encodings.pickle file, which holds the real-valued feature vectors that quantify each face. The usage to encode the faces is python3 encode_faces.py --dataset dataset --encodings encodings.pickle --detection-method hog. I ran into an issue while encoding: my images were too large, and the encoding would abort after attempting to vectorize the first face. I therefore needed to resize every image in my dataset by half, and to switch the detection method from CNN (convolutional neural network) to HOG (histogram of oriented gradients).
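A condensed sketch of what an encoding script like this does, assuming the face_recognition package (which wraps dlib) alongside OpenCV; the helper names and directory walk are illustrative, not my exact encode_faces.py.

```python
import os
import pickle

def half_size(width, height):
    """The full-resolution images exhausted memory, so halve each dimension."""
    return (width // 2, height // 2)

def encode_dataset(dataset_dir, encodings_path, detection_method="hog"):
    """Walk dataset/<person>/<image> folders and pickle one embedding per face."""
    # Heavy third-party imports are kept local so the helper above stays importable.
    import cv2
    import face_recognition

    known_encodings, known_names = [], []
    for name in sorted(os.listdir(dataset_dir)):
        person_dir = os.path.join(dataset_dir, name)
        for filename in sorted(os.listdir(person_dir)):
            image = cv2.imread(os.path.join(person_dir, filename))
            if image is None:
                continue  # skip files OpenCV cannot decode
            rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            h, w = rgb.shape[:2]
            rgb = cv2.resize(rgb, half_size(w, h))
            boxes = face_recognition.face_locations(rgb, model=detection_method)
            for encoding in face_recognition.face_encodings(rgb, boxes):
                known_encodings.append(encoding)
                known_names.append(name)
    with open(encodings_path, "wb") as f:
        pickle.dump({"encodings": known_encodings, "names": known_names}, f)
```

Passing model="hog" instead of "cnn" is the switch that kept my virtual machine from running out of memory.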
A screenshot of what happened after using the CNN detection method.
After my face encodings completed, I passed that pickle file, along with an example image not included in my dataset, into recognize_faces.py, with the usage python3 recognize_faces.py --encodings encodings.pickle --image examples/<image filename>. My program was able to accurately identify actors and actresses who were in my dataset, such as a photo of Lady Gaga and Jennifer Lopez together and one of Lupita Nyong'o.
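The recognition step can be sketched as follows, again assuming the face_recognition package: each detected face in the input image is compared against every stored encoding, and the dataset name with the most matches wins. The vote helper and function names are illustrative, not my exact recognize_faces.py.

```python
import pickle
from collections import Counter

def vote(matched_names):
    """Pick the dataset name with the most matching encodings, or 'Unknown'."""
    if not matched_names:
        return "Unknown"
    return Counter(matched_names).most_common(1)[0][0]

def recognize(encodings_path, image_path, tolerance=0.6):
    """Return one predicted name per face found in the image."""
    import cv2
    import face_recognition

    with open(encodings_path, "rb") as f:
        data = pickle.load(f)
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    names = []
    for encoding in face_recognition.face_encodings(image):
        matches = face_recognition.compare_faces(
            data["encodings"], encoding, tolerance=tolerance
        )
        matched = [n for n, m in zip(data["names"], matches) if m]
        names.append(vote(matched))
    return names
```

Note that a face only comes back as "Unknown" when no stored encoding falls within the tolerance; otherwise the program commits to the closest-looking person in the dataset, which matters for the misidentifications below.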
Outputs of the program correctly identifying Jennifer Lopez and Lady Gaga (left), and Lupita Nyong'o (top).
I decided to include a photo of actor Randall Park, who was in my dataset, and actress Constance Wu, who was not, to see how the program would handle someone outside my dataset. Constance Wu was incorrectly identified as Lucy Liu, most likely because both are Asian women and the program determined that, of everyone in the dataset, Lucy Liu looked the most like her. This is a serious problem when the technology is used in real-life applications.
Constance Wu (left) incorrectly identified as Lucy Liu, Randall Park (right).
I decided to try this again with a photo of five actors from the show American Horror Story, two of whom were in my dataset, and then with a photo of someone who was not in my set at all. For the five-actor image, I received the expected results: Finn Wittrock and Max Greenfield were identified correctly, and the program printed the "unknown" label, for the first time, for two of the people who were not in my set. I assume the program could not find anyone in the dataset who resembled those two actors closely enough. I next chose a photo of rapper and TV show host Nick Cannon, because I have heard that many people believe he looks very much like actor Michael B. Jordan, who was in my dataset. As expected, my program labeled Cannon as Michael B. Jordan.
Studies have shown that law enforcement databases, as well as the databases used by social media and big tech companies, have been ineffective at identifying people in minority groups because of the lack of images of them in those databases. My dataset was even and uniform, since I included people of different races and genders equally, but I wanted to introduce only 5 new images of Nick Cannon into the set to see whether the neural network would keep incorrectly identifying him as Michael B. Jordan because of the lack of images of him, or whether it would label him correctly despite it. I repeated the same steps: collecting the images with the Bing Search API program, encoding the faces in the dataset into a pickle file, then running the encodings through the recognition program.
My new dataset structure, including 5 images of Nick Cannon.
I had my recognize_faces.py program attempt to recognize Nick Cannon in two other photos, and I was met with some interesting results: one image identified him correctly, while the other still identified him as Michael B. Jordan. This contradiction perplexed me, since his facial expressions in the two photos are similar, though evidently not similar enough for the program to identify Cannon in both.
Nick Cannon correctly identified.
There are several possible reasons for this contradiction, even though the pictures appear to have been taken at around the same age. One is that I encoded the faces with the HOG detection method instead of the CNN method, which has been shown to be more accurate. However, as shown before, using the CNN method killed my program run because my virtual machine ran out of memory. Another reason could be the lighting on his face: the encoding program could have placed facial features in different areas because of the inconsistent lighting. Or the program may simply not have had enough photos of Nick Cannon to work with.
To test my hypothesis that Nick Cannon did not have enough photos in the dataset, I added five more photos to his folder and recalculated the encodings for my last experiment. I had the recognize_faces.py program attempt to recognize the same photo of him that was originally misidentified, and this time it labeled him correctly as Nick Cannon. This directly mirrors the problem in facial recognition programs that misidentify someone who has very few images in a database to their name.
Looking back at the results of the program and my dataset, it seems my small-scale version of a facial recognition database only needed around ten photos of a person to identify them accurately, so I did not really need to include at least 60 photos of each actor and actress for the network to learn their faces. Still, gathering all those photos with the Bing Search API was a good learning experience, because much larger datasets need to be created quickly and efficiently, and a program is needed to collect those images.
My experimental research is not fully representative of all the facial recognition programs and technologies used by big tech and startup companies. I composed my dataset of actors and actresses because it is easy to find thousands of photos of them on the Internet. For larger datasets meant to identify members of the general public with little to no online presence, however, it is much more difficult to find enough photos for accurate recognition. There is also the issue of infringing on someone's privacy. Project Green Light, Detroit's public-private facial recognition surveillance camera program, has been described by people who live in Detroit as "very invasive," saying it "feels like you have eyes on you at all times." It is also against the terms of service of Facebook and other social media sites to scrape users' images off of the site, which makes it more difficult to gather data legally and ethically. Several cities have therefore decided to ban the use of facial recognition in police departments, airports, and other law enforcement agencies rather than risk privacy violations and potential inaccuracies, like what happened in Robert Williams's case.
For future research and a possible follow-up project, I want to look more into the dlib neural network to see whether I could tweak it to accurately identify people who have only a few images to their name, or whether I could manage to use the CNN detection method instead.