The Evaluation Problem
All current benchmarks (Labeled Faces in the Wild, IJB-A, YouTube Faces DB, MegaFace, etc.) use faces without masks. Even if a benchmark with masked faces were available, there is no established way to interpret the results. We want not only to measure accuracy, but also to evaluate the model's robustness to mask wearing. Specifically, we intend to compare results on a benchmark without masks against a benchmark with masks of the same complexity. Our approach was to create such a benchmark artificially: take a common benchmark like LFW and put masks on all of its photos. This should be enough to estimate the model's robustness to mask wearing.
We found a lot of tools that put a mask on a photo of a face. We selected those that use facial key points and support a variety of masks. The tool from FaceX-Zoo seemed to be the best option: it uses a UV projection of faces and masks and matches them using key points.
Without going into mathematical detail, a person's photo is treated as a three-dimensional object and transformed into two-dimensional space, the same way a globe is flattened into a map. The mask goes through the same transformation; the flat mask is then matched with the flat face using key points, and the final result is obtained by reversing the transformation back into three-dimensional space.
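The key-point matching step can be illustrated with a much simpler 2D analogue. This is not FaceX-Zoo's UV-space method, just a minimal sketch of the idea: estimate a transform (here, a plain affine one) that carries the mask template's key points onto the detected face landmarks, then warp the mask with it. All coordinates below are hypothetical illustrative values.

```python
import numpy as np

# Hypothetical (x, y) key points -- illustrative values, not real landmarks.
mask_points = np.array([[0.0, 0.0], [100.0, 0.0], [50.0, 80.0]])          # mask template
face_points = np.array([[120.0, 200.0], [220.0, 205.0], [172.0, 280.0]])  # detected face landmarks

def estimate_affine(src, dst):
    """Least-squares affine transform mapping src points onto dst points."""
    n = len(src)
    # Each point pair gives two equations: x' = a*x + b*y + c, y' = d*x + e*y + f
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src
    A[1::2, 5] = 1.0
    b = dst.reshape(-1)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params.reshape(2, 3)  # [[a, b, c], [d, e, f]]

def warp(points, M):
    """Apply the affine transform M to an (n, 2) array of points."""
    return points @ M[:, :2].T + M[:, 2]

M = estimate_affine(mask_points, face_points)
# After the warp, the mask's key points land on the face landmarks,
# so the mask pixels warped the same way sit correctly on the face.
print(np.allclose(warp(mask_points, M), face_points))  # → True
```

In the real tool, the matching happens in UV space and accounts for 3D pose, which is why the masks follow head rotation so naturally.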
This is what LFW looks like after this transformation:
As you can see, the masks look quite natural. The only drawback is that we used just eight different masks, but we made sure they were the most common types and that each mask sat on each face differently.
The Results of Our Models
Exadel CompreFace supports face recognition based on FaceNet and InsightFace. For both, there are several embedding models that differ in quality and speed. For our tests, we took the fastest models, which are best suited for commercial use: Inception-ResNet-v1 for FaceNet and MobileFacenet for InsightFace.
| Benchmark \ model | InceptionResnetV1 | MobileFacenet |
| --- | --- | --- |
The results show that the models make mistakes much more often on the masked dataset, but the resulting accuracy is still not bad. Keep in mind that accuracy will drop even for an ideal model, because a mask covers half of the face and hides a lot of information the model could otherwise use. For our analysis, we decided to find out how far our models are from that ideal.
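The accuracy numbers above come from the standard LFW-style verification protocol: embeddings for pairs of photos are compared with cosine similarity, and a pair is accepted as "same person" when the similarity exceeds a threshold. Below is a minimal sketch of that protocol on toy data; the embeddings and threshold are synthetic, not CompreFace output.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) embedding arrays."""
    return np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def verification_accuracy(emb_a, emb_b, same_person, threshold=0.5):
    """Fraction of pairs where (similarity > threshold) matches the label."""
    predictions = cosine_similarity(emb_a, emb_b) > threshold
    return np.mean(predictions == same_person)

# Toy embeddings: positive pairs are near-duplicates of each other,
# negative pairs are independent random vectors.
base = rng.normal(size=(100, 128))
emb_a = np.vstack([base, rng.normal(size=(100, 128))])
emb_b = np.vstack([base + 0.05 * rng.normal(size=(100, 128)),
                   rng.normal(size=(100, 128))])
labels = np.array([True] * 100 + [False] * 100)

print(verification_accuracy(emb_a, emb_b, labels))
```

On a masked benchmark, the same protocol is run unchanged; only the input photos differ, which is what makes the masked and unmasked accuracies directly comparable.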
Another Approach For Evaluating Models
When someone isn't wearing a mask, you recognize their entire face; when someone is wearing a mask, you recognize them from the whole face minus the covered area. We want roughly the same behavior from an ideal model: if a person is wearing a mask, the model should not use information from the area the mask covers. We tested our models for this property.
To do this, we took MFR2 – the best of the few masked benchmarks we managed to find. It consists of only 269 photos of 53 people, but the photos are of fairly high quality, diverse in angle and lighting, and balanced by gender and race. Next, we formed an abstract mask – a black shape that covers the lower part of the face, regardless of whether that face wears a real mask or not. We then transformed MFR2 by putting this abstract mask on each photo, forming a new benchmark.
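The abstract mask itself is trivial to apply. A minimal sketch: blacken every pixel row below a chosen cutoff in an aligned face crop. In practice the cutoff would come from a facial landmark (e.g. the nose tip); the row value used here is a hypothetical stand-in.

```python
import numpy as np

def apply_abstract_mask(image, cutoff_row):
    """Return a copy of the image with every row below cutoff_row set to black."""
    masked = image.copy()
    masked[cutoff_row:, :, :] = 0
    return masked

face = np.full((112, 112, 3), 128, dtype=np.uint8)  # stand-in for an aligned face crop
nose_row = 66                                       # hypothetical landmark-derived cutoff
masked_face = apply_abstract_mask(face, nose_row)

print(masked_face[:nose_row].min(), masked_face[nose_row:].max())  # → 128 0
```

Because the occluded region is pure black on every photo, the model provably cannot extract any identity information from it, which is exactly the property we want to compare against.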
Before => After
We ran the Exadel CompreFace models on both sets of photos. The idea was that on the original benchmark the models would decide for themselves whether to take the masked area into account, while on the black-mask benchmark we artificially forbade them from using the area under the mask.
This experiment produced very interesting results. For MobileFacenet, the accuracy was the same on both benchmarks; there was only a slight difference in the loss function. We checked all our MXNet models, and the results were similar. This means the models determine on their own that a mask is being worn and ignore that area. This becomes even more interesting when you remember that on the artificial masked LFW the accuracy decreased – meaning that on the original LFW the models used information from the mask area and thus improved quality. In general, with this approach to evaluation, the CompreFace models look great.
Masks for Facial Recognition: Is It Possible To Improve?
We tried to improve the existing models by training them on a masked dataset. To do this, we used the above-mentioned FaceX-Zoo tool and "put" masks on half of the CASIA-WebFace dataset. As a result, we had a dataset of 500k photos of 10k people, balanced in terms of mask wearing, with one drawback: the masks are worn artificially. On this dataset, we trained two models: InceptionResnetV1 from the TensorFlow part of CompreFace and MobileFacenet from the MXNet part. We ran validation on LFW and LFW-Masked, selecting hyperparameters to achieve a balance between them. Here are the best results we managed to achieve in comparison with the original models:
As you can see, the results are mixed. The clearest improvement is on LFW-Masked, but this is an artificial benchmark produced the same way as the training dataset. MFR2 also improved, though not by much, while LFW fell. Most likely, the models became slightly better at recognizing masked faces at the expense of recognizing unmasked ones.
It is important that the results on LFW-Masked and MFR2 improved. This means that although the existing models are good at recognizing masked faces, they are not perfect, as one might have concluded from the previous section.
Based on the results of this research, we can conclude that modern models are quite good at recognizing masked faces; masks do not block face recognition at all. If you are dissatisfied with the quality of masked face recognition, your best option is to try newer models like those from Exadel CompreFace discussed in this article. If you still want to improve masked face recognition, we would not recommend heuristics like the black abstract mask described above: experiments show that modern models are powerful enough to discard unnecessary information on their own. For a significant improvement, you will probably have to build a masked dataset yourself. If that is not possible, we recommend concentrating on a tool that puts on masks as realistically as possible. You can start by enriching the FaceX-Zoo tool with additional masks for more variety, or move toward generating masks with GANs.
Author: Ivan Kurnosau