How to create unknown face dataset for face recognition python

Question

I have a Python face recognition project where I use the OpenFace model and an SVM to detect and recognize faces. The general steps I follow to recognize an image are below:

Detect face using face detection model: I use the OpenFace model instead of a Haar cascade because the cascade is not able to detect side faces
Extracting face embedding: Extracting the 128-d face embedding using the OpenFace model
Training: Using an SVM, I train on the face embeddings with the appropriate labels, like below:

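from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Search C and gamma for an RBF SVM with 3-fold cross-validation
params = {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0], "gamma": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]}
model = GridSearchCV(SVC(kernel="rbf", gamma="auto", probability=True), params, cv=3, n_jobs=-1)
# Fit on the 128-d embeddings and their labels
model.fit(data["embeddings"], labels)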
Testing: Extracting the face embedding of the test image and predicting the result, like below:

model.predict_proba()

I have a dataset of random unknown faces and a dataset of known persons' faces. The problem is that with around 30 images of known persons and around 10 images of unknown persons, known persons are recognized fine, but any unknown person who comes in is also recognized as a known person with high confidence, when the result should be unknown.

If I add more random people to the unknown dataset, say around 50 images against 30 images of known persons, known persons are still recognized fine but with low confidence, and an unknown person who comes in is now recognized as unknown.

It looks like good face recognition results require approximately the same number of known and unknown person images, which is not practical, since the known person images can grow to 100 or more with each known person we add. I am very confused and not sure what to do. Is there any other way of recognizing known and unknown persons? Please help. Thanks.

emilioho2020 · Accepted Answer · 2020-04-16 15:26:42Z

1

It is normal for confidence to decrease as the number of possible persons (the number of labels) increases, since there are more possibilities. I'm trying to understand what you meant: do you have a label for each person and then an additional label for unknown? That is not the way to go, because the unknown class is then treated like any other person's embeddings. Instead, use a cutoff probability and treat everything that falls below it as unknown.

Remember that there is a trade-off between the size of your prediction space (more persons, more possibilities) and accuracy.
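
As a rough illustration of the cutoff idea, here is a minimal sketch. It assumes the probabilities come from `model.predict_proba(...)` (one row per face) and that the class names come from `model.classes_`. The function name `label_with_cutoff` and the default cutoff of 0.6 are made up for this example and are not values from the question or the answer.

```python
def label_with_cutoff(probabilities, class_names, cutoff=0.6):
    """Return (label, confidence) per face; low-confidence faces become "unknown".

    probabilities: one row of class probabilities per face.
    class_names:   labels in the same column order as the probabilities.
    cutoff:        hypothetical value; tune it on held-out data.
    """
    results = []
    for row in probabilities:
        # Index of the most probable class for this face
        best = max(range(len(row)), key=lambda i: row[i])
        confidence = float(row[best])
        if confidence >= cutoff:
            results.append((class_names[best], confidence))
        else:
            results.append(("unknown", confidence))
    return results


if __name__ == "__main__":
    # Toy probabilities standing in for model.predict_proba(test_embeddings)
    toy_probs = [[0.90, 0.10], [0.55, 0.45]]
    print(label_with_cutoff(toy_probs, ["alice", "bob"]))
```

With this approach the SVM is trained only on known persons, so no "unknown" class is needed in the training data.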

edited Apr 16, 2020 at 15:26

answered Apr 16, 2020 at 15:22

emilioho2020


2 Comments

S Andrew Over a year ago

How can I decide on the cutoff probability? Do you have a link to an article or code?

emilioho2020 Over a year ago

Just compute the optimal value on your test data, by optimizing the unknown person detection rate as a function of the cutoff probability. It is only one parameter, so I would just set it manually. You could compute HITS@k = (number of faces correctly classified as unknown)/(total number of unknown faces) for some values of k, or use any other more advanced metric. I don't immediately have any code or article, but I found this thread on GitHub that I think relates to your issue: github.com/cmusatyalab/openface/issues/144
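
One possible way to run that sweep is sketched below. It assumes you have already collected the top predicted probability for each held-out known face and each held-out unknown face. The function names, and the choice to balance the unknown detection rate against the known acceptance rate, are assumptions of this sketch rather than something stated in the comment.

```python
def sweep_cutoff(known_scores, unknown_scores, cutoffs):
    """For each cutoff, report how many unknowns are rejected and knowns accepted.

    known_scores / unknown_scores: top predicted probability per held-out face.
    Returns a list of (cutoff, unknown_detection_rate, known_acceptance_rate).
    """
    rows = []
    for cutoff in cutoffs:
        # Unknown faces are handled correctly when their score falls below the cutoff
        unknown_rate = sum(s < cutoff for s in unknown_scores) / len(unknown_scores)
        # Known faces are handled correctly when their score stays at or above it
        known_rate = sum(s >= cutoff for s in known_scores) / len(known_scores)
        rows.append((cutoff, unknown_rate, known_rate))
    return rows


def pick_cutoff(rows):
    """Pick the cutoff with the best average of the two rates (an assumed criterion)."""
    return max(rows, key=lambda r: (r[1] + r[2]) / 2)[0]


if __name__ == "__main__":
    # Toy scores, invented for illustration only
    rows = sweep_cutoff([0.9, 0.8, 0.7], [0.4, 0.5, 0.75], [0.3, 0.6, 0.95])
    for cutoff, unknown_rate, known_rate in rows:
        print(cutoff, round(unknown_rate, 2), round(known_rate, 2))
    print("chosen cutoff:", pick_cutoff(rows))
```

Looking only at the unknown detection rate would always favour a very high cutoff, which is why the sketch also tracks how many known faces survive it.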

Andrey Smorodov · Accepted Answer · 2020-04-16 15:29:34Z

1

I don't think an SVM will work well here. It is natively a binary classifier. It will try to compute the border between two sets of 128-D points (the known and unknown classes), but nothing ties the members of each class together. A known face may be closer to an unknown face than to another known face in embedding space. That is a generalization problem for an SVM. An SVM can be used on closed sets, but with unknown faces you have an open set.

It is more practical to use non-parametric methods and a Bayesian approach, computing likelihoods as a function of distance to the known data in embedding space, as in your previous question.

answered Apr 16, 2020 at 15:29

Andrey Smorodov


5 Comments

Andrey Smorodov Over a year ago

Maybe this: byclb.com/TR/Tutorials/neural_networks/ch11_1.htm. There is also a good book: Bishop, "Pattern Recognition and Machine Learning".

Andrey Smorodov Over a year ago

You will have the same thresholds for all your known points as in your previous post. The rule is: distance > threshold for all photos of known persons -> unknown
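
The rule in this comment can be sketched as a nearest-neighbour lookup. This is only an illustration: it assumes Euclidean distance between embeddings, and the `identify` function, the `gallery` structure and the toy 2-D vectors are hypothetical stand-ins for real 128-d embeddings.

```python
import math


def identify(query, gallery, threshold):
    """Return (name, distance) of the closest known face, or "unknown".

    query:     embedding of the face to identify.
    gallery:   list of (name, embedding) pairs for the known persons' photos.
    threshold: maximum distance at which two faces count as the same person.
    """
    best_name, best_dist = None, math.inf
    for name, embedding in gallery:
        dist = math.dist(query, embedding)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # If even the closest known photo is too far away, every photo is too far away
    if best_dist > threshold:
        return "unknown", best_dist
    return best_name, best_dist


if __name__ == "__main__":
    gallery = [("alice", [0.0, 0.0]), ("bob", [1.0, 0.0])]
    print(identify([0.1, 0.0], gallery, threshold=0.5))
    print(identify([5.0, 5.0], gallery, threshold=0.5))
```

Adding a new known person only means appending their photos to the gallery; nothing has to be retrained.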

Andrey Smorodov Over a year ago

This also may be useful: arxiv.org/pdf/1810.11160.pdf

S Andrew Over a year ago

Hi Andrey, one quick thing I wanted to know. You said that SVM will not work well here. Did you mean that we can still use it for training but not for prediction, or that it is not good for training either?

Andrey Smorodov Over a year ago

SVM may be used for a face recognition task when you have a fixed set of persons and do not need to identify unknown ones. Because SVM divides all of the available space into class regions, no unclassified regions remain in embedding space. So the stages for building a recognizer are: train the feature space (very large dataset; you have this done), compute the threshold (large dataset), then use your small dataset to compute distances to the queried face. If dist < threshold, the faces are the same. If no face in the dataset matches, it is unknown.
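
For the "compute threshold" stage, one possible sketch is below. It assumes a labelled set of embeddings, splits all pairs into same-person and different-person distances, and picks the distance that separates them best. The helper names and the accuracy criterion are assumptions of this sketch; the comment does not say how the threshold should be computed.

```python
import itertools
import math


def pair_distances(embeddings, labels):
    """Split all pairwise Euclidean distances into same-person and different-person lists."""
    same, diff = [], []
    items = list(zip(embeddings, labels))
    for (emb_a, label_a), (emb_b, label_b) in itertools.combinations(items, 2):
        dist = math.dist(emb_a, emb_b)
        (same if label_a == label_b else diff).append(dist)
    return same, diff


def best_threshold(same, diff):
    """Pick the observed distance that classifies the most pairs correctly."""
    def accuracy(threshold):
        # A pair is "same person" when its distance is at or below the threshold
        correct = sum(d <= threshold for d in same) + sum(d > threshold for d in diff)
        return correct / (len(same) + len(diff))

    return max(sorted(set(same + diff)), key=accuracy)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for 128-d embeddings
    embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]]
    labels = ["a", "a", "b", "b"]
    same, diff = pair_distances(embeddings, labels)
    print("threshold:", best_threshold(same, diff))
```

The number of pairs grows quadratically with the dataset size, so on a large dataset it may be necessary to sample pairs instead of enumerating all of them.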
