Projects 2022

Spring 2022
Detection of Over-Rotated Biometric Images and Incorrect Labeling of Fingerprints
M.G. Sarwar Murshed (CU), Keivan Bahmani (CU), Stephanie Schuckers (CU), Faraz Hussain (CU), Deen Dayal Mohan (UB), Nishant Sankaran (UB), Srirangaraj Setlur (UB)
Biometric matching systems that rely on face, fingerprint, or iris images are inherently vulnerable to rotations of the source images, which reduce matching performance. In this work, we propose a two-pronged approach to reduce the effect of over-rotation. First, we will use the human-annotated bounding boxes in our previously acquired dataset and rotate each fingerprint in the slaps to simulate the effect of over-rotated fingerprints. We plan to develop and train an instance segmentation model[3] on this new dataset, which contains both naturally and synthetically over-rotated fingerprints as well as out-of-order fingerprints, to produce a model robust to over-rotation and incorrect labeling of fingerprints. Second, we will design a self-supervised learning (SSL) approach to train a CNN to detect the rotation angle of biometric features in an image. The approach (similar to contrastive learning[1]) takes an image and its rotated variant as input, uses an encoder network to compute their embeddings, and uses a projector network to predict the rotation applied to the image. By considering landmarks in the biometric image, we can employ additional supervision signals to improve the accuracy of the projector network. Using SSL, we can leverage effectively unlimited training data, without large-scale annotation, to train a robust rotation detection model.
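
To make the SSL formulation concrete, the following is a minimal sketch assuming PyTorch; the encoder architecture, embedding dimensions, and angle range are illustrative assumptions rather than the project's actual design.

```python
# Minimal sketch (PyTorch) of the proposed SSL rotation-prediction setup:
# an encoder embeds an image and its rotated variant, and a projector head
# regresses the rotation angle from the pair of embeddings. All module
# names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class RotationSSL(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in CNN encoder
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim))
        self.projector = nn.Sequential(        # predicts rotation from both embeddings
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x, x_rot):
        z = torch.cat([self.encoder(x), self.encoder(x_rot)], dim=1)
        return self.projector(z).squeeze(1)    # predicted angle in degrees

model = RotationSSL()
x = torch.rand(8, 1, 96, 96)                  # batch of fingerprint crops
angles = torch.empty(8).uniform_(-45, 45)     # free supervision: we chose the angles
x_rot = torch.stack([TF.rotate(img, a.item()) for img, a in zip(x, angles)])
loss = nn.functional.mse_loss(model(x, x_rot), angles)
loss.backward()
```

Because the applied rotation is known by construction, arbitrarily many labeled pairs can be generated from unlabeled images, which is the core appeal of the SSL formulation.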

A Benchmark Dataset for Neural Vocoder Identification
Siwei Lyu (UB), David Doermann (UB), and Srirangaraj Setlur (UB)
Audio DeepFakes created by AI algorithms are becoming an increasingly important vehicle for impersonation and disinformation, so it is of practical importance to develop methods to detect them. One component common to audio DeepFake synthesis algorithms is the vocoder, the model that converts a spectrogram into an audio waveform. An effective approach to detecting audio DeepFakes is therefore to identify the traces that vocoders leave in the waveforms. However, no dataset exists of audio samples created by various vocoders from the same input audio, which is the most critical bottleneck for the study of vocoder-based audio DeepFake detection. In the proposed work, we aim to build a large-scale benchmarking dataset for vocoder identification. We will use human voice samples from LibriTTS and VCTK, which together contain about 2.4k speakers, as input, and create synthetic voices using nine state-of-the-art vocoder models: WaveNet, nv-WaveNet, FFTNet, WaveRNN, LPCNet, WaveGlow, WaveGAN, WaveGrad, and SpecGAN. We will also study vocoder identification algorithms based on the developed dataset.

Biometric Aging in Children: Phase IV
S. Schuckers, M. Imtiaz, P. Das, L. Holsopple
The study of biometric recognition in children has applications in areas including immigration, refugee efforts, and the distribution of benefits. We have continued data collection over 6 years, comprising 9 sessions (3 sessions were missed due to the pandemic), toward creating a longitudinal dataset from the same 239 subjects, aged 3 to 17 years, collected at 6-month intervals at the Potsdam, NY elementary, middle, and high schools. Six modalities are being collected: fingerprint, face, iris, foot, ear, and voice. New participants are added to the study every year. Partial assessment of the data has been performed toward multiple goals related to biometric aging. We propose to continue the data collection for 2 additional years to expand the longitudinal dataset of child biometrics across the 6 modalities.

Extended Multi-Modal Gait and Anthropometric Data Collection and Analysis
Karthik Dantu (UB), Srirangaraj Setlur (UB)

Recent advances, as well as initiatives from federal agencies such as IARPA BRIAR, are bringing human identity recognition in the wild closer to reality. To test these algorithms, there is a dire need for realistic datasets collected under harsh conditions such as high pitch angles, long distances, and limited visibility of the face. Last year, we developed infrastructure to perform registration and data collection of 40 subjects in indoor and outdoor scenarios. Indoors, we used a motion capture system for accurate gait keypoint and pose detection. Outdoors, we used external cameras and post-processing for gait keypoint and pose tracking. In this proposal, we will extend the dataset to 100 subjects with diversity in age, gender, ethnicity, and height. Such diversity is extremely important for developing fair and unbiased algorithms for use in the real world. Second, we will enhance the semi-automated annotation of videos so that outdoor data can be annotated automatically. Finally, we will benchmark state-of-the-art face and whole-body recognition algorithms on the extended dataset before releasing it to the public, so that it can serve as a baseline for algorithm developers to improve and build upon.

Facial Image Quality Vector Assessment
Nasser M. Nasrabadi (WVU) and Amol Joshi (WVU)

In this project, we propose a new facial image quality vector assessment (FIQVA) that not only predicts image quality but also performs a facial analysis and provides a list of facial factors that contributed to the estimated quality. The proposed algorithm is based on a multitask deep neural network (MDNN) that estimates the face quality (the primary task) and quantifies nuisance parameters (related secondary tasks) such as illumination level, pose, blurriness, closed eyes, open mouth, and occlusion. The proposed MDNN jointly exploits all the related tasks to estimate a more accurate facial quality and provides quantitative values for the predicted nuisance factors that affect the quality of the facial image.
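
As a rough illustration of the multitask design, here is a minimal PyTorch sketch with one shared backbone, a primary quality head, and one secondary head per nuisance factor; the backbone, factor list, and loss weighting are illustrative assumptions.

```python
# A minimal multitask sketch (PyTorch): one shared backbone, a primary head
# for the scalar quality score, and secondary heads for nuisance factors.
# Head names, the factor list, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn

class FIQVANet(nn.Module):
    FACTORS = ["illumination", "pose", "blur", "eyes_closed", "mouth_open", "occlusion"]

    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        self.quality_head = nn.Linear(feat_dim, 1)               # primary task
        self.factor_heads = nn.ModuleDict(                       # secondary tasks
            {f: nn.Linear(feat_dim, 1) for f in self.FACTORS})

    def forward(self, x):
        feat = self.backbone(x)
        quality = torch.sigmoid(self.quality_head(feat))
        factors = {f: torch.sigmoid(h(feat)) for f, h in self.factor_heads.items()}
        return quality, factors

# Joint loss: primary quality regression plus weighted per-factor regressions.
def multitask_loss(quality, factors, q_gt, f_gt, w=0.5):
    loss = nn.functional.mse_loss(quality.squeeze(1), q_gt)
    for name, pred in factors.items():
        loss = loss + w * nn.functional.mse_loss(pred.squeeze(1), f_gt[name])
    return loss

net = FIQVANet()
quality, factors = net(torch.rand(4, 3, 112, 112))   # batch of face crops
```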

Introducing Intentional Distortion to Distinguish Between Live and Fake Fingers
Srirangaraj Setlur (UB), Venu Govindaraju (UB)

Fingerprint spoof attacks are one of the most common forms of biometric presentation attacks. While much work has formulated fingerprint spoof detection as a general image classification problem, limited work has framed it as a temporal learning problem. Under motion-induced distortion, the difference in elastic properties between a live finger and a synthetically generated spoof can be observed. In this work, we propose to exploit these elastic differences to detect fake fingers by introducing intentional distortions during acquisition via sliding and twisting motions. Widely used spoof datasets such as LivDet11-17 lack the temporal information required for this study, so we propose to collect a new distortion-based fake fingerprint dataset using various spoof materials and different kinds of distortion. 3D CNNs and spatio-temporal transformers will then be used to learn discriminative features for classifying fake fingerprints based on distortion.
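
A minimal PyTorch sketch of the temporal formulation, assuming a short clip of frames captured during the sliding/twisting motion; the 3D-CNN architecture and clip size are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of a 3D-CNN live/spoof classifier over a short
# clip of frames captured while the finger slides or twists on the sensor.
# The architecture and clip length are illustrative assumptions.
import torch
import torch.nn as nn

spoof_net = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),   # input: (C, T, H, W)
    nn.MaxPool3d(2),
    nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(32, 2))                                        # live vs. spoof

clip = torch.rand(4, 1, 16, 112, 112)   # batch of 16-frame grayscale clips
logits = spoof_net(clip)                # temporal distortion cues drive the decision
```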

Latent Fingerprint Image Enhancement to Improve Recognition
Jeremy Dawson (WVU), Nasser Nasrabadi (WVU)
We propose a multi-task conditional Generative Adversarial Network (cGAN) latent fingerprint enhancement algorithm to improve ridge structure quality and help the fingerprint examiner determine the quality and usefulness of latents. The cGAN architecture is an image-to-image translation model that takes the corrupted or incomplete latent as input and generates a clear latent with no artifacts. The image translation must be learned such that the ridges, valleys, and other fingerprint features, including the minutiae of the original livescan fingerprint, are preserved. Therefore, we propose to train a multi-task cGAN architecture that also generates a quality map, ridge patterns, and minutiae of the enhanced latent to match those of the original livescan fingerprint.

On the Capacity and Uniqueness of Synthetic Face Images
Vishnu Boddeti (MSU), Arun Ross (MSU)
In this project, we will address the following questions: given a generative face model, (1) how many unique identities can it generate, and (2) how many unique identities can it generate across different demographic subsets? In other words, we seek to establish the capacity of generative face models. A scientific basis for answering these questions will not only benefit the evaluation and comparison of different generative face models but will also establish an upper bound on the number of unique identities that can be generated.

One-to-One Face Recognition with Human Examiner in the Loop
Nasser M. Nasrabadi (WVU) and Jeremy Dawson (WVU)
Current one-to-one face verification algorithms still struggle to achieve high matching accuracy when they encounter a face probe from a real-world (wild) scenario to be matched against a large gallery of faces. One way to improve the performance of a face matcher is to capture auxiliary information (a textual description) about the test face and integrate it with the features from the face data. The facial description will be provided by an expert human examiner in the loop. In this project, we propose to extract keywords from the textual description (e.g., hairstyle, ethnicity, gender, age, scars, freckles, pose, etc.) provided by the human examiner about a wild face photo and fuse this information with the features obtained from the facial data during one-to-one face matching. By fusing the textual description with the facial features, we expect to boost the performance of the face matcher and retrieve a more accurate matching face from a gallery of mug-shot faces. We will use natural language processing (NLP) tools such as Word2Vec to convert the keywords in the textual description into vector representations and integrate them as auxiliary features into our deep face matcher. We will use some of the key facial descriptions (e.g., pose, blurriness, illumination) to pre-process the face probe and fuse the remaining soft-biometric textual descriptions (e.g., gender, age, ethnicity, and facial marks) with the face data feature embeddings.
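
As a rough sketch of the proposed fusion, the following trains a toy Word2Vec model (via gensim) on hypothetical examiner keywords, mean-pools the keyword vectors, and concatenates the result with a face embedding before a small matcher head; the corpus, dimensions, and matcher are illustrative assumptions, not the project's implementation.

```python
# Minimal fusion sketch: examiner keywords are embedded with Word2Vec,
# averaged into a description vector, and concatenated with the face
# embedding before a small matcher head.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# Toy keyword corpus standing in for examiner-provided descriptions.
corpus = [["male", "bearded", "scar", "frontal"],
          ["female", "freckles", "glasses", "profile"],
          ["male", "bald", "elderly", "frontal"]]
w2v = Word2Vec(corpus, vector_size=32, min_count=1, epochs=50)

def describe(keywords):
    vecs = [w2v.wv[k] for k in keywords if k in w2v.wv]
    return torch.tensor(np.mean(vecs, axis=0))   # mean-pooled description vector

matcher = nn.Sequential(nn.Linear(512 + 32, 128), nn.ReLU(), nn.Linear(128, 1))

face_emb = torch.rand(512)                        # from a deep face matcher
text_emb = describe(["male", "bearded", "scar"])  # examiner keywords
score = torch.sigmoid(matcher(torch.cat([face_emb, text_emb])))  # match score
```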

Robust Contactless Fingerprint Processing Tool
M. G. Sarwar Murshed (CU), Stephanie Schuckers (CU), Faraz Hussain (CU)
Contactless acquisition of fingerprints happens in a fundamentally different way from contact-based capture. Usually, an image of four fingers of one hand is captured from a distance using a camera and then processed. The contactless fingerprint processing pipeline must first segment the input image into the four separate fingerprints. Because distance and camera angle are likely to vary during contactless acquisition, correctly segmenting each fingerprint is key to better performance of the fingerprint recognition system. In our previous work, we designed a new segmentation system (CFSEG) and showed that it outperforms NFSEG [3] in segmenting contact-based fingerprints. In this work, we aim to develop a new contactless fingerprint system that can segment fingerprints correctly and store them in formats such as WSQ and JSON in order to allow broad compatibility across AFIS worldwide.

Biometric Recognition in Newborns, Infants and Toddlers (Gateway)
Stephanie Schuckers (CU), Masudul Imtiaz (CU), Kathleen Terrence (CPH), Anil Jain (MSU)
There is a range of applications where biometric recognition in children would be a useful tool, including infant re-identification in the hospital, vaccine administration, refugees, missing children, child trafficking, and travel documents. However, the science of biometric recognition for children is still very sparse and includes such open questions as whether biometric recognition is effective at early ages, whether there are methods to adjust the biometric to accommodate growth, and what the earliest age is at which a biometric is viable. At Clarkson, we have studied biometric recognition in children starting at age 4 in the Potsdam elementary school. In our research we have followed the same children for five years and have been among the first to publish a range of papers on iris, face, voice, and fingerprint recognition in children. This proposal is a new study of children from newborn to age 3. We envision a two-stage approach, collecting biometric data from (1) newborns in the labor and delivery department at the St. Lawrence Health System (Potsdam, NY) and (2) infants and toddlers who have regular well-child visits at the pediatrician's office of Dr. Kathleen Terrence, also part of St. Lawrence Health System. We would perform all research under a parental consent form approved by the Clarkson Institutional Review Board (IRB) and would meet any additional requirements, IRB and otherwise, from St. Lawrence Health. We plan to pay the research subjects, as well as provide remuneration for medical staff, where appropriate. The project is funded as a gateway project supported by Synolo, which will provide a non-contact fingerprint scanner; fingerprint will therefore be given priority in the collection. However, other modalities will be collected where appropriate and where there is opportunity.

Independent Evaluation of Behavioral Biometrics (Gateway Project)
Daqing Hou (CU), Stephanie Schuckers (CU)
The continued media coverage of major data breaches and ransomware events reminds us that the nation needs effective user authentication methods. In fact, President Biden's Executive Order of May 12, 2021 mandated the adoption of multi-factor authentication by federal agencies within 180 days. Behavioral biometrics such as swipe and touch dynamics have emerged as attractive options due to their continuity, ubiquity, user-friendliness, and low cost. Unfortunately, customers so far must rely on vendor-claimed performance measures for behavioral biometric products, which can be problematic due to biases in experimental settings (environment, behavior, and demographics). The objective of this proposal is to establish an independent performance evaluation framework, including metrics, test procedures, and test harnesses, for mobile behavioral biometrics, following the principles and framework laid out in ISO/IEC 19795-1:2021 and the FIDO biometric certification process. One challenge in evaluating behavioral biometrics is accounting, in the metrics and testing, for the fact that they are often used on a continuous basis.

Fall 2022
A Perpetual Deep Face Recognition System
Nasser M. Nasrabadi (WVU) and Mahedi Hasan (WVU)
Deep learning-based Face Recognition (FR) systems have recently achieved state-of-the-art results in face recognition tasks. However, current deep FR systems are trained using only the training data from the subjects (classes) available during the training phase and are not equipped to easily incorporate new classes of data. In an ideal FR system, new classes should be integrated into the existing FR model, sharing the previously learned parameters. However, when trained with incrementally added new classes, deep FR models suffer from 'catastrophic forgetting,' in which earlier learned knowledge is lost, resulting in an overall decrease in recognition performance on old classes. To solve this problem, we propose a deep continual learning-based FR model that uses the incoming new data and only a small exemplar set of samples from the old classes. Our approach will be based on a distillation loss that retains the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes.
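
A minimal sketch of the proposed loss, assuming PyTorch: the previous (frozen) model's softened outputs on the old classes supervise the new model via a distillation term, alongside standard cross-entropy; the temperature and weighting are illustrative assumptions.

```python
# Minimal sketch (PyTorch) of the combined loss: cross-entropy on the
# current (new + exemplar) batch plus a distillation term that keeps the
# new model's old-class logits close to the frozen previous model's.
import torch
import torch.nn.functional as F

def continual_loss(new_logits, prev_logits, labels, n_old, T=2.0, lam=1.0):
    # Cross-entropy over all classes for the current batch.
    ce = F.cross_entropy(new_logits, labels)
    # Distillation: match the previous model's softened old-class outputs.
    p_prev = F.softmax(prev_logits[:, :n_old] / T, dim=1)
    log_p_new = F.log_softmax(new_logits[:, :n_old] / T, dim=1)
    kd = F.kl_div(log_p_new, p_prev, reduction="batchmean") * (T * T)
    return ce + lam * kd
```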

A Study to Benchmark Smartphone Hardware and Software for High Quality Iris Data Collection
Soumyabrata Dey (CU), Masudul Imtiaz (CU)
Iris is one of the most popular biometric modalities and can provide highly accurate matching performance. Many iris-based authentication systems use IR imaging for better quality data capture; however, normal phone cameras can also be used. For an iris authentication system to work with high efficiency, the quality of phone-camera-captured data needs to meet certain standards. In this work, we propose to develop a mobile application that can consistently capture high quality images with controlled variation of capture quality. With some adjustment, the work can be extended to other modalities such as non-contact fingerprints, faces, and palms. Our project objectives are the following:
1. Develop a smartphone application to capture high quality iris data using phone cameras, with the following features:
   a) The app will automatically detect the eyes and iris regions and select camera focal points such that a sharp iris image can be captured.
   b) A phone with multiple cameras can estimate the stereo-based 3D distance between the phone and the captured iris. We will use this information to guide the user to move the phone closer to or farther from the face so that a uniform distance can be maintained across captures.
   c) Phone flashlights will be used to minimize environmental light variation.
   d) When the capture button is pressed, multiple images will be taken, each focused on a different region of the iris; these will then be merged into a single image with sharp iris regions (a minimal merge sketch follows this list).
   e) The app will inform the user if recollection is needed.
2. We will capture data using different phones and lighting conditions to determine the best hardware platform and ideal capture conditions.
3. We will collect an iris dataset of at least 100 subjects, including senior citizens and children, to understand whether data quality is correlated with subject age group. The quality of the dataset will be measured by iris matching accuracy.
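
The following is a minimal sketch of the focus-merge step in objective 1d, assuming OpenCV and NumPy; the sharpness measure (local Laplacian magnitude), smoothing window, and file names are illustrative assumptions.

```python
# Minimal focus-merge sketch (OpenCV): among several captures focused on
# different regions, keep each pixel from the image that is locally
# sharpest, using the Laplacian magnitude as the sharpness measure.
import cv2
import numpy as np

def merge_focus_stack(images):
    gray = [cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) for im in images]
    # Per-pixel sharpness: absolute Laplacian response, smoothed locally.
    sharp = [cv2.GaussianBlur(np.abs(cv2.Laplacian(g, cv2.CV_64F)), (9, 9), 0)
             for g in gray]
    best = np.argmax(np.stack(sharp), axis=0)          # index of sharpest capture
    stack = np.stack(images)
    h, w = best.shape
    rows, cols = np.mgrid[0:h, 0:w]
    return stack[best, rows, cols]                     # composite sharp image

images = [cv2.imread(f"capture_{i}.png") for i in range(4)]  # hypothetical files
merged = merge_focus_stack(images)
```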

Detecting Real-time DeepFakes with Active Forensics and Biometrics
Siwei Lyu (UB) and Srirangaraj Setlur (UB)
The COVID pandemic led to wide adoption of online video calls in recent years. However, the increasing reliance on video calls provides opportunities for new impersonation attacks by fraudsters using advanced real-time DeepFakes. Real-time DeepFakes pose new challenges to detection methods, which must run in real time as the video call is ongoing. In this project, we aim to develop a new active forensic method to detect real-time DeepFakes. Specifically, we authenticate video calls by displaying a distinct pattern on the screen and examining the corneal reflection extracted from images of the call participant's face. The pattern can be shown by a call participant on a shared screen or integrated directly into the video-call client. In either case, no specialized imaging or lighting hardware is required. We will evaluate the reliability of this approach under a range of imaging scenarios and validate it in real-world settings.
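
As a rough illustration of the verification step, the sketch below compares the displayed pattern against a (toy) corneal extraction using normalized cross-correlation; in practice the corneal patch would first require eye detection and geometric unwarping, and the threshold shown is an assumption.

```python
# Minimal verification sketch: compare the pattern displayed on screen with
# the pattern extracted from the participant's corneal reflection using
# normalized cross-correlation; a low score suggests a synthetic face.
import numpy as np

def ncc(displayed, reflected):
    a = (displayed - displayed.mean()) / (displayed.std() + 1e-8)
    b = (reflected - reflected.mean()) / (reflected.std() + 1e-8)
    return float((a * b).mean())

displayed = np.random.rand(32, 32)                    # pattern shown on screen
reflected = displayed + 0.2 * np.random.rand(32, 32)  # toy corneal extraction
authentic = ncc(displayed, reflected) > 0.5           # threshold is an assumption
```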

Effect of Specific Data Variations on Forensic Speaker Recognition Results
Jeremy Dawson (WVU), Nasser Nasrabadi (WVU)
The adoption of multi-biometric identification booking processes is leading to increased interest in including voice samples in a biometric record. However, while voice capture in controlled settings can yield reliable matching of voice samples, real-world environmental conditions, hardware variations (input device, channel, etc.), and human behavior (e.g., emotions) present serious challenges to the use of opportunistic voice samples in forensic identification. The goal of this project is to evaluate the impact of multiple nuisance factors on both automated and aural/acoustic approaches to speaker recognition. A comparison of impact will be performed between typical aural/acoustic parameters (fundamental frequency F0, formants, time-varying patterns, jitter, shimmer, etc.) and black-box COTS speaker recognition methods based on MFCCs, i-vectors, or other academic approaches.
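
For reference, a minimal sketch of extracting some of the aural/acoustic parameters named above with librosa; the file name is a placeholder, and jitter/shimmer would typically require a dedicated tool such as Praat.

```python
# Minimal sketch (librosa): extract fundamental frequency (F0) and MFCCs so
# their sensitivity to channel and behavioral variation can be compared.
import librosa
import numpy as np

y, sr = librosa.load("sample.wav", sr=16000)           # hypothetical recording
f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)     # 20 coefficients per frame
print("mean F0 (voiced):", np.nanmean(f0))
print("MFCC matrix shape:", mfcc.shape)
```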

Explainable Face and Fingerprint Matching via Improved Localization
Faraz Hussain (CU)
Due to the widespread adoption and deployment of deep learning-based systems, recent years have seen immense interest in the explainability of ML/AI systems, i.e., providing the reason why a system made a particular prediction/decision/classification [1]. Owing to the size and complexity of deep learning models, developing algorithms that can explain results to the end user remains a significant challenge [2]. For biometric systems like face/fingerprint matching, better explainability can provide much needed transparency. Research on the explainability of computer vision systems focused on face/fingerprint biometrics is still in its infancy: there has been preliminary work on explainable presentation attack detection techniques [3], and a benchmark for explainable face recognition was proposed in [4]. Recent research has used Class Activation Maps (CAMs) for the explainability of computer vision models [5]. Our main goal is to use modern CAM-based discriminative localization techniques, such as Amplified Directed Divergence with ensembles and Scaled Directed Divergence [6], to produce visual explanations of face/fingerprint matching systems. Our method will clearly indicate the spatial regions of a face/fingerprint image that the deep learning model considers most relevant in arriving at its result, i.e., match/non-match.
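
As background for these techniques, here is a minimal Grad-CAM sketch in PyTorch, the kind of CAM-style localization the directed-divergence methods build on; the model choice, target layer, and target score are illustrative assumptions.

```python
# Minimal Grad-CAM sketch (PyTorch): weight the last conv feature maps by
# the spatially pooled gradients of a target score, then upsample the
# resulting map into an image-sized heatmap.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
feats, grads = {}, {}
layer = model.layer4
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.rand(1, 3, 224, 224, requires_grad=True)   # probe image
score = model(x)[0].max()                            # e.g., the match logit
score.backward()

weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
cam = F.relu((weights * feats["a"]).sum(dim=1))      # weighted feature maps
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224), mode="bilinear")
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]
```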

LargE scale synthetically Generated fAce datasets (LEGAL2)
Sebastien Marcel (Idiap)
Recent work from the machine learning and computer vision communities has focused on the use of Generative Adversarial Networks (GANs) to create synthetic face images with some level of control over semantic factors (pose, expression, illumination, age, gender, …). However, investigations of the use of these synthetic samples as a biometric trait (face identities) are still lacking. This proposal is a continuation of project CITeR-21F-02i-M and will remain focused on (i) the generation of synthetic biometric face datasets with a novel approach and (ii) the use of such datasets to train different deep learning-based face recognition architectures and to benchmark face recognition systems. The proposed approach aims to learn a mapping within the StyleGAN latent space, conditioned on semantic factors, such that the synthesized face minimizes both a reconstruction loss and an identity loss.

Large-Scale Semi-Supervised Learning for Engine Audio Abnormality Detection and Understanding
Srirangaraj Setlur (UB), Venu Govindaraju (UB)
Previously, we developed the ability to create an "automobile fingerprint" using deep learning and audio processing methods to identify vehicles and their abnormalities from multiple modalities. In this proposal, we continue work in this area, specifically investigating the use of semi- and self-supervised learning strategies to improve acoustic engine abnormality detection with limited labels. Although large amounts of unlabeled engine audio recordings are available, the cost of labeling them is extremely high due to the complexity of the labeling task. Therefore, a semi-supervised learning framework may improve detection capabilities without the high cost of labeling. Using already-labeled data from previous projects, we seek to leverage semi-supervised learning to improve engine abnormality detection. We also seek to explore multi-modal variations of semi- and self-supervised learning, incorporating other modalities such as vibration and metadata, which have previously been shown to provide information complementary to audio and to improve detection performance. We will also explore an additional modality to improve engine abnormality detection: ultrasonic signals.

Quality-Aware Deep Multimodal Biometric Recognition Systems
Nasser M. Nasrabadi (WVU) and Jeremy M. Dawson (WVU)
In this project, we propose a multimodal deep neural network (DNN)-based recognition model consisting of several quality-aware, modality-dedicated DNNs optimized jointly to provide a personnel identification system. Each modality-dedicated DNN estimates a quality score and simultaneously maps its input modality into a compact latent feature vector. The extracted latent feature vectors from all modalities are weighted by their corresponding estimated quality scores and then fused via a quality-aware fusion network consisting of several fully connected layers. Our proposed multimodal network is trained to simultaneously learn the quality scores and the corresponding latent feature vectors for all modalities such that an optimal classification decision can be reached.
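
A minimal PyTorch sketch of the quality-aware fusion idea: each modality branch emits a latent feature and a quality score, features are scaled by their scores, and a small fully connected network fuses them; all dimensions and the number of modalities are illustrative assumptions.

```python
# Minimal sketch (PyTorch): per-modality branches produce a latent feature
# and a quality score; features are weighted by their scores and fused by
# fully connected layers into a classification decision.
import torch
import torch.nn as nn

class ModalityBranch(nn.Module):
    def __init__(self, in_dim, feat_dim=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.quality = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, x):
        z = self.embed(x)
        return z, self.quality(z)          # latent feature and quality score

class QualityAwareFusion(nn.Module):
    def __init__(self, in_dims, feat_dim=64, n_classes=100):
        super().__init__()
        self.branches = nn.ModuleList(ModalityBranch(d, feat_dim) for d in in_dims)
        self.fusion = nn.Sequential(
            nn.Linear(feat_dim * len(in_dims), 128), nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, inputs):
        weighted = [q * z for z, q in (b(x) for b, x in zip(self.branches, inputs))]
        return self.fusion(torch.cat(weighted, dim=1))

net = QualityAwareFusion(in_dims=[512, 256, 128])   # e.g., face, iris, fingerprint
logits = net([torch.rand(4, 512), torch.rand(4, 256), torch.rand(4, 128)])
```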

Towards the Creation of a Large Dataset of High-Quality Face Morphs – Phase II
Nasrabadi (WVU), Dawson (WVU), Li (WVU), Liu (CU), Schuckers (CU), Doermann (UB), Setlur (UB), Lyu (UB)
In Phase I of this DHS special project, each university successfully generated several morph datasets using different morphing algorithms. Different face databases (i.e., FRGC-v2, FERET, the WVU/CU multimodal face database, and CelebA-HQ) were used to generate morphed faces. The team also submitted a single-image morph detector to NIST for evaluation (see the NIST FRVT MORPH report, 2022). In Phase II, we will first prepare and deliver our current morph datasets to NIST for evaluation. Each university will then develop more advanced morphing techniques, such as exploring the latent codes of StyleGAN, adding discriminative cost terms to the objective function of the StyleGAN network, and using transformer architectures, self-attention mechanisms, and diffusion-based techniques. The UB team will also employ a manual approach to remove visible morph artifacts introduced during morph generation. The CU and WVU teams will provide print-and-scanned, compressed, adversarially perturbed, and manually manipulated morph datasets designed to evade detection by mitigating morphing artifacts. Finally, the team will provide single-image and differential morph detectors to be submitted to NIST for FRVT MORPH evaluation in Phase II.

Fully Homomorphic Encryption in Biometrics: Phase 2
Nalini Ratha (UB), Vishnu Boddeti (MSU), Arun Ross (MSU)
The goal of this project is to continue exploring the use of fully homomorphic encryption (FHE) in biometrics. FHE provides quantum-secure computation in the cloud, which can impart privacy to biometric data as well as to inference outcomes. In this phase, we propose three subtasks: (i) multi-modal fusion of homomorphically encrypted biometric templates for template-level fusion using non-linear techniques, (ii) template protection and matching in the encrypted domain, and (iii) score-level and decision-level fusion based on these schemes. Several open-source FHE SDKs are available; we will use both the HEAAN and SEAL libraries in our research. In Phase I, we focused on template fusion using linear projections. In this phase, by contrast, we plan to explore non-linear methods. We will consider shallow neural networks (2-4 layers deep) and approximate non-linear activation functions via composite polynomials. We will evaluate recent methods suitable for encrypted deep embedding templates. Further, we will design and develop FHE-aware score normalization methods and decision-level fusion to demonstrate the value of privacy enhancement through FHE.
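
To illustrate the composite-polynomial idea, the sketch below fits a low-degree polynomial to a sigmoid over a bounded input range and evaluates it with Horner's rule, which uses only the additions and multiplications that CKKS-style FHE schemes provide; the degree and range are illustrative assumptions.

```python
# Minimal sketch: FHE supports only additions and multiplications, so a
# non-linear activation (here a sigmoid) is replaced by a low-degree
# polynomial fit over the expected pre-activation range.
import numpy as np

xs = np.linspace(-4, 4, 400)                     # expected pre-activation range
sigmoid = 1.0 / (1.0 + np.exp(-xs))
coeffs = np.polynomial.polynomial.polyfit(xs, sigmoid, deg=5)

def poly_sigmoid(x):
    # Horner evaluation uses only + and *, the operations FHE provides.
    result = 0.0
    for c in reversed(coeffs):                   # highest-degree term first
        result = result * x + c
    return result

print(poly_sigmoid(1.0), 1.0 / (1.0 + np.exp(-1.0)))  # close over the fit range
```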

Scenario Testing for Presentation Attack Detection: Test Design and Requirements for Government Applications
Stephanie Schuckers (CU)
Presentation attacks are defined as presentations to the biometric data capture subsystem made with the goal of interfering with the operation of the biometric system. Examples include artificial biometrics, called presentation attack instruments (PAIs), such as gelatin or PlayDoh fingerprints, printed or displayed face/iris photos, or replayed video or audio recordings. PAIs can be based on another individual, with the goal of posing as that person, or can be used to avoid being recognized as oneself. This is a critical vulnerability in many applications, particularly high security applications such as border security, trusted traveler programs, and remote identity proofing. For mobile authentication, FIDO has developed a set of requirements and test plans for independent testing and certification of biometric components, which addresses the mobile authentication use case specifically. For other use cases, requirements and test plans may differ, for example in the types of PAIs tested; the source of the biometric characteristics (cooperative or uncooperative); the metrics measured (system level: IAPAR/FRR; subsystem level: BPCER/APCER); and whether spoofing targets enrollment or verification.

Performance Evaluation and Demographic Analysis of Biometric Cryptosystems
Daqing Hou (CU), Stephanie Schuckers (CU)
The continued media coverage of large-scale data breaches reminds us of the pressing societal need for effective methods of securing and protecting user data and privacy. In particular, rapid advances in biometrics such as face recognition have motivated the community to safeguard biometric templates and implement cancellable biometrics. Emerging biometric cryptosystems on the market encrypt the biometric template in a way that still allows identity matching but makes it impossible to recover information related to the user's identity from the secured template alone. However, it is also known that the secured template negatively impacts biometric matching performance. The objective of this proposal is therefore to establish a performance evaluation framework, including metrics, test procedures, and test harnesses, for biometric cryptosystems, following the principles and framework laid out in ISO/IEC 19795-1:2021 and the FIDO biometric certification process. One focus is to account for the representativeness of demographic factors such as race/ethnicity, age, and gender in evaluating biometric cryptosystems.