Differential privacy, federated learning, and privacy-preserving credit risk modeling
Date
2023
Publisher
University of Delaware
Abstract
Given the sheer size of the consumer credit market and the huge number of consumer credit users, credit risk modeling, that is, predicting the delinquency (or default) probabilities of borrowers to help financial institutions grant and manage consumer credit, has become a critical problem in the consumer credit industry. The advent of the big data era and of advanced machine learning techniques has opened up new opportunities in credit risk modeling. The ability to collect large volumes of alternative data, such as borrowers' mobile phone communication data and social network data, gives financial institutions fuller and more accurate profiles of borrowers' creditworthiness. However, alternative data are stored at external entities that are independent of financial institutions, and insecure use of such data for credit risk modeling would severely infringe on consumer privacy. To incorporate information from external sources into credit risk modeling while preserving users' privacy, this dissertation proposes three new and crucial credit risk modeling problems and develops novel solution methods for them.
The first study highlights the challenge of predicting credit risk effectively from both intrinsic user data and social network data, where the latter is difficult to collect because of privacy concerns. To address this challenge, we propose using latent network information instead of social network data. Accordingly, we develop a novel credit risk prediction model that considers both users' intrinsic data and latent network information. We then design a new method that estimates the model parameters, learns the latent network information, and integrates this information with users' intrinsic data for credit risk prediction. We further extend the method to multiclass and numerical credit risk prediction problems. Extensive empirical evaluations with real-world data demonstrate the superior predictive power of our method over benchmark methods across a broad spectrum of credit risk prediction problems.
The second study focuses on how to use alternative data effectively for credit risk prediction while preserving users' privacy. In response, we define a new problem of privacy-preserving credit risk prediction with alternative data and propose a novel federated learning method to solve it. We prove theoretically that the method is lossless, privacy-preserving, and model-confidential. Using real-world data, we show that our method achieves the same performance as a method that uses alternative data insecurely for credit risk prediction. We also show that it substantially outperforms a common practice of credit risk prediction at financial institutions.
The third study introduces the novel problem of publishing social graphs with vertically partitioned credit attributes. The goal is to enable a financial institution that privately owns credit attributes to securely generate and release a synthetic social graph whose nodes carry credit attributes, for data analysis purposes, while ensuring data confidentiality for the social network owner, who exclusively owns the social graph. To address this problem, we propose a method that allows the financial institution and the social network owner to fit an attributed graph model jointly. A synthetic graph with credit attributes is then sampled from the fitted model via rejection sampling. In particular, Paillier encryption and a differential privacy mechanism are adopted in the model fitting procedure to ensure the confidentiality of the datasets.
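The third study names two building blocks, Paillier encryption and a differential privacy mechanism, without describing how they are combined. The sketch below is not the dissertation's procedure. It only illustrates the two properties that make such tools useful when two parties fit a model jointly: Paillier ciphertexts can be added without decryption, and a released statistic can be perturbed with calibrated noise. All function names, the toy primes, and the choice of the Laplace mechanism are assumptions made for illustration; the abstract does not specify which mechanism is used.

```python
import math
import random


def paillier_keygen(p, q):
    """Toy Paillier key generation from two distinct primes.

    The primes used in the usage note are tiny and offer no security.
    """
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1  # common simple choice of generator
    # mu is the modular inverse of L(g^lam mod n^2), with L(x) = (x - 1) // n.
    # pow(x, -1, n) requires Python 3.8 or later.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def paillier_encrypt(pub, m, rng):
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = pub
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible modulo n
        r = rng.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def paillier_decrypt(pub, priv, c):
    """Recover the plaintext from a ciphertext with the private key."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def paillier_add(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon (assumed mechanism).

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    scale = sensitivity / epsilon
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return value + noise
```

As a usage illustration under the same assumptions: one party could encrypt a local statistic with `paillier_encrypt`, the other could combine it with its own encrypted statistic through `paillier_add`, and only the aggregate would be decrypted and, if it is to be released, perturbed with `laplace_mechanism`. A real deployment would need cryptographically sized primes and a vetted library rather than this toy code.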
Keywords
Credit risk modeling, Differential privacy, Federated learning, Privacy-preserving machine learning, Creditworthiness