Studies on classification problems and application in consumer lending optimization system

Date
2022
Publisher
University of Delaware
Abstract
Classification is a fundamental problem in machine learning and has a wide range of applications. In this dissertation, we introduce and study the likelihood and Bayesian probability transformations on the feature variables of the binary classification problem. We systematically investigate their theoretical properties and the benefits they bring when applied to a range of machine learning topics, algorithms, and applications.

In the first part of the dissertation, we define the likelihood and Bayesian probability transformations and study their theoretical properties. The transformations are then used to improve the characterization of confusion-matrix-based classification performance measures, ROC curve analysis, and cost-sensitive analysis. In particular, we propose a unified framework for all existing classification performance measures and show that the transformations lead to dominant performance measurements and efficient threshold calculations, guaranteed concavity of the ROC curve, and milder assumptions for cost-sensitive analysis.

The second part of the dissertation focuses on two major extensions of the binary classification problem. Extension 1 introduces an ambiguous region into the binary classification problem; cases falling in this region usually require collecting additional information before a classification decision is made. We obtain optimal ambiguity thresholds under various performance measures and cost-sensitive environments. Extension 2 proposes a dynamic ensemble of two binary classifiers. We show that the ensemble dominates both component classifiers in terms of their ROC curves, and we verify the dominance numerically and empirically using Lending Club data.

The last part of the dissertation applies some of our proposed classification techniques to consumer lending. We develop a consumer lending optimization system that takes a holistic view of risk and pricing sensitivity to determine each customer's risk category and pricing. We numerically demonstrate the improvement of the optimization system over the risk-based pricing currently popular in the lending industry.
Keywords
Classification, Machine learning, Optimization, Performance measure