Statistical divergences and density estimation for anomaly detection and generative modeling

Date
2025
Publisher
University of Delaware
Abstract
Anomaly detection (AD) is the task of identifying instances that deviate from an expected distribution. Unsupervised anomaly detection does not require any examples of anomalous data, which, even when available, are typically insufficient to characterize all possible forms of anomaly. We propose to estimate an unnormalized density function from a dataset via noise contrastive estimation (NCE), on top of a composite feature representation that combines an auto-encoder's latent features with the auto-encoder's reconstruction loss values. As an alternative to an auto-encoder, a pretrained model followed by PCA can also be used to construct the composite feature from the principal components and the loss values of the principal component reconstruction. To further enhance the effectiveness of the NCE framework for AD tasks, we introduce two strategies for adapting it: augmenting the training data by varying the reconstruction features to reduce the false negative rate, and optimizing the contrastive Gaussian noise distribution to better approximate the data distribution. Experimental assessments on multiple benchmark datasets demonstrate that the proposed approach not only matches the performance of prevalent state-of-the-art anomaly detection algorithms but also exhibits enhanced robustness on multimodal training datasets.

In the second work, we introduce the decoupled Jensen–Shannon (DJS) divergence, a novel family of statistical divergences that extends the Jensen–Shannon (JS) divergence and includes the Kullback–Leibler (KL), reverse KL, and Jeffreys divergences as limiting cases. While NCE is optimized to approximate the JS divergence, a density estimator could be formulated under the same framework using any $f$-divergence, provided the optimal critic function supplies the information needed to recover the density ratio, as in standard NCE. However, the KL divergence and the reverse KL divergence possess mode-covering and mode-seeking properties, respectively, which lead to overestimation and underestimation of the density function. Our proposed DJS divergence is a convex combination of skewed KL and skewed reverse KL divergences, designed to mitigate these estimation biases. To facilitate its application to anomaly detection and generative modeling, we derive a variational formula for the DJS divergence and obtain a statistically consistent estimator from finite samples. We explore applications of the DJS divergence to anomaly detection and to generative modeling, such as generative adversarial networks (GANs) in high-dimensional image spaces.
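The following is a minimal, self-contained sketch of the composite-feature NCE pipeline summarized in the first part of the abstract, using the PCA variant for simplicity. The toy data, the moment-matched Gaussian noise, the MLP critic, and all hyperparameters are illustrative assumptions, not the dissertation's exact configuration.

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy "normal" training data: two Gaussian modes in 10-D (a multimodal nominal class).
X_train = np.vstack([rng.normal(0.0, 1.0, (500, 10)),
                     rng.normal(4.0, 1.0, (500, 10))])

# 1) Composite features: principal components plus the PCA reconstruction loss value.
pca = PCA(n_components=5).fit(X_train)

def composite_features(X):
    Z = pca.transform(X)                                    # latent components
    err = np.square(X - pca.inverse_transform(Z)).sum(axis=1, keepdims=True)
    return np.hstack([Z, err])                              # [components, reconstruction loss]

F_train = composite_features(X_train)

# 2) Contrastive Gaussian noise; here simply moment-matched to the composite features
#    (a stand-in for the optimized noise distribution described in the abstract).
mu = F_train.mean(axis=0)
cov = np.cov(F_train, rowvar=False) + 1e-3 * np.eye(F_train.shape[1])
noise = multivariate_normal(mean=mu, cov=cov)
nu = 1                                                      # noise-to-data sample ratio
F_noise = noise.rvs(size=nu * len(F_train), random_state=0)

# 3) NCE: train a classifier to separate data features (label 1) from noise (label 0).
X_cls = np.vstack([F_train, F_noise])
y_cls = np.concatenate([np.ones(len(F_train)), np.zeros(len(F_noise))])
critic = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000,
                       random_state=0).fit(X_cls, y_cls)

# 4) The critic's logit estimates log p(f) - log(nu * q(f)), so adding back
#    log q(f) + log(nu) yields an unnormalized log-density estimate of the data.
def log_density(X):
    F = composite_features(X)
    p1 = np.clip(critic.predict_proba(F)[:, 1], 1e-6, 1 - 1e-6)
    logit = np.log(p1) - np.log(1.0 - p1)
    return logit + noise.logpdf(F) + np.log(nu)

# Anomaly score = negative estimated log-density (higher means more anomalous).
X_test = np.vstack([rng.normal(0.0, 1.0, (5, 10)),          # nominal-like points
                    rng.normal(10.0, 1.0, (5, 10))])        # far-away (anomalous) points
print(-log_density(X_test))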
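For the second part of the abstract, one way to write a convex combination of a skewed KL term and a skewed reverse KL term is sketched below; the weight $\lambda$ and skew parameter $\alpha$ are illustrative, and the dissertation's exact parameterization of the DJS family may differ.

$$ D^{(\lambda,\alpha)}_{\mathrm{DJS}}(p \,\|\, q) \;=\; \lambda\, D_{\mathrm{KL}}\bigl(p \,\|\, (1-\alpha)\,p + \alpha\, q\bigr) \;+\; (1-\lambda)\, D_{\mathrm{KL}}\bigl(q \,\|\, (1-\alpha)\,q + \alpha\, p\bigr), \qquad \lambda \in [0,1],\ \alpha \in (0,1]. $$

Under this illustrative form, $\lambda = \tfrac{1}{2}$, $\alpha = \tfrac{1}{2}$ recovers the standard JS divergence, while $\alpha \to 1$ with $\lambda = 1$, $\lambda = 0$, and $\lambda = \tfrac{1}{2}$ yields the KL, the reverse KL, and (up to a factor of $\tfrac{1}{2}$) the Jeffreys divergence, respectively, consistent with the limiting cases stated above.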
Keywords
Anomaly detection, Noise contrastive estimation, Divergence, Multimodal training datasets