Applications of computational optimal transport in machine learning and signal processing
Date
2025
Publisher
University of Delaware
Abstract
Recently, there has been a surge of interest in using optimal transport between probability distributions, and the Wasserstein distances it defines, to enable better machine learning systems. More specifically, optimal transport can be used to define clustering algorithms, semi-supervised learning algorithms, and techniques for data compression and for correcting covariate shift in classification tasks. Additionally, Wasserstein distances can be used as cost functions in generative modeling and as constraints in robust modeling. The success of these techniques across a wide range of application domains is due to the fact that optimal transport combines the related but distinct concepts of geometric distances and statistical divergences.

The first work in this dissertation investigates variants of optimal transport for cases where a subset of the support of one distribution aligns with the complete support of another distribution, as when a carefully curated dataset can be augmented by a source of less reliable data. In our experiments, we demonstrated the utility of our approach in partial point cloud alignment, color transfer, positive-unlabeled (PU) learning, and semi-supervised learning. Additionally, we propose to investigate the effect of partial alignment in generative modeling and to examine partial alignment for global covariate-shift correction in classification tasks.

In the second work, we investigate partial optimal transport for two or more stochastic processes, with application to matching bio-signals represented as univariate stochastic processes from a population of subjects, where the representation space underlying the transport is not Euclidean. In particular, we consider the case where spectral patterns observed in short-time windows can occur at different time scales for different processes. We seek a monotonic transformation of the spectra of each process that minimizes the Wasserstein distance between the distributions of spectra across windows. We anticipate that spectral alignment for multiple subjects with different frequency spreads can enhance the performance of downstream learning systems; that is, that learning on the aligned data performs better than learning on the original data. This has wide applications in cases where the machine learning system benefits from being invariant to the time scale.

In the third work, we focus on the development of algorithms for neural-network-parameterized support subset selection, where we only have access to samples from the underlying data distributions. More specifically, we developed algorithms for training neural-network-parameterized Monge-like maps in the static formulation of continuous subset alignment and velocity fields in the dynamic formulation of continuous subset alignment. We applied our frameworks to PU learning and latent-space image alignment problems.
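
For readers unfamiliar with the Wasserstein distance referred to in the abstract, the following is a minimal illustrative sketch and is not taken from the dissertation. It assumes two one-dimensional samples of equal size with uniform weights, in which case the optimal transport plan simply matches sorted values and no optimization solver is needed. The function name `wasserstein_1d` and the parameter `p` are hypothetical names chosen for this example.

```python
def wasserstein_1d(x, y, p=1):
    """Empirical p-Wasserstein distance between two equal-size 1-D samples.

    Assumption for this sketch: both samples have the same number of points
    and uniform weights. In one dimension the optimal coupling is monotone,
    so it pairs the i-th smallest value of x with the i-th smallest of y.
    """
    xs, ys = sorted(x), sorted(y)
    if len(xs) != len(ys) or not xs:
        raise ValueError("x and y must be non-empty and of equal length")
    # Average transport cost under the sorted (monotone) matching.
    mean_cost = sum(abs(u - v) ** p for u, v in zip(xs, ys)) / len(xs)
    return mean_cost ** (1.0 / p)


if __name__ == "__main__":
    source = [0.0, 1.0, 2.0]
    target = [2.0, 3.0, 4.0]  # the same sample shifted by 2
    print(wasserstein_1d(source, target))
```

Shifting every point of a sample by a constant moves each unit of mass by that constant, so the distance in the usage example equals the size of the shift. The partial and non-Euclidean settings studied in the dissertation are more general than this one-dimensional case.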
Keywords
Machine learning, Algorithms, Neural network, Optimal transport, Clustering algorithms
