Applications and evaluation of deep generative augmentation methods
Date
2024
Authors
Publisher
University of Delaware
Abstract
High-quality datasets drive the success of machine learning applications; however, such datasets are not always available, because financial, security, and time constraints all make it difficult to publish data sufficient for machine learning. When data is present but insufficient, data augmentation techniques are often used to improve the performance of predictors. While data augmentation methods are well understood and have been used effectively in the image domain, it is often unclear how to create variations of tabular and time series data that preserve the important patterns of the original data in the new synthetic data. In this work, we investigate the effectiveness of generative machine learning in augmenting tabular and time series datasets. To do so, we introduce a novel generative model evaluation tool, the GvR metric, to compare different generators in terms of their capacity for augmentation. We also create a new methodology for producing comparative visualizations of real and generated datasets. Finally, we demonstrate the general efficacy of these methods by applying them in two widely disparate domains: network traffic classification and streamflow prediction for a local river system.
Keywords
Data augmentation, Generative machine learning, Intrusion detection, Streamflow forecasting
