Localization approaches for predictive models based on spectral or process data with diverse applications

Date
2018
Publisher
University of Delaware
Abstract
Chemometrics is an interdisciplinary field aimed at extracting information from chemically relevant systems via data-driven means, primarily using the tools of modern statistical and machine learning theory. This dissertation concerns the development of novel methodology in the field of chemometrics for the advancement of numerous applications, including interpretation of spectral data, calibration transfer of multivariate regression models, and adaptive model building for predictions on dynamic systems.

In most applications of data-driven modeling, a single model is built utilizing all of the available data. In chemical data, there are often localized portions of the data that are more or less informative for the specific task at hand. Modeling these local aspects of the data independently in an ensemble model offers several potential advantages, such as better prediction accuracy or enhanced model interpretation. The focus of this dissertation is the construction of such ensemble models. Two paradigms of local modeling are investigated in this work: time/wavelength-localized modeling and frequency-localized modeling. Utilizing these localization frameworks, we investigate novel methodology for diverse regression tasks. Chapter 1 of this dissertation introduces the background and unifying theory of the concepts utilized throughout the remainder of the dissertation.

In Chapter 2, a static modeling method under a wavelength-localized paradigm combining sparse partial least squares and stacked interval partial least squares is presented. The combination of variable selection and local model weighting permits a straightforward interpretation of the model regression vector when applied to spectral data. The proposed method also performs favorably, in terms of prediction error, when compared to other variable selection and model weighting methods. A number of experiments on the effects of outliers and measurement resolution are also undertaken.

In Chapter 3, a static modeling method using frequency localization via the discrete wavelet transform paired with orthogonal projection for the calibration transfer of regression models based on spectral data is described. We show that the proposed method is competitive with standard calibration transfer methods. Additional experiments show that the method is superior to standard methods when applying transferred models to spectra from unseen instruments.

In Chapter 4, a dynamic modeling method using frequency localization via the undecimated wavelet transform paired with recursive partial least squares for the soft sensing of chemical processes is investigated. We show that the method greatly improves standard adaptive modeling by down-weighting noise that is present in the process variables. It is also shown that the improvement compared to the standard method is statistically significant irrespective of the memory used when updating the model.

In Chapter 5, a dynamic modeling method using time localization via a large number of overlapping models with memory attenuation for soft sensing of chemical processes is outlined. Covariance-based variable selection is utilized on each local model to account for the presence of distinct states in the process data and to create diversity in the ensemble. Experiments conducted at various updating frequencies indicate that the method yields a statistically significant improvement in prediction error compared to the standard method, as well as to the proposed method without variable selection.

In Chapter 6, a dynamic modeling method using self-correction strategies to select local modeling regions and adjust model memory for improved soft sensing is developed. The method uses a regression based on a neural network hidden layer input to the recursive partial least squares algorithm. Additionally, a memory-diverse ensemble paired with greedy weight updating is utilized to allow real-time model memory adjustment. We show that the method outperforms other local soft sensors at statistically significant levels, and that its parameters allow enhanced data interpretation.

In Chapter 7, the conclusions of the research are given, as well as numerous potential future directions.
Keywords
Pure sciences, Chemometrics, Ensemble modeling, Local modeling