Deep learning predictive modelling for electronic health records

University of Delaware
With the digitization of health records over the last two decades, a large amount of health data is now collected electronically. This data provides unprecedented research opportunities to build clinical prediction models that estimate future health risks. Working with electronic health records (EHRs) is known to be challenging because of their volume, mixed data types, and quality issues. The complex characteristics and quality issues of EHR data include: 1) a large feature space, 2) unequal lengths of medical histories, 3) differing numbers of observations per patient, 4) irregular intervals between visits, and 5) missing values. Tackling these issues is fundamental to using massive EHR data efficiently for clinical prediction.

The non-linear complexity and temporal relationships in EHRs limit the ability of traditional machine learning methods to perform clinical predictive tasks with high accuracy. In this work, we use deep learning techniques to capture complex patterns in EHR data and to address its data quality issues, with the aim of building clinical prediction models with improved accuracy. We present a hybrid sequential deep learning model that captures both static and longitudinal patterns in EHR data. To address missingness and irregular time intervals in EHR time-series data, we propose a model that interpolates and extrapolates values to perform imputation and prediction concurrently. We also propose a prediction model design that learns from medical histories of different lengths and produces future time-series predictions of varying lengths. Lastly, we provide open-source software that collects various cleaning, pre-processing, modeling, and evaluation techniques for building clinical prediction models from open-source EHR data.
Our proposed models and techniques can be used for different disease prediction tasks on EHR data, although we primarily focus on their application to childhood obesity prediction.
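To make the data issues listed above concrete, the following is a minimal sketch of one common way to lay out patient histories before they reach a sequential model. It is not the method proposed in this work. The function name `encode_histories`, the `(day, measurement)` visit format, and the toy values are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical visit format: (day since first visit, measurement).
# None marks a measurement that was not recorded at that visit.
Visit = Tuple[int, Optional[float]]


def encode_histories(histories: List[List[Visit]], pad_value: float = 0.0):
    """Pad variable-length histories and expose missingness and visit gaps.

    Returns three equally shaped nested lists:
      values   - measurements, with pad_value where nothing was observed
      observed - 1 where a measurement exists, 0 for missing or padded slots
      deltas   - days elapsed since the previous visit (0 for the first visit
                 and for padded slots)
    """
    max_len = max(len(history) for history in histories)
    values, observed, deltas = [], [], []
    for history in histories:
        row_values, row_observed, row_deltas = [], [], []
        previous_day = None
        for day, measurement in history:
            row_observed.append(0 if measurement is None else 1)
            row_values.append(pad_value if measurement is None else measurement)
            row_deltas.append(0 if previous_day is None else day - previous_day)
            previous_day = day
        padding = max_len - len(history)
        values.append(row_values + [pad_value] * padding)
        observed.append(row_observed + [0] * padding)
        deltas.append(row_deltas + [0] * padding)
    return values, observed, deltas


# Toy data (invented): two patients with different history lengths,
# irregular gaps between visits, and one missing measurement.
histories = [
    [(0, 16.1), (30, None), (395, 17.4)],
    [(0, 15.2), (180, 15.9)],
]
values, observed, deltas = encode_histories(histories)
```

In this layout the `observed` mask tells a downstream model which entries to trust, and `deltas` carries the irregular spacing between visits, so padding and placeholder values need not be mistaken for real observations. A real pipeline would likely keep separate masks for padding and for missing measurements; they are merged here only to keep the sketch short.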
Deep learning, Electronic health records, Sequential models, Time-series prediction, Clinical predictive tasks