Predicting outcomes for rare diseases using machine learning techniques

Author(s): Ferrato, Mauricio H.
Date Accessioned: 2024-01-24T15:04:54Z
Date Available: 2024-01-24T15:04:54Z
Publication Date: 2023
SWORD Update: 2024-01-22T20:09:16Z
Abstract: The application of machine learning (ML) techniques in the medical field has demonstrated both successes and challenges in the era of precision medicine. Accurately predicting outcomes for subjects with rare diseases remains an active area of research, pushing the field to create new approaches and apply machine learning. However, these approaches often become highly complex, behaving like black-box systems and creating uncertainty about their biological validity and their proper use in the clinical decision-making process. Also, due to the complex nature and high dimensionality of rare disease datasets, especially those in the field of genomics, these approaches tend to be computationally intensive and require substantial computational resources to perform efficiently.

To address this problem, we propose a scalable ML application called RNA-seq Count Drug-response Machine Learning (RCDML). We follow a workflow consisting of pre-processing, informative and explainable feature extraction, and tree-ensemble ML classifier algorithms. Multiple feature selection techniques were tested, namely Principal Component Analysis (PCA), SHapley Additive exPlanations (SHAP), Rare Allele Enrichment (RAE), and Differential Gene Expression Analysis (DGE), with three different classifiers: XGBoost, LightGBM, and Random Forest. Sensitivity versus specificity was analyzed using the area under the receiver operating characteristic curve (ROC AUC) and precision-recall curves for every model developed. The RCDML application uses the SHAP approach to explain the predictive decisions made by our ML pipeline when applied to a binary classification task.

For this study, we leveraged publicly available acute myeloid leukemia (AML) data from the BeatAML initiative. Specifically, we used gene count data, generated by RNA sequencing, from 451 individuals, matched with ex vivo data generated from treatment with RTK-type III inhibitors. We also used a Parkinson's disease (PD) dataset, which included variant data for 144 subjects.

The results of this work show that the SHAP technique outperformed the other feature selection techniques and was able to predict drug response with high precision. The highest-performing model, for Foretinib, reached an AUC of 89% using the SHAP technique and the Random Forest classifier. The results also demonstrated that the feature selection technique, rather than the classifier, had the greatest impact on model performance. Our ML pipeline demonstrates that, at the time of diagnosis, there is a transcriptome signature that can potentially predict the response to treatment, underscoring the importance of explainable ML approaches and their potential use in precision medicine efforts.

Work was also carried out to analyze imbalance in genomic data: PD and AML models were exposed to the class imbalance problem, and their predictive performance was compared with the results of 10 different undersampling techniques. Early-stage work includes optimization of this approach using GPU frameworks such as RAPIDS and other parallel programming tools that can give the RCDML workflow the ability to use GPUs and scale to large datasets.
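The abstract names two ideas that are easy to show on a small scale: scoring a binary classifier by ROC AUC, and undersampling the majority class to counter class imbalance. The following is a minimal sketch under stated assumptions, not the thesis's implementation. The function names `roc_auc` and `random_undersample` and the toy data are hypothetical, and plain random undersampling is used only as the simplest illustration; the record does not say which 10 undersampling techniques were compared.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC computed as the probability that a randomly chosen
    positive sample is scored higher than a randomly chosen negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples from larger classes until every class
    has as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data (hypothetical): 1 = responder, 0 = non-responder.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

auc = roc_auc(labels, scores)
balanced_x, balanced_y = random_undersample(list(range(10)), labels, seed=42)
print(f"ROC AUC: {auc:.3f}")
print(f"class counts after undersampling: "
      f"{balanced_y.count(1)} positive, {balanced_y.count(0)} negative")
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a real pipeline would more likely use a library routine and apply undersampling to the training split only.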
Advisor: Chandrasekaran, Sunita
Degree: Ph.D.
Department: University of Delaware, Department of Computer and Information Sciences
DOI: https://doi.org/10.58088/82vj-nc92
Unique Identifier: 1430376162
URL: https://udspace.udel.edu/handle/19716/33878
Language: en
Publisher: University of Delaware
URI: https://www.proquest.com/pqdtlocal1006271/dissertations-theses/predicting-outcomes-rare-diseases-using-machine/docview/2917439617/sem-2?accountid=10457
Keywords: Gene expression; Gene variant; Machine learning; Precision medicine; Decision-making
Type: Thesis
Files
Name: Ferrato_udel_0060D_15741.pdf
Size: 7.18 MB
Format: Adobe Portable Document Format