DSpace Repository :: Browsing by Author "Ferrato, Mauricio H."


Machine learning classifier approaches for predicting response to RTK-type-III inhibitors demonstrate high accuracy using transcriptomic signatures and ex vivo data
(Bioinformatics Advances, 2023-03-24) Ferrato, Mauricio H.; Marsh, Adam G.; Franke, Karl R.; Huang, Benjamin J.; Kolb, E. Anders; DeRyckere, Deborah; Graham, Douglas K.; Chandrasekaran, Sunita; Crowgey, Erin L.
Motivation: The application of machine learning (ML) techniques in the medical field has demonstrated both successes and challenges in the precision medicine era. Accurately classifying a subject as a potential responder or nonresponder to a given therapy remains an active area of research, pushing the field to develop new ways of applying ML techniques. In this study, we leveraged publicly available data from the BeatAML initiative. Specifically, we used RNA-seq gene count data from 451 individuals, matched with ex vivo data generated from treatment with RTK-type-III inhibitors. Three feature selection techniques were tested (principal component analysis, the Shapley Additive Explanations (SHAP) technique and differential gene expression analysis), each with three different classifiers (XGBoost, LightGBM and random forest (RF)). For every model developed, sensitivity versus specificity was analyzed using the area under the curve (AUC) of receiver operating characteristic (ROC) curves.

Results: Our work demonstrated that the feature selection technique, rather than the classifier, had the greatest impact on model performance. The SHAP technique outperformed the other feature selection techniques and predicted outcome response with high accuracy; the highest-performing model, for Foretinib, reached 89% AUC using the SHAP technique and the RF classifier. Our ML pipelines demonstrate that, at the time of diagnosis, a transcriptomic signature exists that can potentially predict response to treatment, illustrating the potential of ML applications in precision medicine efforts.

Availability and implementation: https://github.com/UD-CRPL/RCDML

Supplementary information: Supplementary data are available at Bioinformatics Advances online at: https://doi.org/10.1093/bioadv/vbad034
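The AUC figures above summarize how well a classifier ranks responders above nonresponders. As a rough illustration only (this is not code from the RCDML repository), the sketch below computes ROC AUC from binary labels and classifier scores using the rank-sum identity: AUC equals the probability that a randomly chosen responder receives a higher score than a randomly chosen nonresponder, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = responder (positive class), 0 = nonresponder.
    scores: classifier scores; higher means "more likely responder".
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    # Compare every responder score against every nonresponder score.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Normalize U by the number of positive/negative pairs.
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores (made up for illustration, not BeatAML data).
    y = [1, 0, 1, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
    print(roc_auc(y, s))  # 7 of 9 pairs ranked correctly
```

The pairwise loop costs O(P × N) comparisons, which is fine for cohorts of a few hundred subjects; a sort-based rank computation gives the same value in O(n log n) for larger datasets.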
Predicting outcomes for rare diseases using machine learning techniques
(University of Delaware, 2023) Ferrato, Mauricio H.
The application of machine learning (ML) techniques in the medical field has demonstrated both successes and challenges in the era of precision medicine. Accurately predicting outcomes for subjects with rare diseases is still an active area of research, pushing the field to create new approaches for applying ML. However, these approaches often become extremely complex, behaving like black-box systems and creating uncertainty about their biological validity and their proper use in clinical decision-making. In addition, because rare disease datasets, especially genomic ones, are complex and high-dimensional, these approaches tend to be computationally intensive and require substantial computational resources to run efficiently.

To address this problem, we propose a scalable ML application called RNA-seq Count Drug-response Machine Learning (RCDML). Its workflow consists of pre-processing, informative and explainable feature extraction, and tree-ensemble ML classifier algorithms. Multiple feature selection techniques were tested, including Principal Component Analysis (PCA), SHapley Additive exPlanations (SHAP), Rare Allele Enrichment (RAE) and Differential Gene Expression Analysis (DGE), each with three different classifiers: XGBoost, LightGBM and Random Forest. For every model developed, sensitivity versus specificity was analyzed using the area under the curve (AUC) of receiver operating characteristic (ROC) curves, together with precision-recall analysis. RCDML uses the SHAP approach to explain the predictive decisions made by the ML pipeline when it is applied to a binary classification task.

For this study, we leveraged publicly available data from the BeatAML initiative. Specifically, we used gene count data, generated by RNA sequencing, from 451 individuals, matched with ex vivo data generated from treatment with RTK-type III inhibitors.
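The RCDML abstract reports precision-recall results alongside ROC curves. A minimal sketch, assuming binary labels (1 = responder) and real-valued classifier scores, of average precision, a common single-number summary of the precision-recall curve: AP = Σ (R_n − R_{n−1}) · P_n, summed over descending score thresholds, where R_n and P_n are recall and precision at threshold n. This is not taken from the RCDML code; `average_precision` and the toy data are hypothetical.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: sum over thresholds of (R_n - R_{n-1}) * P_n.

    labels: 1 = responder (positive class), 0 = nonresponder.
    scores: classifier scores; higher means "more likely responder".
    Examples with tied scores are treated as a single threshold.
    """
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive example")
    # Walk the examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example whose score equals the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Weight precision by the recall gained at this threshold.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy labels and scores (made up for illustration, not study data).
    y = [1, 0, 1, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
    print(round(average_precision(y, s), 4))  # prints 0.8056
```

Grouping tied scores into one threshold keeps the result independent of the order in which tied examples happen to be sorted. Precision-recall summaries are often more informative than ROC AUC when the positive class is rare.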
In addition to the BeatAML data, we used a Parkinson's disease (PD) dataset that included variant data for 144 subjects.

The results of this work show that the SHAP technique outperformed the other feature selection techniques and predicted drug response with high precision; the highest-performing model, for Foretinib, reached 89% AUC using the SHAP technique and the Random Forest classifier. The results also demonstrated that the feature selection technique, rather than the classifier, had the greatest impact on model performance. Our ML pipeline demonstrates that, at the time of diagnosis, there is a transcriptomic signature that can potentially predict response to treatment, highlighting the importance of explainable ML approaches and their potential in precision medicine efforts.

We also analyzed class imbalance in genomic data: the PD and acute myeloid leukemia (AML) models were exposed to the class imbalance problem, and their predictive performance was compared with results obtained using 10 different undersampling techniques. Early-stage work includes optimizing this approach with GPU frameworks such as RAPIDS and other parallel programming tools, which can give the RCDML workflow the ability to use GPUs and scale to large datasets.
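The abstract compares imbalanced PD and AML models against 10 undersampling techniques without naming them. As an illustration of the general idea only, the sketch below implements the simplest variant, random undersampling: samples from the larger classes are discarded at random until every class is the size of the smallest one. The function `random_undersample`, its `seed` parameter and the subject IDs are hypothetical and are not claimed to be part of RCDML.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    samples: Sequence[T], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[T], List[Hashable]]:
    """Randomly drop samples from larger classes until all classes match the smallest.

    Returns the kept samples and their labels, grouped by class.
    A fixed seed makes the selection reproducible.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    by_class: Dict[Hashable, List[T]] = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    if not by_class:
        return [], []
    n_min = min(len(members) for members in by_class.values())
    rng = random.Random(seed)
    kept_x: List[T] = []
    kept_y: List[Hashable] = []
    for y, members in by_class.items():
        # rng.sample picks n_min distinct members without replacement.
        for x in rng.sample(members, n_min):
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y


if __name__ == "__main__":
    # Hypothetical cohort: 8 nonresponders (0) and 3 responders (1).
    ids = [f"subject_{i:02d}" for i in range(11)]
    y = [0] * 8 + [1] * 3
    x_bal, y_bal = random_undersample(ids, y, seed=42)
    print(Counter(y_bal))  # both classes now have 3 members
```

Undersampling is normally applied only to the training split of each cross-validation fold, so that the held-out evaluation data keep the original class ratio.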
