Open Access Publications

Open access publications by faculty, postdocs, and graduate students in the Department of Computer and Information Sciences.


Recent Submissions

  • Item
    Quantum computing for finance
    (Nature Reviews Physics, 2023-07-11) Herman, Dylan; Googin, Cody; Liu, Xiaoyuan; Sun, Yue; Galda, Alexey; Safro, Ilya; Pistoia, Marco; Alexeev, Yuri
    Quantum computers are expected to surpass the computational capabilities of classical computers and have a transformative impact on numerous industry sectors. We present a comprehensive summary of the state of the art of quantum computing for financial applications, with particular emphasis on stochastic modelling, optimization and machine learning. This Review is aimed at physicists, so it outlines the classical techniques used by the financial industry and discusses the potential advantages and limitations of quantum techniques. Finally, we look at the challenges that physicists could help tackle.
    Key points:
    - Quantum algorithms for stochastic modelling, optimization and machine learning are applicable to various financial problems.
    - Quantum Monte Carlo integration and gradient estimation can provide quadratic speedup over classical methods, but more work is required to reduce the amount of quantum resources for early fault-tolerant feasibility and achieving an actual speedup.
    - Financial optimization problems can be continuous (convex or non-convex), discrete or mixed, and thus quantum algorithms for these problems can be applied.
    - The advantages and challenges of quantum machine learning for classical problems are also apparent in finance.
  • Item
    Improvements in viral gene annotation using large language models and soft alignments
    (BMC Bioinformatics, 2024-04-25) Harrigan, William L.; Ferrell, Barbra D.; Wommack, K. Eric; Polson, Shawn W.; Schreiber, Zachary D.; Belcaid, Mahdi
    Background: The annotation of protein sequences in public databases has long posed a challenge in molecular biology. This issue is particularly acute for viral proteins, which demonstrate limited homology to known proteins when using alignment, k-mer, or profile-based homology search approaches. A novel methodology employing Large Language Models (LLMs) addresses this methodological challenge by annotating protein sequences based on embeddings. Results: Central to our contribution is the soft alignment algorithm, drawing from traditional protein alignment but leveraging embedding similarity at the amino acid level to bypass the need for conventional scoring matrices. This method not only surpasses pooled embedding-based models in efficiency but also in interpretability, enabling users to easily trace homologous amino acids and delve deeper into the alignments. Far from being a black box, our approach provides transparent, BLAST-like alignment visualizations, combining traditional biological research with AI advancements to elevate protein annotation through embedding-based analysis while ensuring interpretability. Tests using the Virus Orthologous Groups and ViralZone protein databases indicated that the novel soft alignment approach recognized and annotated sequences that both blastp and pooling-based methods, which are commonly used for sequence annotation, failed to detect. Conclusion: The embeddings approach shows the great potential of LLMs for enhancing protein sequence annotation, especially in viral genomics. These findings present a promising avenue for more efficient and accurate protein function inference in molecular biology.
  • Item
    Integrative data analysis to identify persistent post-concussion deficits and subsequent musculoskeletal injury risk: project structure and methods
    (BMJ Open Sport & Exercise Medicine, 2024-01-19) Anderson, Melissa; Claros, Claudio Cesar; Qian, Wei; Brockmeier, Austin; Buckley, Thomas A
    Concussions are a serious public health problem, with significant healthcare costs and risks. One of the most serious complications of concussions is an increased risk of subsequent musculoskeletal injuries (MSKI). However, there is currently no reliable way to identify which individuals are at highest risk for post-concussion MSKIs. This study proposes a novel data analysis strategy for developing a clinically feasible risk score for post-concussion MSKIs in student-athletes. The data set consists of one-time tests (eg, mental health questionnaires), relevant information on demographics, health history (including details regarding the concussion such as day of the year and time lost) and athletic participation (current sport and contact level) that were collected at a single time point as well as multiple time points (baseline and follow-up time points after the concussion) of the clinical assessments (ie, cognitive, postural stability, reaction time and vestibular and ocular motor testing). The follow-up time point measurements were treated as individual variables and as differences from the baseline. Our approach used a weight-of-evidence (WoE) transformation to handle missing data and variable heterogeneity and machine learning methods for variable selection and model fitting. We applied a training-testing sample splitting scheme and performed variable preprocessing with the WoE transformation. Then, machine learning methods were applied to predict the MSKI indicator, thereby constructing a composite risk score for the training-testing sample. This methodology demonstrates the potential of using machine learning methods to improve the accuracy and interpretability of risk scores for MSKI.
  • Item
    Targeting of plasmodesmal proteins requires unconventional signals
    (The Plant Cell, 2023-08-02) Luna, Gabriel Robles; Li, Jiefu; Wang, Xu; Liao, Li; Lee, Jung-Youn
    Effective cellular signaling relies on precise spatial localization and dynamic interactions among proteins in specific subcellular compartments or niches, such as cell-to-cell contact sites and junctions. In plants, endogenous and pathogenic proteins gained the ability to target plasmodesmata, membrane-lined cytoplasmic connections, through evolution to regulate or exploit cellular signaling across cell wall boundaries. For example, the receptor-like membrane protein PLASMODESMATA-LOCATED PROTEIN 5 (PDLP5), a potent regulator of plasmodesmal permeability, generates feed-forward or feed-back signals important for plant immunity and root development. However, the molecular features that determine the plasmodesmal association of PDLP5 or other proteins remain largely unknown, and no protein motifs have been identified as plasmodesmal targeting signals. Here, we developed an approach combining custom-built machine-learning algorithms and targeted mutagenesis to examine PDLP5 in Arabidopsis thaliana and Nicotiana benthamiana. We report that PDLP5 and its closely related proteins carry unconventional targeting signals consisting of short stretches of amino acids. PDLP5 contains 2 divergent, tandemly arranged signals, either of which is sufficient for localization and biological function in regulating viral movement through plasmodesmata. Notably, plasmodesmal targeting signals exhibit little sequence conservation but are located similarly proximal to the membrane. These features appear to be a common theme in plasmodesmal targeting.
  • Item
    Towards C-V2X Enabled Collaborative Autonomous Driving
    (IEEE Transactions on Vehicular Technology, 2023-08-14) He, Yuankai; Wu, Baofu; Dong, Zheng; Wan, Jian; Shi, Weisong
    Intelligent vehicles, including autonomous vehicles and vehicles equipped with ADAS systems, are single-agent systems that navigate solely on the information collected by themselves. However, despite rapid advancements in hardware and algorithms, many accidents still occur due to the limited sensing coverage from a single-agent perception angle. These tragedies raise a critical question of whether single-agent autonomous driving is safe. Preliminary investigations on this safety issue led us to create a C-V2X-enabled collaborative autonomous driving framework (CCAD) to observe the driving circumstance from multiple perception angles. Our framework uses C-V2X technology to connect infrastructure with vehicles and vehicles with vehicles to transmit safety-critical information and to add safety redundancies. By enabling these communication channels, we connect previously independent single-agent vehicles and existing infrastructure. This paper presents a prototype of our CCAD framework with RSU and OBU as communication devices and an edge-computing device for data processing. We also present a case study of successfully implementing an infrastructure-based collaborative lane-keeping with the CCAD framework. Our case study evaluations demonstrate that the CCAD framework can transmit, in real-time, personalized lane-keeping guidance information when the vehicle cannot find the lanes. The evaluations also indicate that the CCAD framework can drastically improve the safety of single-agent intelligent vehicles and open the doors to many more collaborative autonomous driving applications.
  • Item
    E3-UAV: An Edge-Based Energy-Efficient Object Detection System for Unmanned Aerial Vehicles
    (IEEE Internet of Things Journal, 2023-08-03) Suo, Jiashun; Zhang, Xingzhou; Shi, Weisong; Zhou, Wei
    Motivated by the advances in deep learning techniques, the application of Unmanned Aerial Vehicle (UAV)-based object detection has proliferated across a range of fields, including vehicle counting, fire detection, and city monitoring. While most existing research studies only a subset of the challenges inherent to UAV-based object detection, there are few studies that balance various aspects to design a practical system for energy consumption reduction. In response, we present the E3-UAV, an edge-based energy-efficient object detection system for UAVs. The system is designed to dynamically support various UAV devices, edge devices, and detection algorithms, with the aim of minimizing energy consumption by deciding the most energy-efficient flight parameters (including flight altitude, flight speed, detection algorithm, and sampling rate) required to fulfill the detection requirements of the task. We first present an effective evaluation metric for actual tasks and construct a transparent energy consumption model based on hundreds of actual flight data to formalize the relationship between energy consumption and flight parameters. Then we present a lightweight energy-efficient priority decision algorithm based on a large quantity of actual flight data to assist the system in deciding flight parameters. Finally, we evaluate the performance of the system, and our experimental results demonstrate that it can significantly decrease energy consumption in real-world scenarios. Additionally, we provide four insights that can assist researchers and engineers in their efforts to study UAV-based object detection further.
  • Item
    A comprehensive analysis of the integration of team research between sport psychology and management
    (Psychology of Sport and Exercise, 2020-06-13) Emich, Kyle J.; Norder, Kurt; Lu, Li; Sawhney, Aman
    Both sports and organizations rely on teams. As such, the sport psychology and management literatures have contributed greatly to our understanding of team functioning. Despite this, previous reviews based on subsets of articles in these literatures indicate a lack of communication between them. In this article, we assess the state of integration between the entirety of the sport psychology and management literatures on teams by considering the full set of interconnected team articles in the SCOPUS database (6974 articles over 69 years). We use this data to conduct a combination of citation network analysis and content analysis via topic modeling to evaluate conceptual integration. The data show that interdisciplinary discussion between these two fields is lacking, particularly regarding the integration of sport psychology into management research. Whereas 7% of references to team articles in sport psychology come from management journals, only 0.6% of team references in management journals come from sport psychology. Despite this, longitudinal analysis indicates that in the last 10 years the rate of integration between these fields is increasing. We identify specific topics that have accounted for this integration and suggest topics ripe for future integration.
  • Item
    Improving Inter-Helix Contact Prediction With Local 2D Topological Information
    (IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2023-05-08) Li, Jiefu; Sawhney, Aman; Lee, Jung-Youn; Liao, Li
    Inter-helix contact prediction aims to identify residue contacts across different helices in α-helical integral membrane proteins. Despite the progress made by various computational methods, contact prediction remains a challenging task, and to our knowledge no method directly taps into the contact map in an alignment-free manner. We build 2D contact models from an independent dataset to capture the topological patterns in the neighborhood of a residue pair depending on whether it is a contact or not, and apply the models to the state-of-the-art method's predictions to extract features reflecting 2D inter-helix contact patterns. A secondary classifier is trained on such features. Realizing that the achievable improvement intrinsically hinges on the quality of the original predictions, we devise a mechanism to deal with the issue by introducing 1) partial discretization of the original prediction scores to more effectively leverage useful information and 2) a fuzzy score to assess the quality of the original prediction, which helps select the residue pairs where improvement is more achievable. The cross-validation results show that the prediction from our method outperforms other methods including the state-of-the-art method (DeepHelicon) by a notable degree even without using the refinement selection scheme. By applying the refinement selection scheme, our method outperforms the state-of-the-art method significantly in these selected sequences.
  • Item
    WiLDAR: WiFi Signal-Based Lightweight Deep Learning Model for Human Activity Recognition
    (IEEE Internet of Things Journal, 2023-07-11) Deng, Fuxiang; Jovanov, Emil; Song, Houbing; Shi, Weisong; Zhang, Yuan; Xu, Wenyao
    In recent years, WiFi channel state information (CSI) has been increasingly used for human activity recognition (HAR) during activities of daily living, because of its non-intrusive and privacy-preserving properties. However, most previous works require complex processing of CSI signals, and the large number of classification network parameters significantly increases the recognition time and deployment costs. Accordingly, a WiFi signal-based lightweight deep learning (WiLDAR) network is developed in this study to ensure systematic operation on edge computing devices. We combine the random convolution kernel with deep separable convolution and residual structure, so that WiLDAR can easily extract CSI signal features without filtering and denoising. The parameter count and training time of WiLDAR are thus much lower than those of previous neural networks. In addition, a tiny HAR system using only a Raspberry Pi and a router is implemented. Experiments verify that WiLDAR can achieve real-time HAR on IoT devices, which makes HAR deployment more convenient. We test WiLDAR on three different fine-grained action datasets and achieve 99%, 93.5% and 97.5% recognition accuracy, respectively. The demonstrated learning capability of WiLDAR makes it an excellent option for remote HAR.
  • Item
    Towards Resilient Network Slicing for Satellite-Terrestrial Edge Computing IoT
    (IEEE Internet of Things Journal, 2023-05-18) Esmat, Haitham H.; Lorenzo, Beatriz; Shi, Weisong
    Satellite-Terrestrial Edge Computing Networks (STECNs) emerged as a global solution to support multiple Internet of Things (IoT) applications in 6G networks. The enabling technologies to slice STECNs such as Software-Defined Networking (SDN), satellite edge computing, and Network Function Virtualization (NFV) are key to realizing this vision. In this paper, we survey and analyze network slicing solutions for STECNs. We discuss slice management and orchestration for different STECNs integration architectures, satellite edge computing, mmWave/THz, and AI solutions to make network slicing adaptive. In addition, we identify challenges and open issues to slice STECNs. In particular, resilient network slicing is crucial for essential and critical services. Network failures are unavoidable in large networks and can cause significant disruptions in network slicing, compromising many services. To this end, we present a resilient network slicing design to cope with failures and guarantee service continuity which is agnostic to the integration architecture and inherently multi-domain. Further, we present strategies to achieve resilient networking and slicing in STECNs including planning and provisioning of redundant network resources, design rules for service level agreement decomposition, and cross-domain solutions to detect and mitigate failures. Finally, promising future research directions are highlighted. This paper provides valuable guidelines for slicing STECNs and will benefit key sectors, such as smart healthcare, e-commerce, industrial IoT, and education, among others.
  • Item
    Incremental Dense Reconstruction From Monocular Video With Guided Sparse Feature Volume Fusion
    (IEEE Robotics and Automation Letters, 2023-05-08) Zuo, Xingxing; Yang, Nan; Merrill, Nathaniel; Xu, Binbin; Leutenegger, Stefan
    Incrementally recovering 3D dense structures from monocular videos is of paramount importance since it enables various robotics and AR applications. Feature volumes have recently been shown to enable efficient and accurate incremental dense reconstruction without the need to first estimate depth, but they are not able to achieve as high of a resolution as depth-based methods due to the large memory consumption of high-resolution feature volumes. This letter proposes a real-time feature volume-based dense reconstruction method that predicts TSDF (Truncated Signed Distance Function) values from a novel sparsified deep feature volume, which is able to achieve higher resolutions than previous feature volume-based methods, and is favorable in outdoor large-scale scenarios where the majority of voxels are empty. An uncertainty-aware multi-view stereo (MVS) network is leveraged to infer initial voxel locations of the physical surface in a sparse feature volume. Then for refining the recovered 3D geometry, deep features are attentively aggregated from multi-view images at potential surface locations, and temporally fused. Besides achieving higher resolutions than before, our method is shown to produce more complete reconstructions with finer detail in many cases. Extensive evaluations on both public and self-collected datasets demonstrate a very competitive real-time reconstruction result for our method compared to state-of-the-art reconstruction methods in both indoor and outdoor settings.
  • Item
    Joint Optimization of Security Strength and Resource Allocation for Computation Offloading in Vehicular Edge Computing
    (IEEE Transactions on Wireless Communications, 2023-04-13) Xiao, Huizi; Zhao, Jun; Feng, Jie; Liu, Lei; Pei, Qingqi; Shi, Weisong
    Vehicular Edge Computing (VEC) is a promising new paradigm that has attracted much attention in recent years. It can enhance the storage and computing capabilities of vehicular networks to provide users with low-latency and high-quality services. Because of open access and unreliable wireless channels, appropriate security measures should be implemented in VEC to ensure information security. However, operating the security mechanism consumes additional computing resources, thus affecting the performance of VEC systems. The scarcity of computation and energy resources of the vehicles conflicts with the requirements of tasks for time delay and information security. In this paper, taking the driving velocity and position of the vehicles, the number of lanes, the model and density of the attackers, and security strength into consideration, we formulate a max-min optimization problem to jointly optimize offloading decision, transmit power, task computation frequency, encryption computation frequency, edge computation frequency, and block length to obtain optimal secure information capacity and local computation delay. The formulated optimization problem is a mixed-integer nonlinear programming (MINLP) problem, which is intractable. We apply a generalized Benders decomposition (GBD)-based method to solve it. The simulation results show that our proposed algorithms converge, are effective, and achieve fairness among vehicles on the road.
  • Item
    A scoping review of the use of lab streaming layer framework in virtual and augmented reality research
    (Virtual Reality, 2023-05-02) Wang, Qile; Zhang, Qinqi; Sun, Weitong; Boulay, Chadwick; Kim, Kangsoo; Barmaki, Roghayeh Leila
    The use of multimodal data allows excellent opportunities for human–computer interaction research and novel techniques regarding virtual and augmented reality (VR/AR) experiences. Collecting, coordinating, and synchronizing a large amount of data from multiple VR/AR hardware while maintaining a high framerate can be a daunting task, despite the compelling nature of multimodal data. The Lab Streaming Layer (LSL) is an open-source framework that enables the synchronous collection of various types of multimodal data, unlike existing expensive alternatives. However, despite its potential, this framework has not been fully adopted by the VR/AR research community. In this paper, we present a guideline of the LSL framework’s use in VR/AR research as well as report current trends by performing a comprehensive literature review on the subject. We extract 549 publications using LSL from January 2015 to March 2022. We analyze types of data, displays, and targeted application areas. We describe in-depth reviews of 38 selected papers and provide use of LSL in the VR/AR research community while highlighting benefits, challenges, and future opportunities.
  • Item
    Machine learning classifier approaches for predicting response to RTK-type-III inhibitors demonstrate high accuracy using transcriptomic signatures and ex vivo data
    (Bioinformatics Advances, 2023-03-24) Ferrato, Mauricio H.; Marsh, Adam G.; Franke, Karl R.; Huang, Benjamin J.; Kolb, E. Anders; DeRyckere, Deborah; Grahm, Douglas K.; Chandrasekaran, Sunita; Crowgey, Erin L.
    Motivation: The application of machine learning (ML) techniques in the medical field has demonstrated both successes and challenges in the precision medicine era. The ability to accurately classify a subject as a potential responder versus a nonresponder to a given therapy is still an active area of research pushing the field to create new approaches for applying machine-learning techniques. In this study, we leveraged publicly available data through the BeatAML initiative. Specifically, we used gene count data, generated via RNA-seq, from 451 individuals matched with ex vivo data generated from treatment with RTK-type-III inhibitors. Three feature selection techniques were tested (principal component analysis, the Shapley Additive Explanation (SHAP) technique and differential gene expression analysis) with three different classifiers (XGBoost, LightGBM and random forest (RF)). Sensitivity versus specificity was analyzed using the area under the curve (AUC)-receiver operating curves (ROCs) for every model developed. Results: Our work demonstrated that the feature selection technique, rather than the classifier, had the greatest impact on model performance. The SHAP technique outperformed the other feature selection techniques and was able to predict outcome response with high accuracy; the highest performing model was for Foretinib, with 89% AUC using the SHAP technique and RF classifier. Our ML pipelines demonstrate that at the time of diagnosis, a transcriptomics signature exists that can potentially predict response to treatment, demonstrating the potential of using ML applications in precision medicine efforts. Availability and implementation: Supplementary information: Supplementary data are available at Bioinformatics Advances online at:
  • Item
    A Game-Theoretic Approach to Energy-Efficient Elevator Scheduling in Smart Buildings
    (IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023-02-22) Maleki, Erfan Farhangi; Bhatta, Dixit; Mashayekhy, Lena
    Buildings, producing more carbon footprints than the transportation sector, account for a significant portion of the United States’ total energy consumption. By designing modern automation techniques, smart buildings can significantly reduce energy consumption, protect the environment, and consequently improve quality of life. This article focuses on the automation of elevator scheduling, which is an NP-Hard problem, to reduce energy usage in smart buildings and improve users’ quality of experience. We propose an optimal mathematical model for the elevator scheduling problem using integer programming. We then propose a novel game-theoretic approach that captures interactions within the elevator system to reduce energy consumption and enhance user experience. We propose a request coalition formation game, where nonoverlapping coalitions of user requests are served by elevators to minimize their movements and energy consumption while reducing service time and stops for users. We analyze the performance of our proposed approach using the optimal solution as a benchmark and Nearest Car and Fixed Sectoring algorithms as rivals. The experiments show that our approach is significantly efficient in terms of energy consumption and service time, making it suitable for smart buildings.
  • Item
    Subtyping Patients With Chronic Disease Using Longitudinal BMI Patterns
    (IEEE Journal of Biomedical and Health Informatics, 2023-01-17) Mottalib, Md Mozaharul; Jones-Smith, Jessica C.; Sheridan, Bethany; Beheshti, Rahmatollah
    Obesity is a major health problem, increasing the risk of various major chronic diseases, such as diabetes, cancer, and stroke. While the role of obesity identified by cross-sectional BMI recordings has been heavily studied, the role of BMI trajectories is much less explored. In this study, we use a machine learning approach to subtype individuals' risk of developing 18 major chronic diseases by using their BMI trajectories extracted from a large and geographically diverse EHR dataset capturing the health status of around two million individuals for a period of six years. We define nine new interpretable and evidence-based variables based on the BMI trajectories to cluster the patients into subgroups using the k-means clustering method. We thoroughly review each cluster's characteristics in terms of demographic, socioeconomic, and physiological measurement variables to specify the distinct properties of the patients in the clusters. In our experiments, the direct relationship of obesity with diabetes, hypertension, Alzheimer's, and dementia has been re-established and distinct clusters with specific characteristics for several of the chronic diseases have been found to be conforming or complementary to the existing body of knowledge.
  • Item
    Resource Optimization of MAB-based Reputation Management for Data Trading in Vehicular Edge Computing
    (IEEE Transactions on Wireless Communications, 2023-01-09) Xiao, Huizi; Cai, Lin; Feng, Jie; Pei, Qingqi; Shi, Weisong
    Vehicles are hesitant to upload data to edge servers in vehicular edge computing (VEC) because much of the vehicle data collected and perceived by on-board sensors contains sensitive and personal information, and because vehicles lack an economic incentive to share it. Instead of free access to shared data, encrypted data trading will alleviate security and privacy concerns and provide an incentive for vehicle owners to share their data. The edge server needs to pay the price in data trading, and reputation management is an effective way to help it trade with reliable and available vehicles. In this paper, we propose a multi-armed bandit (MAB)-based reputation management scheme, so the edge servers can select high-reputation vehicles for data trading, which can ensure the credibility and reliability of the data. The encryption scheme is applied to achieve the required transmission security level and defend the rights and interests of the edge server. On the other hand, implementing security measures will consume the computation and communication resources of the vehicles. We formulate an optimization problem that maximizes the revenue of vehicles in data trading under the constraints of time delay, energy consumption, and security level. Simulation results demonstrate that the proposed scheme is effective and efficient for vehicle reputation management, data trading selection, and resource allocation.
  • Item
    Semi-Identical Twins Variational AutoEncoder for Few-Shot Learning
    (IEEE Transactions on Neural Networks and Learning Systems, 2023-01-09) Zhang, Yi; Huang, Sheng; Peng, Xi; Yang, Dan
    Data augmentation is a popular approach to few-shot learning (FSL). It generates more samples as supplements and then transforms the FSL task into a common supervised learning problem for a solution. However, most data-augmentation-based FSL approaches only consider the prior visual knowledge for feature generation, thereby leading to low diversity and poor quality of generated data. In this study, we attempt to address this issue by incorporating both prior visual and prior semantic knowledge to condition the feature generation process. Inspired by some genetic characteristics of semi-identical twins, we developed a novel multimodal generative FSL approach named semi-identical twins variational autoencoder (STVAE) to better exploit the complementarity of the information from these modalities by treating multimodal conditional feature generation as a process in which semi-identical twins are born and collaborate to simulate their father. STVAE conducts feature synthesis by pairing two conditional variational autoencoders (CVAEs) with the same seed but different modality conditions. Subsequently, the generated features of the two CVAEs are considered as semi-identical twins and adaptively combined to yield the final feature, which is considered as their fake father. STVAE requires that the final feature can be converted back into its paired conditions while ensuring these conditions remain consistent with the original in both representation and function. Moreover, STVAE is able to work in the partial modality-absence case due to the adaptive linear feature combination strategy. STVAE essentially provides a novel, genetics-inspired idea to exploit the complementarity of prior information from different modalities in FSL. Extensive experimental results demonstrate that our work achieves promising performance in comparison to the recent state-of-the-art approaches and validate its effectiveness on FSL under various modality settings.
  • Item
    An Efficient Approach to Predict Eye Diseases from Symptoms Using Machine Learning and Ranker-Based Feature Selection Methods
    (Bioengineering, 2022-12-24) Marouf, Ahmed Al; Mottalib, Md Mozaharul; Alhajj, Reda; Rokne, Jon; Jafarullah, Omar
    The eye is generally considered to be the most important sensory organ of humans. Diseases and other degenerative conditions of the eye are therefore of great concern as they affect the function of this vital organ. With proper early diagnosis by experts and with optimal use of medicines and surgical techniques, these diseases or conditions can in many cases be either cured or greatly mitigated. Experts that perform the diagnosis are in high demand and their services are expensive, hence the appropriate identification of the cause of vision problems is either postponed or not done at all such that corrective measures are either not done or done too late. An efficient model to predict eye diseases using machine learning (ML) and ranker-based feature selection (r-FS) methods is therefore proposed which will aid in obtaining a correct diagnosis. The aim of this model is to automatically predict one or more of five common eye diseases namely, Cataracts (CT), Acute Angle-Closure Glaucoma (AACG), Primary Congenital Glaucoma (PCG), Exophthalmos or Bulging Eyes (BE) and Ocular Hypertension (OH). We have used efficient data collection methods, data annotations by professional ophthalmologists, applied five different feature selection methods, two types of data splitting techniques (train-test and stratified k-fold cross validation), and applied nine ML methods for the overall prediction approach. While applying ML methods, we have chosen suitable classic ML methods, such as Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), AdaBoost (AB), Logistic Regression (LR), k-Nearest Neighbour (k-NN), Bagging (Bg), Boosting (BS) and Support Vector Machine (SVM). We have performed a symptomatic analysis of the prominent symptoms of each of the five eye diseases. The results of the analysis and comparison between methods are shown separately. While comparing the methods, we have adopted traditional performance indices, such as accuracy, precision, sensitivity, F1-Score, etc. 
    Finally, SVM outperformed the other models, obtaining the highest accuracy of 99.11% with 10-fold cross-validation, while LR obtained 98.58% with an 80:20 train-test split.
  • Item
    Overview of the COVID-19 text mining tool interactive demonstration track in BioCreative VII
    (Database, 2022-10-05) Chatr-aryamontri, Andrew; Hirschman, Lynette; Ross, Karen E.; Oughtred, Rose; Krallinger, Martin; Dolinski, Kara; Tyers, Mike; Korves, Tonia; Arighi, Cecilia N.
    The coronavirus disease 2019 (COVID-19) pandemic has compelled biomedical researchers to communicate data in real time to establish more effective medical treatments and public health policies. Nontraditional sources such as preprint publications, i.e. articles not yet validated by peer review, have become crucial hubs for the dissemination of scientific results. Natural language processing (NLP) systems have been recently developed to extract and organize COVID-19 data in reasoning systems. Given this scenario, the BioCreative COVID-19 text mining tool interactive demonstration track was created to assess the landscape of the available tools and to gauge user interest, thereby providing a two-way communication channel between NLP system developers and potential end users. The goal was to inform system designers about the performance and usability of their products and to suggest new additional features. Considering the exploratory nature of this track, the call for participation solicited teams to apply for the track, based on their system’s ability to perform COVID-19-related tasks and interest in receiving user feedback. We also recruited volunteer users to test systems. Seven teams registered systems for the track, and >30 individuals volunteered as test users; these volunteer users covered a broad range of specialties, including bench scientists, bioinformaticians and biocurators. The users, who had the option to participate anonymously, were provided with written and video documentation to familiarize themselves with the NLP tools and completed a survey to record their evaluation. Additional feedback was also provided by NLP system developers. The track was well received as shown by the overall positive feedback from the participating teams and the users.
Copyright: Please look at individual material in order to see what the copyright and licensing terms are. Some material may be available for reuse under a Creative Commons license; other material may be the copyright of the individual author(s) or the publisher of the journal.