Scene understanding and applications towards empowering individuals with visual impairment
Date
2024
Authors
Publisher
University of Delaware
Abstract
This dissertation presents methods to enhance assistive technologies for individuals with visual impairment, focusing on improving answer grounding accuracy in Visual Question Answering (VQA). Central to this is the introduction of five novel methods that significantly improve on state-of-the-art accuracy:
• Neural Ordinary Differential Equation Model: This model achieves high accuracy on the VizWiz-VQA-Grounding dataset while using fewer parameters and requiring substantially less memory than existing state-of-the-art models.
• James-Stein Normalization Layers: Applying the James-Stein estimator improves the estimation of mean and variance in normalization layers. This results in higher accuracy across various computer vision tasks with minimal additional computational demand.
• Universal Image Restoration: A task-agnostic, universal approach to image restoration that enhances image quality by implicitly learning a wide range of image imperfections. Rather than targeting a specific restoration task, the proposed method addresses image restoration and quality improvement as a whole.
• Embedding Attention Block: This block recalibrates channel-wise image feature maps by explicitly modeling the relationships between image feature maps and the image-question-answer embedding. Implementing this attention mechanism led to 74.1% accuracy on the VizWiz-VQA-Grounding dataset, securing the top position on the 2023 VizWiz-VQA-Grounding challenge leaderboard.
• Novel Use of Apple Live Photos and Android Motion Photos: An approach is proposed for comparing the performance of Live/Motion Photos and static images. This analysis reveals that Live/Motion Photos perform better in common tasks used by visual assistance applications, highlighting their potential to enhance the visual experience.
These advancements not only contribute to the field of assistive technology for people with visual impairment but also push the boundaries of machine learning and computer vision.
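As a rough illustration of the James-Stein idea mentioned in the abstract, the sketch below applies the textbook positive-part James-Stein estimator to a set of per-channel means, shrinking them toward their grand mean. It is not the dissertation's implementation: the function name `james_stein_shrink`, the `noise_var` parameter (an assumed known variance of each channel-mean estimate), and the sample values are all hypothetical.

```python
from statistics import fmean


def james_stein_shrink(values, noise_var):
    """Positive-part James-Stein shrinkage of `values` toward their grand mean.

    `noise_var` is the (assumed known) variance of each individual estimate.
    This is the textbook estimator, shown for illustration only.
    """
    p = len(values)
    # Shrinking toward an estimated grand mean needs at least 4 estimates.
    if p < 4:
        return list(values)

    grand = fmean(values)
    spread = sum((v - grand) ** 2 for v in values)
    if spread == 0.0:
        return list(values)  # all estimates already coincide

    # Shrinkage factor, clipped at 0 (the "positive-part" variant).
    factor = max(0.0, 1.0 - (p - 3) * noise_var / spread)
    return [grand + factor * (v - grand) for v in values]


# Hypothetical per-channel batch means, as a normalization layer might compute.
channel_means = [0.2, 1.5, -0.7, 0.9, 0.1]
shrunk = james_stein_shrink(channel_means, noise_var=0.05)
```

Each shrunk value lies no farther from the grand mean than the raw value did, and the grand mean itself is unchanged. How the dissertation integrates such an estimator into mean and variance computation inside a normalization layer is not specified in this record.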
Description
Keywords
Answer groundings, James-Stein estimators, Transformers, Visual impairment, Visual Question Answering