Towards multi-scale inter-frame attention to improve deep learning tasks

dc.contributor.author: Bhattarai, Ashuta
dc.date.accessioned: 2025-05-13T17:42:38Z
dc.date.available: 2025-05-13T17:42:38Z
dc.date.issued: 2025
dc.date.updated: 2025-04-28T04:03:51Z
dc.description.abstract: Access to specialized medical screening remains a challenge for individuals with sickle cell disease (SCD), particularly those in low-income and rural communities, where advanced diagnostic tools and expert evaluations are limited. In ophthalmology, Sickle Cell Retinopathy (SCR) diagnosis relies on ophthalmologic evaluation, including Optical Coherence Tomography (OCT) scans, but the manual interpretation is prone to subjectivity, fatigue-induced errors, and inconsistencies across clinicians. Similarly, video-based event analysis, such as reconstructing crime scenes from fragmented surveillance footage, is a time-intensive process that requires manual ordering and interpretation of unordered clips. These challenges highlight the need for automated solutions that enhance medical diagnostics and video-based decision-making.

To address these issues, we propose Multi-scale Inter-frame Attention (MIA), a novel framework that enhances deep learning models for processing volumetric and video datasets. Our approach leverages spatial and spatio-temporal attention mechanisms to improve feature extraction and representation learning. We integrate MIA into two specialized models: the Cross-Scan Attention Transformer (CSAT) for SCR detection and the Sequential Ordering of Frames in Time (SOFT) for video-based action recognition. Experimental results demonstrate that CSAT+MIA outperforms conventional object detection models in diagnosing SCR, while SOFT+MIA enhances action recognition, particularly in temporally shuffled scenarios.

Beyond domain-specific improvements, our research aims to establish a unified deep-learning method capable of capturing both inter-frame and intra-frame relationships for broader applications in medical imaging, surveillance, and video understanding. By integrating multi-scale inter-frame attention, we advance the field of automated diagnosis and event reconstruction, paving the way for more efficient, reliable, and intelligent decision-making systems.
dc.description.advisor: Kambhamettu, Chandra
dc.description.degree: Ph.D.
dc.description.department: University of Delaware, Department of Computer and Information Sciences
dc.identifier.doi: https://udspace.udel.edu/handle/19716/36137
dc.identifier.unique: 1519583855
dc.identifier.uri: https://udspace.udel.edu/handle/19716/36137
dc.language.rfc3066: en
dc.publisher: University of Delaware
dc.relation.uri: https://www.proquest.com/pqdtlocal1006271/dissertations-theses/towards-multi-scale-inter-frame-attention-improve/docview/3196068009/sem-2?accountid=10457
dc.subject: Cross scan attention transformer
dc.subject: Sickle cell retinopathy
dc.subject: Video understanding
dc.subject: Sickle cell disease
dc.title: Towards multi-scale inter-frame attention to improve deep learning tasks
dc.type: Thesis

Files

Original bundle

Name: Bhattarai_udel_0060D_16508.pdf
Size: 48.36 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed upon to submission
Description: