Improving learning under data scarcity constraints: application in brain MRI, sonar, and natural images

Publisher

University of Delaware

Abstract

Data scarcity significantly hampers machine learning in many application domains. It impedes the effective use of deep learning models, which are prone to overfitting and often perform poorly on data not seen during training. The nature of this problem varies by application, necessitating tailored solutions. We focus on using machine learning to perform different computer vision tasks on several imaging modalities: structural brain MRI, sonar images, and natural images.

To segment abnormal tissue (brain lesion detection) in structural MRI, we propose a self-supervised task that exploits the intricate spatial structure of the brain. We take a patch from an MRI slice and learn a mapping from the patch to its relative location within the brain. We add to this task an estimate of the uncertainty of the predicted location. Then, for the downstream tasks of abnormality detection and segmentation, we combine two scores, namely the location estimation error and the uncertainty, into an unsupervised abnormality score for the input patch.

While this approach focuses on leveraging spatial context within available structural images, in many clinical scenarios some MRI modalities may be missing or unavailable due to limited resources, acquisition time, or patient-specific constraints. To address this complementary challenge of modality scarcity, we propose a 3D two-stage model for many-to-many modality translation. This model achieves state-of-the-art performance in both reconstruction quality and inference time, making it a practical solution for completing missing modalities in multi-modal MRI pipelines.

For natural images, we exploit the fact that they are composed of two parts, background and foreground objects, where the latter are defined as the salient parts of the image, to train a masking network that separates the two.
In sonar images of the sea floor, such a network can separate objects from the sea-floor background. To this end, we propose a weakly supervised training scheme for a masking network that takes an input image and generates a mask for its foreground objects. This mask is used to superimpose the foreground on a different background-only image, yielding a synthetic counterfactual image. We use the cluster assignments of the images' background content to define a conditional statistical divergence between the generated counterfactual images and the real ones for each target background cluster. The trained model that minimizes this divergence can be used in downstream tasks such as foreground segmentation and classification. Additionally, counterfactual images that place foreground objects on backgrounds not paired with them in the training data are useful for data augmentation.

While the proposed methods address core aspects of learning under data scarcity, they also reveal new directions for future work. First, finer-grained localization in Patch2Loc could be achieved by applying out-of-distribution detection techniques to spatially organized latent spaces, particularly to overcome the limitations imposed by fixed patch sizes. For weakly supervised segmentation, the background clustering mechanism could be extended with dynamic or adaptive clustering methods to handle more complex, real-world backgrounds. Additionally, a discriminator could be employed to mitigate hallucinations such as partial object removal. For modality translation, incorporating uncertainty modeling would help identify when a translation is ill-posed due to missing modality-specific content, thereby improving reliability in clinical settings. We also plan to extend the approach to other modalities, such as stiffness maps estimated from magnetic resonance elastography (MRE) images.
Furthermore, we observed that dynamic models can better estimate missing information during translation (e.g., the contrast of T1CE), but they may distort the brain's anatomical structure in the translated images. Introducing structural regularization into these generative models could preserve anatomical fidelity and enhance translation performance.
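
Two operations described in the abstract, the patch-level abnormality score and the counterfactual compositing step, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the weighted-sum combination, the `weight` parameter, and the log-variance parameterization of uncertainty are all assumptions made for illustration, since the abstract does not specify how the two scores are combined or how the mask is applied.

```python
import numpy as np


def abnormality_score(pred_loc, true_loc, pred_log_var, weight=1.0):
    """Combine location error and predicted uncertainty into one score.

    Assumption: a weighted sum. The abstract only says the two scores
    are combined, not how.
    """
    pred_loc = np.asarray(pred_loc, dtype=float)
    true_loc = np.asarray(true_loc, dtype=float)
    pred_log_var = np.asarray(pred_log_var, dtype=float)
    # Euclidean distance between the predicted and actual patch location.
    error = np.linalg.norm(pred_loc - true_loc, axis=-1)
    # Sum of predicted per-coordinate variances as the uncertainty term.
    uncertainty = np.exp(pred_log_var).sum(axis=-1)
    return error + weight * uncertainty


def composite_counterfactual(image, mask, background):
    """Superimpose the masked foreground of `image` onto `background`.

    Assumption: standard alpha compositing with a mask in [0, 1].
    """
    image = np.asarray(image, dtype=float)
    background = np.asarray(background, dtype=float)
    mask = np.clip(np.asarray(mask, dtype=float), 0.0, 1.0)
    if mask.ndim == image.ndim - 1:
        mask = mask[..., None]  # broadcast an (H, W) mask over channels
    return mask * image + (1.0 - mask) * background
```

Under these assumptions, a patch whose predicted location is far from its true location, or whose predicted variance is large, receives a high abnormality score, and a mask of all ones reproduces the input image while a mask of all zeros reproduces the background.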
