CATKit: a color and thermal computer vision toolkit and benchmark for cross-modal similarity and semantic segmentation
Date
2021
Publisher
University of Delaware
Abstract
Long-wave infrared (LWIR) imaging captures useful information about the thermal and material characteristics of a scene that is not captured by color cameras, which observe only the visible spectrum. Until recently, LWIR (thermal) cameras have been prohibitively expensive for widespread use, and computer vision algorithms targeting this unique modality have lagged behind methods designed for color imagery. In this dissertation, I present a computer vision toolkit that includes a dataset and benchmark for color and thermal algorithms, as well as novel machine learning methods for traditional computer vision tasks utilizing a mix of color and thermal images.

Machine learning approaches have become popular because of their generalization power on vision tasks, although they require numerous and diverse training samples to achieve it. Features learned through these data-driven approaches are often more robust than their manually engineered counterparts, and they have pushed the state of the art in a variety of vision tasks; however, machine learning methods designed for color imagery can fail when applied to thermal imagery.

While datasets with both color and thermal imagery exist, they are usually acquired with a car-mounted camera system and target autonomous vehicle applications. These datasets contain many repetitions of a small number of object classes, and the consistent scene geometry can prevent machine learning approaches from generalizing to other environments. The Color and Thermal Stereo (CATS) dataset provides color and thermal imagery with 3D point clouds, stereo disparity, and object label ground truth, and aims to address this deficit in scene and object diversity. CATS also includes a benchmark for both color and thermal image algorithms.

Currently, few algorithms are designed for combined color and thermal imagery. The methods that do exist often fuse color and thermal images for specific tasks such as pedestrian detection, because thermal cameras excel at identifying black-body radiators above ambient temperature, such as humans. To extend this area, I present novel machine learning approaches for the tasks of image similarity (WILDCAT) and semantic segmentation (WHERECAT). WILDCAT employs a deep residual pseudo-Siamese network to compare the similarity of color and thermal image regions. WHERECAT is an encoder-decoder architecture that exploits edge features common to color and thermal images to guide semantic segmentation in the network's decoder. This work also provides a suite of tools for the future development and benchmarking of methods that exploit the thermal imaging modality, and its fusion with the visible modality, as thermal cameras become a more ubiquitous sensor in vision platforms.
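The abstract gives no implementation details, so the following is only a minimal sketch of the pseudo-Siamese similarity design it describes for WILDCAT: two modality-specific residual encoders that do not share weights, followed by a small head that scores how similar a color patch and a thermal patch are. The backbone choice, layer sizes, and class/parameter names here are assumptions for illustration, not the dissertation's actual architecture.

```python
# Illustrative sketch only: a pseudo-Siamese network for color/thermal patch
# similarity in the spirit of WILDCAT as described in the abstract.
# Backbone, embedding size, and head design are assumptions, not the original model.
import torch
import torch.nn as nn
import torchvision.models as models

class PseudoSiameseSimilarity(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        # Unlike a true Siamese network, the two branches do NOT share weights,
        # so each branch can specialize to its own modality (color vs. thermal).
        self.color_branch = models.resnet18(weights=None)
        self.thermal_branch = models.resnet18(weights=None)
        # Thermal patches are single-channel; adapt the first convolution.
        self.thermal_branch.conv1 = nn.Conv2d(
            1, 64, kernel_size=7, stride=2, padding=3, bias=False
        )
        # Replace the classification heads with embedding projections.
        self.color_branch.fc = nn.Linear(self.color_branch.fc.in_features, embed_dim)
        self.thermal_branch.fc = nn.Linear(self.thermal_branch.fc.in_features, embed_dim)
        # Small decision head that scores the similarity of the two embeddings.
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 1),
        )

    def forward(self, color_patch, thermal_patch):
        c = self.color_branch(color_patch)      # (N, embed_dim)
        t = self.thermal_branch(thermal_patch)  # (N, embed_dim)
        return self.head(torch.cat([c, t], dim=1))  # similarity logit per pair

if __name__ == "__main__":
    net = PseudoSiameseSimilarity()
    color = torch.randn(4, 3, 64, 64)    # RGB patches
    thermal = torch.randn(4, 1, 64, 64)  # LWIR patches
    print(net(color, thermal).shape)     # torch.Size([4, 1])
```

A similar two-branch layout could serve as the encoder side of a WHERECAT-style encoder-decoder, with shared edge features passed to the decoder, but that design is likewise only described at a high level in the abstract.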
Keywords
Computer Vision, Machine Learning, Thermal Imaging, Long-Wave Infrared Imaging