Combining learning and computational imaging for 3D inference

Date
2017
Publisher
University of Delaware
Abstract
Acquiring the 3D geometry of a scene is a key task in computer vision. Applications are numerous, from classical object reconstruction and scene understanding to more recent visual SLAM and autonomous driving. Recent advances in computational imaging have enabled many new solutions to the problem of 3D reconstruction: by modifying the camera's components, computational imaging optically encodes the scene and then decodes it with tailored algorithms.

This dissertation explores new computational imaging techniques, combined with recent advances in deep learning, to infer the 3D geometry of a scene. Our approaches fall into two categories: active and passive 3D sensing.

For active illumination, we propose two solutions. First, we present a multi-flash (MF) system implemented on a mobile platform. From the sequence of images captured by the MF system, we extract the depth edges of the scene and further estimate a depth map directly on the mobile device. Second, we present a portable immersive system that acquires and displays high-fidelity 3D reconstructions using a set of RGB-D sensors. The system is based on structured light and recovers the 3D geometry of the scene in real time. We have also developed a visualization system that lets users view the captured event from new perspectives at arbitrary time instances in real time.

For passive sensing, we focus on light-field-based depth estimation. For depth inference from a single light field, we present an algorithm tailored to barcode images: it analyzes the statistics of raw light field images and estimates depth at real-time speed for fast refocusing and decoding. To mimic the human vision system, we investigate dual light field input and propose a unified deep-learning framework that extracts depth from both the disparity cue and the focus cue. To facilitate training, we have created a large dual focal stack database with ground-truth disparity. While that solution fuses depth from focus and stereo, we also explore combining depth from defocus and stereo, taking as input an all-focus stereo pair and a defocused image of one of the stereo views. We adopt an hourglass network architecture to extract depth from these image triplets and study multiple neural network variants to improve depth inference. We demonstrate that our deep-learning-based approaches preserve the strengths of the focus/defocus and disparity cues while effectively suppressing their weaknesses.
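
To make the multi-flash depth-edge idea above concrete, here is a minimal NumPy sketch in the spirit of classic multi-flash imaging: flashes at different positions around the lens cast shadows that abut depth edges, and those shadows appear as sharp drops in per-flash ratio images. The function name, inputs, and threshold are illustrative assumptions, not the dissertation's actual implementation.

import numpy as np

def depth_edges_multiflash(images, flash_dirs, step_thresh=0.2):
    """Detect depth edges from a multi-flash (MF) image sequence.

    images:      list of grayscale images (H, W), one per flash position
                 (hypothetical input format)
    flash_dirs:  list of (dy, dx) unit vectors giving, for each image, the
                 direction in which that flash casts shadows
    step_thresh: minimum drop in the ratio image treated as a shadow edge
    """
    stack = np.stack(images).astype(np.float64)      # (K, H, W)
    i_max = stack.max(axis=0) + 1e-8                 # near shadow-free composite
    edges = np.zeros(stack.shape[1:], dtype=bool)
    for img, (dy, dx) in zip(stack, flash_dirs):
        ratio = img / i_max                          # ~1 when lit, ~0 in shadow
        gy, gx = np.gradient(ratio)
        # A depth edge produces a sharp negative step in the ratio image
        # along the shadow direction of this particular flash.
        edges |= (gy * dy + gx * dx) < -step_thresh
    return edges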
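
The single-light-field method above relies on fast refocusing. A standard baseline for refocusing a light field is shift-and-add over its sub-aperture views; the sketch below shows only that baseline, not the dissertation's barcode-specific algorithm, which additionally exploits raw light-field statistics. All names and input conventions here are assumptions for illustration.

import numpy as np

def refocus_shift_and_add(subviews, grid, alpha):
    """Synthetically refocus a light field by shift-and-add.

    subviews: list of sub-aperture images (H, W), one per angular sample
    grid:     list of (u, v) angular coordinates, one per sub-view
    alpha:    refocus parameter; each view is shifted by alpha * (u, v)
    """
    acc = np.zeros_like(subviews[0], dtype=np.float64)
    for img, (u, v) in zip(subviews, grid):
        # Integer-pixel shifts for simplicity; a real implementation
        # would interpolate for subpixel shifts.
        acc += np.roll(np.roll(img, int(round(alpha * v)), axis=0),
                       int(round(alpha * u)), axis=1)
    return acc / len(subviews)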
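
For the depth-from-defocus-and-stereo triplet, the abstract names an hourglass (encoder-decoder with skip connections) architecture. Below is a deliberately tiny PyTorch sketch of that shape, concatenating the three views along the channel axis; the layer count and channel widths are illustrative assumptions, not the dissertation's network.

import torch
import torch.nn as nn

class MiniHourglass(nn.Module):
    """Toy hourglass: depth from (all-focus left, all-focus right,
    defocused left) RGB triplets. Sizes are illustrative only."""
    def __init__(self, in_ch=9, feat=32):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, feat, 3, 2, 1),
                                   nn.ReLU(True))
        self.down2 = nn.Sequential(nn.Conv2d(feat, feat * 2, 3, 2, 1),
                                   nn.ReLU(True))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1),
                                 nn.ReLU(True))
        # Input channels doubled by the skip concatenation below.
        self.up2 = nn.ConvTranspose2d(feat * 2, 1, 4, 2, 1)

    def forward(self, left, right, defocused):
        x = torch.cat([left, right, defocused], dim=1)   # (B, 9, H, W)
        d1 = self.down1(x)                               # (B, feat, H/2, W/2)
        d2 = self.down2(d1)                              # (B, 2*feat, H/4, W/4)
        u1 = torch.cat([self.up1(d2), d1], dim=1)        # skip connection
        return self.up2(u1)                              # (B, 1, H, W) depth

For inputs whose height and width are multiples of four, e.g. MiniHourglass()(left, right, defocused) on three (B, 3, H, W) tensors returns a (B, 1, H, W) depth map; the skip connection is what lets fine disparity detail survive the bottleneck.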
Keywords
Applied sciences, Computational imaging, Deep learning, Depth estimation, Light field