From coarse to fine: quickly and accurately obtaining indoor image-based localization under various illuminations

Date
2016
Publisher
University of Delaware
Abstract
The focus of this dissertation is improving the accuracy and efficiency of indoor image- and video-based localization under varying conditions. Indoor localization is a critical topic in computer vision. Image-based localization is mostly used outdoors, especially in urban areas where GPS localization is inaccurate. Current vision-based indoor localization techniques mainly retrieve the best-matching image to obtain location information, but they miss orientation information. Kinect has also been used to find, for each pixel in a query image, the corresponding 3D point in an already known scene; however, Kinect is only useful at short range. I introduce a 3D Structure-from-Motion (SfM) reconstruction model into the indoor image-based localization framework. Feature correspondences are searched within each descriptor's visual word, which reduces the search scope and accelerates the search process. To make features more discriminative for building 2D-to-3D correspondences, I learn a projection matrix through trace-ratio relevance feedback and project the features into a more discriminative space. I then replace Euclidean distance with Hellinger distance to improve localization accuracy, and I analyze the localization accuracy of different distance measures. In a crowded indoor environment, captured images usually contain people, which often leads to inaccurate camera pose estimation. In my approach, people are segmented out of the videos using optical flow and the background is completed. A novel initial correspondence selection method, based on graph matching, replaces traditional RANSAC, improving both image registration speed and camera pose estimation accuracy.
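The Hellinger distance mentioned above can be computed by L1-normalizing each descriptor, taking element-wise square roots, and then measuring Euclidean distance in that space. The following is a minimal sketch of this computation, not the dissertation's implementation; the function name is mine.

```python
import numpy as np

def hellinger_distance(d1, d2):
    """Hellinger distance between two non-negative descriptors (e.g. SIFT).

    L1-normalize each descriptor, take element-wise square roots;
    the Euclidean distance in this transformed space is the
    Hellinger distance between the original descriptors.
    """
    d1 = np.sqrt(d1 / d1.sum())
    d2 = np.sqrt(d2 / d2.sum())
    return np.linalg.norm(d1 - d2)
```

Because the transform is applied per descriptor, an existing Euclidean nearest-neighbor index can be reused unchanged on the transformed descriptors.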
In addition to SfM-based methods, I propose a multi-view, image-based localization method. The 3D SfM model is still used to provide users a clear view of the whole building. I perform image retrieval to roughly obtain the image location. Since orientation is also essential for localizing an image, I treat each view direction as a task and perform image retrieval in a multi-task learning framework. Multi-view image retrieval thus yields the image location and orientation simultaneously, and it avoids finding correspondences for each feature extracted from the 2D image, which accelerates the correspondence-building process. I linearly search for the nearest neighbor of the query image within the same location group and task, assign the camera pose of the best-matching candidate to the query image, and refine the pose through bundle adjustment. Existing image-based localization systems rely on color images captured by normal cameras, under the assumption that adequate light is available. In many situations, such as a power outage, it is not. To cope with this, I implement a localization method based on thermal imaging, which captures object surface temperature instead of light. Thermal cameras, however, require more time to focus and frame a scene; because this process is labor-intensive, far fewer samples were collected than color images. I apply transfer learning to enhance the training of the thermal image classifier, and I combine it with active learning to select the most informative samples, making the classification model more accurate. Through this active transfer learning method, the model can accurately indicate a thermal image's location.
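One standard way active learning selects "most informative" samples, as described above, is uncertainty sampling: querying the unlabeled samples whose predicted class distribution has the highest entropy. This is a generic sketch of that criterion under my own naming, not the dissertation's specific selection rule.

```python
import numpy as np

def select_most_informative(probs, k):
    """Pick the k most uncertain samples by predictive entropy.

    probs : (n_samples, n_classes) array of predicted class probabilities.
    Returns indices of the k samples with highest entropy, most
    uncertain first.
    """
    # Small epsilon guards against log(0) for confident predictions.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:][::-1]
```

The selected samples would then be labeled and added to the training set before the transfer-learned classifier is retrained.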
To better exploit scene geometric attributes, I further use a 3D model reconstructed from a short video as the query, realizing 3D-to-3D localization under a multi-task point-retrieval framework. First, using a 3D model as the query enables efficient selection of location candidates. Furthermore, the 3D reconstruction exploits the correlation among different images, based on the fact that images captured from different views for SfM share information through matching features. By exploring this shared information (matching features) across multiple related tasks (images of the same scene captured from different views), the visual features' view-invariance can be improved, yielding higher point-retrieval accuracy. More specifically, I use a multi-task point-retrieval framework to explore the relationship between descriptors and 3D points, extracting discriminative points for more accurate 3D-to-3D correspondence retrieval. In summary, this dissertation conducts research on image- and video-based localization, focusing on indoor environments and on more accurate, time- and memory-efficient methods.
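A common baseline for the 3D-to-3D correspondence retrieval described above matches the descriptors attached to query-model points against those of the scene model by nearest-neighbor search with Lowe's ratio test. This sketch shows that baseline under assumed names; it is not the dissertation's multi-task method, which additionally learns across views.

```python
import numpy as np

def match_3d_points(query_desc, scene_desc, ratio=0.8):
    """Match query-model 3D points to scene-model 3D points.

    query_desc : (m, d) descriptors of the query model's points.
    scene_desc : (n, d) descriptors of the scene model's points (n >= 2).
    Returns (query_index, scene_index) pairs that pass the ratio test:
    the best match must be clearly closer than the second best.
    """
    matches = []
    for i, q in enumerate(query_desc):
        dists = np.linalg.norm(scene_desc - q, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches
```

The surviving 3D-to-3D correspondences would then feed a rigid alignment (e.g. via RANSAC) to recover the query model's pose in the scene.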