Robocentric visual-inertial localization and mapping

Date
2023
Publisher
University of Delaware
Abstract
In this thesis, we focus on the research and application of visual-inertial navigation systems (VINS), built on advances in sensor technology and estimation theory. As sensors such as cameras and inertial measurement units (IMUs) can now be mass-manufactured at small size and low cost, more and more devices, ranging from smartphones and drones to cars, have become platforms for deploying VINS, and many success stories have been witnessed so far. Despite this progress, we point out issues in current VINS approaches that affect consistency, efficiency, and versatility, and propose solutions in each of these aspects. In particular, the two main approaches, filtering and optimization, which are mathematically equivalent but offer different advantages in different use cases, are both covered and improved in performance by our solutions.

Our first effort focuses on the inconsistency of the VINS estimator, which is mainly caused by the mismatch in observability properties between the original and the linearized system. Unlike remedies that trade off the accuracy or efficiency of the estimator, we address this issue from another perspective by reformulating the VINS problem. Instead of using a fixed global (i.e., world-centric) frame as the navigation reference and directly estimating the absolute pose of the platform, we express the VINS problem with respect to a moving local frame and estimate the relative motion, from which the absolute pose of the platform is recovered. Through rigorous analysis, we show that the system under our proposed formulation has invariant observability properties even after linearization, thus fundamentally improving estimation consistency.
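The key operation of such a formulation, recovering the absolute pose by composing the pose of the moving reference frame with the estimated relative motion, can be sketched as follows. This is a minimal illustration with assumed names, not code from R-VIO:

```python
import numpy as np

def compose(T_a, T_b):
    """Compose two rigid-body transforms given as (R, p) tuples: T_a * T_b."""
    R_a, p_a = T_a
    R_b, p_b = T_b
    return R_a @ R_b, R_a @ p_b + p_a

def recover_global_pose(T_world_ref, T_ref_robot):
    """World-centric pose = (world -> local reference) o (reference -> robot).
    Only the second factor is the estimated state in a robocentric estimator."""
    return compose(T_world_ref, T_ref_robot)

# Example: reference frame rotated 90 deg about z and shifted 2 m along world x;
# robot moved 1 m along the reference frame's x axis.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
T_world_ref = (Rz, np.array([2.0, 0.0, 0.0]))
T_ref_robot = (np.eye(3), np.array([1.0, 0.0, 0.0]))

R_wr, p_wr = recover_global_pose(T_world_ref, T_ref_robot)
# The 1 m motion along the reference x axis appears along world y: p_wr = [2, 1, 0].
```

The global pose is thus a derived quantity obtained by composition on demand, rather than a directly estimated state.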
Another advantage of this reformulation is that VINS can start from an arbitrary pose, without particular initialization such as gravity alignment, whose accuracy may be difficult to guarantee in practice. We term ours the robocentric VINS formulation, as opposed to the commonly used world-centric formulation, and develop the corresponding robocentric visual-inertial odometry (R-VIO) algorithm to validate our analysis and demonstrate its performance through both Monte-Carlo simulations and real-world experiments with different platforms in different scenarios.

Considering the limited computational resources and numerical precision of mobile platforms, our second effort investigates the square-root estimator, which has lower memory cost by computing a half-size information factor, and better numerical stability with a condition number that is the square root of that of the original information (or covariance)-based estimator. Building on R-VIO, which implements a covariance-based estimator, we derive the information-based robocentric formulation with factors for the visual and inertial measurements. QR factorization, which offers good numerical stability, is used for the information update, and we enable in-place operations to further speed up the computation. We also integrate online calibration into the new estimator to handle unknown errors in the relative spatial configuration and the timing offset between sensors, and show that it can handle several degenerate cases. Thanks to the proposed square-root robocentric formulation, we develop a new algorithm, R-VIO2, that uses single-precision floating-point (float32) arithmetic throughout, which is more resource-friendly than its double-precision (float64) counterpart.
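The QR-based square-root information update can be illustrated with a small sketch. This is a generic, illustrative example, not the R-VIO2 implementation: the information matrix Lambda is maintained as an upper-triangular factor R with R^T R = Lambda, and a new linear measurement z = H x + n with unit-covariance noise is absorbed by stacking H under R and re-triangularizing with QR, all in float32:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Current upper-triangular square-root information factor (float32).
R_fac = np.triu(rng.standard_normal((n, n))).astype(np.float32)
np.fill_diagonal(R_fac, np.abs(np.diag(R_fac)) + 1.0)  # keep it well-conditioned

# Jacobian of a new (whitened) measurement, illustrative values.
H = rng.standard_normal((3, n)).astype(np.float32)

# Square-root update: stack and re-triangularize; only the triangular
# factor is needed, so the economy "r" mode suffices.
stacked = np.vstack([R_fac, H])
R_new = np.linalg.qr(stacked, mode='r')

# Equivalent full-matrix update (for comparison only): Lambda + H^T H.
Lambda_new = R_fac.T @ R_fac + H.T @ H
```

Because the singular values of R_new are the square roots of the eigenvalues of Lambda_new, the factor's condition number is the square root of the information matrix's, which is what makes single-precision arithmetic viable.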
Through extensive real-world tests on a benchmark dataset and in a long-term, large-scale experiment, we demonstrate the superior time efficiency of our square-root estimator and the significant improvement in accuracy brought by online spatiotemporal calibration.

For a complete VINS solution, the ability to map is useful for sensing the surrounding environment and for correcting the drift caused by sensor noise and biases. Building on our preceding results that improve trajectory estimation, our third effort aims to realize a consistent and efficient VINS framework for simultaneous localization and mapping (SLAM). In contrast to the popular approaches that split the SLAM process into motion tracking and map building and solve them separately, which leads to inconsistent estimation results, we take into account the correlation between the two processes by referencing the batch SLAM estimator. In our design, the front-end focuses on motion tracking, while the back-end builds the entire map and trajectory and performs global optimization when a loop closure is found. To include the real-time states of the front-end in the batch optimization, the back-end runs multiple threads that track the front-end estimates in real time while global optimization is in progress, so that the correlations between the states are updated just as in the batch estimator. In this way, the drift in the front-end can also be corrected immediately using the result of global optimization, leading to consistent and accurate SLAM output. This framework is compatible with different formulations for the front-end and back-end; as a proof of concept, we develop a SLAM system using a robocentric front-end and a world-centric back-end, showing that versatility across different front-end choices becomes possible when realizing VINS.
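The front-end/back-end relay can be sketched with a toy, single-axis example. Everything here — the names, the averaging stand-in for global optimization, and the shared-dictionary correction channel — is an illustrative assumption, not the thesis's implementation; it only shows the pattern of streaming front-end states to a concurrent back-end and feeding a correction straight back:

```python
import queue
import threading

state_queue = queue.Queue()    # front-end -> back-end: real-time states
correction = {'offset': 0.0}   # back-end -> front-end: latest drift correction

def backend():
    # Toy stand-in for global optimization: estimate the average drift of
    # the received 1-D poses against their anchors, then publish a correction.
    drifts = []
    while True:
        item = state_queue.get()
        if item is None:       # sentinel: loop closure / end of session
            break
        pose, anchor = item
        drifts.append(pose - anchor)
    correction['offset'] = -sum(drifts) / len(drifts)

t = threading.Thread(target=backend)
t.start()

# Front-end: 1-D poses drifting by +0.1 per step from true anchors 0, 1, ..., 4;
# each state is streamed to the back-end as soon as it is produced.
for k in range(5):
    state_queue.put((k + 0.1, float(k)))
state_queue.put(None)
t.join()

# Front-end applies the back-end's correction immediately to its latest pose.
corrected = (4 + 0.1) + correction['offset']   # drift removed: back to 4.0
```

In the actual framework the back-end tracks the front-end states while optimization is still running, so the state correlations are maintained as in a batch estimator; the sketch only conveys the producer/consumer structure and the immediate feedback of the correction.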
We test our system with Monte-Carlo simulations and challenging real-world datasets, where it achieves superior consistency and better accuracy than a state-of-the-art visual-inertial approach, and shows potential as an extensible platform for experimenting with distributed, large-scale navigation applications.

Our developed VINS algorithms are open-sourced online to benefit the research community:
• R-VIO: https://github.com/rpng/R-VIO
• R-VIO2: https://github.com/rpng/R-VIO2
Keywords
Estimation consistency, Robocentric formulation, Sensor fusion, Inertial measurement units, Benchmark datasets, Visual-inertial navigation