Rethinking the foundations of visual-inertial state estimation

Publisher

University of Delaware

Abstract

Visual–Inertial Navigation Systems (VINS), which provide accurate motion tracking in uncertain real-world 3D environments, form a core component of modern autonomy. This dissertation revisits three key elements of visual–inertial estimation (linearization, iteration, and error-state modeling) and demonstrates their roles in developing more consistent, accurate, and efficient algorithms, with applications in aerial robotics and extended reality (XR).

We begin by addressing estimator consistency, which leads to a discussion of how linearized systems are constructed in state estimation algorithms. A major source of inconsistency in VINS is the mismatch between the observability of the linearized model used by the estimator and that of the true nonlinear system. Over the years, the First-Estimate Jacobian (FEJ) technique has improved consistency by evaluating Jacobians at fixed linearization points, namely the first available estimates of the states involved. However, FEJ has remained largely confined to filtering-based estimators, because it is difficult to determine which states should be fixed, and when, across different estimation architectures. We generalize FEJ into FEJ++, a framework that tracks and identifies the minimal yet effective set of states to fix, and the proper time to fix them, across diverse estimator designs. Beyond improving consistency, this work exploits the fact that fixed Jacobians eliminate redundant computation, and shows how this property can accelerate both iterative optimization and covariance recovery. We further develop FEJ2, which explicitly models the linearization error introduced by fixed Jacobians to improve robustness and accuracy.

Looking back at the evolution of estimator design, we find that VINS estimators are typically divided into two categories: optimization-based methods, which iteratively relinearize nonlinear measurements to update the state, and filtering-based methods, which perform a single-pass update.
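The two estimator families above differ mainly in where, and how often, measurement Jacobians are evaluated. The Python sketch below is a generic textbook EKF and iterated EKF on a hypothetical 2D range-only toy problem, not the dissertation's estimators. It shows a single-pass update, an iterated update that relinearizes at each iterate, and an optional `x_lin` argument that evaluates the Jacobian at a stored first estimate in the spirit of FEJ. The functions `h`, `jac`, `ekf_update`, and `iterated_ekf_update`, as well as all numbers, are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy model (not from the dissertation): a range-only measurement
# z = ||p - s|| + noise of an unknown 2D point p, taken from a known sensor
# position s. Assumes p never coincides with s.

def h(p, s):
    """Predicted range from sensor position s to point p."""
    return float(np.linalg.norm(p - s))

def jac(p, s):
    """1x2 Jacobian of h with respect to p, evaluated at p."""
    d = p - s
    return (d / np.linalg.norm(d)).reshape(1, 2)

def ekf_update(x, P, z, s, R, x_lin=None):
    """Single-pass EKF update.

    The Jacobian is evaluated at x_lin (default: the current estimate x).
    Passing a stored first estimate instead mimics the FEJ idea; the residual
    still uses the current estimate.
    """
    H = jac(x if x_lin is None else x_lin, s)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x_new = x + (K * (z - h(x, s))).ravel()
    P_new = (np.eye(x.size) - K @ H) @ P
    return x_new, P_new

def iterated_ekf_update(x, P, z, s, R, iters=10, tol=1e-10):
    """Iterated EKF: relinearize at the latest iterate x_i (Gauss-Newton on
    the MAP cost). With iters=1 the returned mean equals ekf_update's."""
    x_i = x.copy()
    for _ in range(iters):
        H = jac(x_i, s)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        # Measurement linearized about x_i; correction applied to the prior x.
        innov = z - h(x_i, s) - (H @ (x - x_i)).item()
        x_next = x + (K * innov).ravel()
        done = np.linalg.norm(x_next - x_i) < tol
        x_i = x_next
        if done:
            break
    # Posterior covariance uses the Jacobian at the final iterate.
    H = jac(x_i, s)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    return x_i, (np.eye(x.size) - K @ H) @ P

# Usage on made-up numbers.
x0 = np.array([1.0, 1.0])   # prior mean
P0 = np.diag([1.0, 0.1])    # anisotropic prior covariance
s = np.zeros(2)             # sensor position
R = np.array([[0.01]])      # range noise variance
z = 2.0                     # measured range

x_ekf, P_ekf = ekf_update(x0, P0, z, s, R)
x_iekf, P_iekf = iterated_ekf_update(x0, P0, z, s, R)

# FEJ-style second update from another sensor: Jacobian at the first estimate x0.
s2 = np.array([3.0, 0.0])
x_fej, P_fej = ekf_update(x_ekf, P_ekf, 1.5, s2, R, x_lin=x0)
```

In this toy setup, one iteration reproduces the single-pass EKF mean. Further iterations move the predicted range closer to the measurement, but each one costs an extra Jacobian evaluation and gain computation. This is the kind of accuracy-versus-computation trade-off examined in the next paragraph. Because the FEJ-style call keeps `x_lin` fixed, its Jacobian could be cached rather than recomputed.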
Although iteration is often believed to improve accuracy, we examine its role through a systematic analysis of how relinearizing IMU and camera measurements affects accuracy, consistency, and computational efficiency. The results show that while iteration can help in cases such as poor initialization or lost feature tracks, its benefits in typical scenarios are often limited relative to its computational cost. These findings suggest that iterative refinement should be treated as a task-dependent design choice rather than a default in estimator development.

Next, we revisit how estimators represent state and error. Building on prior insights from invariant VINS formulations, we reaffirm that the choice of error-state representation, rather than of the state itself, plays a central role in estimator behavior. Along these lines, we first develop the Decoupled Right-Invariant VINS (DRI-VINS), which improves efficiency by separating feature states from the group structure while maintaining consistency through careful treatment of observability. Extending this idea, we propose the Decoupled Error and State (DES) methodology, which treats the state and error representations as independent design elements. This flexibility allows the error state to be designed adaptively and to evolve during estimation, exploiting the benefits of different representations at different stages while estimating the same underlying state.

Finally, we demonstrate that VINS is not limited to navigation. In robotics, we enable real-time system identification for agile micro aerial vehicles (MAVs), allowing them to adapt in flight without external calibration or additional sensors. In XR, we design a monocular visual–inertial odometry (VIO) system that leverages structural cues such as planes for robust, lightweight tracking in complex environments.
Overall, this dissertation reexamines fundamental design choices in visual–inertial estimation and contributes to a deeper understanding of how consistency, efficiency, and accuracy can be achieved in practice.
