AerialVL: A Dataset, Baseline and Algorithm Framework for Aerial-Based Visual Localization With Reference Map

Abstract
Visual localization plays an essential role in the autonomous flight of Unmanned Aerial Vehicles (UAVs), especially in Global Navigation Satellite System (GNSS)-denied environments. Existing aerial-based visual localization methods mainly focus on eliminating image variance between the database map and captured frames. However, there is a lack of public datasets and baselines for method comparison, which impedes the development of aerial-based visual localization. To address this issue, we construct AerialVL, a large-scale dataset collected using a UAV flying at different altitudes, along various routes, and during diverse time periods. AerialVL consists of 11 image sequences covering approximately 70 km of trajectory and includes a reference satellite image database corresponding to the flight area. Leveraging AerialVL, we perform, for the first time, thorough evaluations of various mainstream solutions designed for aerial-based visual localization. This evaluation encompasses visual place recognition, visual alignment localization, and visual odometry, which serve as comparison baselines. Furthermore, we present a general aerial-based visual localization framework that unifies various methods and integrates them into a modular architecture. We note that across all flight trajectories, the proposed framework achieves higher localization accuracy and robustness than existing methods.
Description
This article was originally published in IEEE Robotics and Automation Letters. The version of record is available at: https://doi.org/10.1109/LRA.2024.3441491 © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This article will be embargoed until 08/09/2026.
Keywords
deep learning for visual perception, localization, recognition, vision-based navigation
Citation
M. He et al., "AerialVL: A Dataset, Baseline and Algorithm Framework for Aerial-Based Visual Localization With Reference Map," in IEEE Robotics and Automation Letters, vol. 9, no. 10, pp. 8210-8217, Oct. 2024, doi: 10.1109/LRA.2024.3441491.