VOLoc: Visual Place Recognition by Querying Compressed Lidar Map

Proceedings of ICRA 2024

 

Xudong Cai1, Yongcai Wang1,*, Zhe Huang1, Yu Shao1, Deying Li1

1 School of Information, Renmin University of China, Beijing, 100872

 


 

 

Overview

The availability of city-scale Lidar maps enables the potential of city-scale place recognition using mobile cameras. However, city-scale Lidar maps generally need to be compressed for storage efficiency, which increases the difficulty of direct visual place recognition in compressed Lidar maps. This paper proposes VOLoc, an accurate and efficient visual place recognition method that exploits geometric similarity to directly query the compressed Lidar map via the real-time captured image sequence. In the offline phase, VOLoc compresses the Lidar maps using a Geometry-Preserving Compressor (GPC), in which the compression is reversible, a crucial requirement for the downstream 6DoF pose estimation. In the online phase, VOLoc proposes an online Geometric Recovery Module (GRM), which is composed of online Visual Odometry (VO) and a point cloud optimization module, such that the local scene structure around the camera is recovered online to build the Querying Point Cloud (QPC). The QPC is then compressed by the same GPC and aggregated into a global descriptor by an attention-based aggregation module, which queries the compressed Lidar map in the vector space. A transfer learning mechanism is also proposed to improve the accuracy and generality of the aggregation network. Extensive evaluations show that VOLoc provides localization accuracy even better than Lidar-to-Lidar place recognition, setting a new record for utilizing compressed Lidar maps with low-end mobile cameras.

System Architecture

Consider a city-scale point cloud map M collected along the city roads. The map is divided into segments of equal size. We compress the segmented maps for storage efficiency and set up a database, i.e., DB={c1,c2,...,cN}, where ci is the ith compressed segment. A client equipped with a monocular camera queries the database using its captured images to find the segment in which the client is most likely located.
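The query step above amounts to a nearest-neighbor search over per-segment global descriptors in vector space. The following is a minimal sketch of that search, not the paper's implementation; the function names `build_descriptor_db` and `query_segment` and the use of cosine similarity are illustrative assumptions.

```python
import numpy as np

def build_descriptor_db(segment_descriptors):
    """Stack per-segment global descriptors (one per compressed
    sub-map ci) into an (N, D) matrix, L2-normalized row-wise so
    a dot product equals cosine similarity."""
    db = np.stack(segment_descriptors)
    return db / np.linalg.norm(db, axis=1, keepdims=True)

def query_segment(db, query_descriptor, top_k=1):
    """Return indices of the top-k map segments most similar to
    the client's query descriptor (cosine similarity)."""
    q = query_descriptor / np.linalg.norm(query_descriptor)
    sims = db @ q                      # (N,) similarity scores
    return np.argsort(-sims)[:top_k]   # best segments first
```

In practice a city-scale database would use an approximate-nearest-neighbor index rather than the brute-force scan shown here.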

(Figure: overview of the proposed VOLoc pipeline.)

The overview of the proposed method is shown above. The Lidar sub-maps are first processed by the Geometry-Preserving Compressor (GPC), and are then processed by the Feature Aggregation module to be converted into global descriptors Dd={d1,d2,...,dN}.
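To make the aggregation step concrete, here is a minimal sketch of attention-weighted pooling of per-point features into a single global descriptor. This is a simplified stand-in for the paper's attention-based aggregation module, not its actual network; the learned weight vector `w` and the single-head score formulation are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_aggregate(point_features, w):
    """Pool an (N, D) matrix of per-point features into one
    D-dimensional global descriptor via attention weights.

    point_features: features of the N points in a sub-map / QPC
    w:              a (hypothetical) learned D-dim scoring vector
    """
    scores = point_features @ w    # (N,) relevance score per point
    alpha = softmax(scores)        # attention weights, sum to 1
    return alpha @ point_features  # weighted sum -> (D,) descriptor
```

With `w` set to zeros, the weights are uniform and the descriptor reduces to the mean feature; a trained `w` would instead emphasize geometrically distinctive points.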


Evaluations


Visualization


Bibtex

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grants 61972404 and 12071478; by the Public Computing Cloud, Renmin University of China; and by the Blockchain Laboratory, Metaverse Research Center, Renmin University of China.