Muhammad Haris, Mathias Franzius, Ute Bauer-Wersing,
"Learning Visual Landmarks for Localization with Minimal Supervision",
International Conference on Image Analysis and Processing, pp. 773-786, 2022.
Camera localization is one of the fundamental requirements for vision-based mobile robots, self-driving cars, and augmented reality applications. In this context, learning spatial representations relative to unique regions in a scene with Slow Feature Analysis (SFA) has demonstrated large-scale localization. However, it relies either on pre-existing object detectors or hand-labeled data to train a CNN for recognizing unique regions in a scene. We propose a new approach that uses readily available CNN-detectable objects as anchors to label and learn new landmark objects or regions in a scene using only a minimal amount of supervision. Thus, the method bootstraps the landmark generation process and removes the need to label large amounts of data manually. The anchor objects are only required to learn the new landmarks and become obsolete for the unsupervised mapping and localization phases. We present localization results with the learned landmarks in both simulated and real-world outdoor environments and compare the results to SFA on complete images and PoseNet. The landmark-based localization shows similar or significantly better performance than the baseline methods in challenging scenarios. Our results further suggest that the approach scales well and achieves even higher localization accuracy by increasing the number of learned landmarks without increasing the number of anchors.