Rethinking Visual Geo-localization for Large-Scale Applications

doi:10.48550/arXiv.2204.02287

Visual Geo-localization (VG) is the task of estimating the position where a given photo was taken by comparing it with a large database of images of known locations. To investigate how existing techniques would perform on a real-world city-wide VG application, we build San Francisco eXtra Large, a new dataset that covers a whole city, provides a wide range of challenging cases, and is 30x larger than the previous largest dataset for visual geo-localization. We find that current methods fail to scale to such large datasets. We therefore design a new, highly scalable training technique called CosPlace, which casts training as a classification problem and so avoids the expensive mining required by the commonly used contrastive learning. We achieve state-of-the-art performance on a wide range of datasets and find that CosPlace is robust to heavy domain changes. Moreover, we show that, compared to the previous state-of-the-art, CosPlace requires roughly 80% less GPU memory at train time and achieves better results with 8x smaller descriptors, paving the way for city-wide real-world visual geo-localization. Dataset, code and trained models are available for research purposes at https://github.com/gmberton/CosPlace.
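As a rough illustration of what "casting training as a classification problem" can look like, the sketch below assigns geotagged images to class labels by quantizing their metric coordinates into square grid cells. It is a minimal sketch under stated assumptions, not the authors' partitioning scheme: the helper names `cell_of` and `assign_class_ids`, the 10 m cell size, and the toy coordinates are all hypothetical. The point it shows is that labels come directly from the geotags, so no pairwise mining of positives and negatives is needed.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def cell_of(east_m: float, north_m: float, cell_size_m: float = 10.0) -> Cell:
    """Quantize metric coordinates to a grid-cell index (hypothetical helper)."""
    return (int(east_m // cell_size_m), int(north_m // cell_size_m))


def assign_class_ids(
    positions: List[Tuple[float, float]], cell_size_m: float = 10.0
) -> Tuple[List[int], Dict[Cell, int]]:
    """Give every image the class id of the grid cell it falls in.

    Assumption: positions are already in a metric projection (e.g. meters
    east/north), so a fixed cell size means the same thing everywhere.
    """
    cell_to_class: Dict[Cell, int] = {}
    labels: List[int] = []
    for east_m, north_m in positions:
        cell = cell_of(east_m, north_m, cell_size_m)
        # First time a cell is seen, it gets the next free class id.
        if cell not in cell_to_class:
            cell_to_class[cell] = len(cell_to_class)
        labels.append(cell_to_class[cell])
    return labels, cell_to_class


# Toy coordinates in meters (made up for illustration).
positions = [(3.0, 4.0), (8.5, 1.0), (12.0, 4.0), (3.5, 19.0)]
labels, cell_to_class = assign_class_ids(positions)
print(labels)  # [0, 0, 1, 2]
```

The resulting labels could then feed any standard classification loss on top of an image encoder. Which loss and which partitioning CosPlace actually uses is described in the paper and the linked repository, not in this abstract.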

Publication: arXiv e-prints

Pub Date: April 2022

DOI: 10.48550/arXiv.2204.02287

arXiv: arXiv:2204.02287

Bibcode: 2022arXiv220402287B

Keywords: Computer Science - Computer Vision and Pattern Recognition

E-Print: CVPR 2022
