Cross-View Completion Models are Zero-shot Correspondence Estimators
Abstract
In this work, we explore new perspectives on cross-view completion learning by drawing an analogy to self-supervised correspondence learning. Through our analysis, we demonstrate that the cross-attention map within cross-view completion models captures correspondence more effectively than other correlations derived from encoder or decoder features. We verify the effectiveness of the cross-attention map by evaluating it on zero-shot matching, as well as on learning-based geometric matching and multi-frame depth estimation. The project page is available at https://cvlab-kaist.github.io/ZeroCo/.
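The central idea of the abstract is that correspondences can be read directly off a cross-attention map. The sketch below illustrates this under several assumptions. It uses a single attention head and target-view queries and source-view keys that have already been projected. Patch tokens are laid out in row-major order on a grid, and correspondences are read out with a plain per-row argmax. The function names are hypothetical, and this is not the ZeroCo implementation, which may select or aggregate layers and heads differently.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
# (target patch (row, col), matched source patch (row, col), attention weight)
Match = Tuple[Tuple[int, int], Tuple[int, int], float]


def cross_attention_map(queries: Matrix, keys: Matrix) -> Matrix:
    """Hypothetical helper: row-wise softmax(Q K^T / sqrt(d)).

    Row i is a probability distribution over source-view tokens for
    target-view token i (single head, no value projection).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    attn: Matrix = []
    for q in queries:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn


def zero_shot_matches(attn: Matrix, grid_width: int) -> List[Match]:
    """Hypothetical helper: match each target token to its highest-weight source token."""
    matches: List[Match] = []
    for i, row in enumerate(attn):
        j = max(range(len(row)), key=row.__getitem__)  # argmax over source tokens
        tgt = divmod(i, grid_width)  # row-major index -> (row, col)
        src = divmod(j, grid_width)
        matches.append((tgt, src, row[j]))
    return matches


if __name__ == "__main__":
    # Toy 2x2 grid: each target token aligns with exactly one source token.
    keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    queries = [[0.0, 5.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 5.0], [0.0, 0.0, 5.0, 0.0]]
    attn = cross_attention_map(queries, keys)
    for tgt, src, weight in zero_shot_matches(attn, grid_width=2):
        print(f"target {tgt} -> source {src} (weight {weight:.3f})")
```

A hard argmax keeps the example short. A soft-argmax over each attention row would give sub-patch (continuous) locations instead, and in this illustration the maximum weight doubles as a rough confidence score.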
- Publication: arXiv e-prints
- Pub Date: December 2024
- arXiv: arXiv:2412.09072
- Bibcode: 2024arXiv241209072A
- Keywords: Computer Science - Computer Vision and Pattern Recognition
- E-Print: Project Page: https://cvlab-kaist.github.io/ZeroCo/