FusionDepth: Complement Self-Supervised Monocular Depth Estimation with Cost Volume
Abstract
Multi-view stereo depth estimation based on cost volume usually works better than self-supervised monocular depth estimation except for moving objects and low-textured surfaces. So in this paper, we propose a multi-frame depth estimation framework which monocular depth can be refined continuously by multi-frame sequential constraints, leveraging a Bayesian fusion layer within several iterations. Both monocular and multi-view networks can be trained with no depth supervision. Our method also enhances the interpretability when combining monocular estimation with multi-view cost volume. Detailed experiments show that our method surpasses state-of-the-art unsupervised methods utilizing single or multiple frames at test time on KITTI benchmark.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2023
- DOI:
- 10.48550/arXiv.2305.06036
- arXiv:
- arXiv:2305.06036
- Bibcode:
- 2023arXiv230506036H
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition