OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality

One key challenge in Augmented Reality is the placement of virtual content in natural locations. Most existing automated techniques can only work with a closed-vocabulary, fixed set of objects. In this paper, we introduce and evaluate several methods for automatic object placement using recent advances in open-vocabulary vision-language models. Through a multifaceted evaluation, we identify a new state-of-the-art method, OCTO+. We also introduce a benchmark for automatically evaluating the placement of virtual objects in augmented reality, alleviating the need for costly user studies. Using this benchmark together with human evaluations, we find that OCTO+ places objects in a valid region over 70% of the time, outperforming other methods on a range of metrics.

Publication: arXiv e-prints
Pub Date: January 2024
DOI: 10.48550/arXiv.2401.08973
arXiv: arXiv:2401.08973
Bibcode: 2024arXiv240108973S
Keywords: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; Computer Science - Computation and Language
E-Print: 2024 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIXVR)
