"Show me the cup": Reference with Continuous Representations
Abstract
One of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects represented by natural images, points to the intended object if the expression has a unique referent, or indicates a failure, if it does not. The model, directly trained on reference acts, is competitive with a pipeline manually engineered to perform the same task, both when referents are purely visual, and when they are characterized by a combination of visual and linguistic properties.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2016
- DOI:
- 10.48550/arXiv.1606.08777
- arXiv:
- arXiv:1606.08777
- Bibcode:
- 2016arXiv160608777B
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Artificial Intelligence;
- Computer Science - Machine Learning
- E-Print:
- In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham