"Show me the cup": Reference with Continuous Representations

doi:10.48550/arXiv.1606.08777

One of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects represented by natural images, points to the intended object if the expression has a unique referent, or indicates a failure if it does not. The model, trained directly on reference acts, is competitive with a pipeline manually engineered to perform the same task, both when referents are purely visual and when they are characterized by a combination of visual and linguistic properties.

Publication:

arXiv e-prints

Pub Date:

June 2016

DOI:

10.48550/arXiv.1606.08777

arXiv:

arXiv:1606.08777

Bibcode:

2016arXiv160608777B

Keywords:

Computer Science - Computation and Language;
Computer Science - Artificial Intelligence;
Computer Science - Machine Learning

E-Print:

In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham
