In this paper we propose a neural network model with a novel Sequential Attention layer that extends soft attention by assigning weights to words in an input sequence in a way that takes into account not just how well each word matches a query, but also how well the surrounding words match. We evaluate this approach on the task of reading comprehension (on the Who Did What and CNN datasets) and show that it dramatically improves a strong baseline, the Stanford Reader, and is competitive with the state of the art.
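
The contrast the abstract draws, between scoring each word in isolation and scoring it together with its neighbours, can be illustrated with a minimal sketch. This is not the paper's Sequential Attention layer: the dot-product match score, the fixed averaging window, and the function names `soft_attention` and `context_aware_attention` are all assumptions chosen for illustration, standing in for whatever learned mechanism the model actually uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_attention(query, words):
    """Plain soft attention: each word is scored only by its own match to the query."""
    return softmax([dot(query, w) for w in words])


def context_aware_attention(query, words, window=1):
    """Hypothetical stand-in for a context-aware layer (an assumption, not the
    paper's method): each word's score is the mean match score of the words
    within `window` positions of it, so neighbours influence its weight."""
    raw = [dot(query, w) for w in words]
    smoothed = []
    for i in range(len(raw)):
        lo, hi = max(0, i - window), min(len(raw), i + window + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return softmax(smoothed)


# Toy data: words 0, 1 and 4 match the query; words 2 and 3 do not.
query = [1.0, 0.0]
words = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

plain = soft_attention(query, words)
contextual = context_aware_attention(query, words)
```

In this toy setup, plain soft attention gives words 0 and 4 the same weight because both match the query equally well. The context-aware variant gives word 0 more weight than word 4, since word 0 sits next to another matching word while word 4 sits next to a non-matching one.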
- Pub Date: May 2017
- Keywords: Computer Science - Computation and Language; Computer Science - Machine Learning
- To appear in the ACL 2017 2nd Workshop on Representation Learning for NLP. Contains additional experiments in Section 4 and a revised Figure 1.