Nonparametric Bayesian Storyline Detection from Microtexts
Abstract
News events and social media are composed of evolving storylines, which capture public attention for a limited period of time. Identifying storylines requires integrating temporal and linguistic information, and prior work takes a largely heuristic approach. We present a novel online non-parametric Bayesian framework for storyline detection, using the distance-dependent Chinese Restaurant Process (dd-CRP). To ensure efficient linear-time inference, we employ a fixed-lag Gibbs sampling procedure, which is novel for the dd-CRP. We evaluate on the TREC Twitter Timeline Generation (TTG), obtaining encouraging results: despite using a weak baseline retrieval model, the dd-CRP story clustering method is competitive with the best entries in the 2014 TTG task.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2016
- DOI:
- 10.48550/arXiv.1601.04580
- arXiv:
- arXiv:1601.04580
- Bibcode:
- 2016arXiv160104580K
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Machine Learning
- E-Print:
- Appeared at the Workshop on Computing News Storylines at the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016)