Improving the Effectiveness of Content Popularity Prediction Methods using Time Series Trends
Abstract
We present a simple and effective model for predicting the popularity of web content. Our solution, which won two of the three tasks of the ECML/PKDD 2014 Predictive Analytics Challenge, predicts user engagement metrics, such as the number of visits and social network engagement, that a web page will achieve 48 hours after its upload, using only information available in the first hour after upload. Our model has two steps. First, we use time series clustering techniques to extract common temporal trends of content popularity. Next, we use linear regression models whose predictors are both content features (e.g., numbers of visits and mentions on online social networks) and metrics that capture the distance between a page's popularity time series and the trends extracted in the first step. We discuss why this model is effective and show its gains over state-of-the-art alternatives.
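The two-step pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm or the distance metric, so plain k-means over unit-normalized series and Euclidean distance are used here as stand-ins. The function names (`kmeans_trends`, `build_features`, `fit_linear`, `predict`) and the array layout are hypothetical, chosen only for illustration.

```python
import numpy as np


def _unit_shapes(series):
    # Scale each series to unit norm so comparisons reflect shape, not volume.
    norms = np.linalg.norm(series, axis=1, keepdims=True)
    return series / np.where(norms == 0, 1.0, norms)


def kmeans_trends(series, k, n_iter=50, seed=0):
    """Step 1 (stand-in): extract k temporal trends as k-means centroids.

    series: array of shape (n_pages, n_time_bins) with first-hour popularity.
    """
    rng = np.random.default_rng(seed)
    shapes = _unit_shapes(series)
    # Initialize centroids from k distinct pages.
    centroids = shapes[rng.choice(len(shapes), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(shapes[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = shapes[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def build_features(series, content_features, trends):
    """Step 2a: predictors = intercept + content features + distance to each trend."""
    shapes = _unit_shapes(series)
    dists = np.linalg.norm(shapes[:, None, :] - trends[None, :, :], axis=2)
    intercept = np.ones((len(series), 1))
    return np.hstack([intercept, content_features, dists])


def fit_linear(X, y):
    """Step 2b: ordinary least squares fit of the 48-hour target."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(X, coef):
    return X @ coef


if __name__ == "__main__":
    # Synthetic data only; shapes and values are assumptions for the demo.
    rng = np.random.default_rng(42)
    series = rng.random((200, 12))    # 12 five-minute bins in the first hour
    content = rng.random((200, 3))    # e.g. early visits, social mentions
    target = rng.random(200)          # engagement 48 hours after upload
    trends = kmeans_trends(series, k=4)
    X = build_features(series, content, trends)
    coef = fit_linear(X, target)
    print(predict(X[:5], coef))
```

In this sketch the distance columns let the regression adjust its prediction according to which early popularity trend a page most resembles, which is the role the abstract assigns to the trend-distance metrics.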
- Publication:
- arXiv e-prints
- Pub Date:
- August 2014
- DOI:
- arXiv:
- arXiv:1408.7094
- Bibcode:
- 2014arXiv1408.7094F
- Keywords:
- Computer Science - Social and Information Networks;
- Physics - Physics and Society;
- H.3.5
- E-Print:
- Presented at the ECML/PKDD Discovery Challenge on Predictive Analytics. Winner of two out of three tasks of the Predictive Analytics Discovery Challenge