Prediction of the FIFA World Cup 2018 - A random forest approach with an emphasis on estimated team ability parameters
In this work, we compare three modeling approaches for the scores of soccer matches (Poisson regression models, random forests and ranking methods) with regard to their predictive performance, based on all matches from the four previous FIFA World Cups, 2002-2014. While the former two are based on the teams' covariate information, the ranking methods estimate ability parameters that best reflect the teams' current strength. Within this comparison, the best-performing prediction methods on the training data turn out to be the ranking methods and the random forests. However, we show that by combining the random forest with the team ability parameters from the ranking methods as an additional covariate, we can improve the predictive power substantially. This combination of methods is therefore chosen as the final model. Based on its estimates, the FIFA World Cup 2018 is simulated repeatedly and winning probabilities are obtained for all teams. The model slightly favors Spain over the defending champion Germany. Additionally, we provide survival probabilities for all teams at all tournament stages, as well as the most probable tournament outcome.
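The repeated-simulation step can be illustrated with a minimal sketch. It is not the authors' implementation: the team names, the per-team expected goals in `EXPECTED_GOALS`, the four-team knockout format and the coin-flip tie-break are all hypothetical simplifications, and the fitted model would supply match-specific expected goals instead of one fixed rate per team.

```python
import math
import random
from collections import Counter

# Hypothetical expected goals per match for each team (illustrative values only,
# not estimates from the paper).
EXPECTED_GOALS = {"A": 1.8, "B": 1.5, "C": 1.1, "D": 0.9}


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def play_knockout_match(team_a, team_b, rng):
    """Return the winner; a draw is settled by a fair coin flip (a simplification)."""
    goals_a = sample_poisson(EXPECTED_GOALS[team_a], rng)
    goals_b = sample_poisson(EXPECTED_GOALS[team_b], rng)
    if goals_a == goals_b:
        return rng.choice([team_a, team_b])
    return team_a if goals_a > goals_b else team_b


def simulate_tournament(teams, rng):
    """Play single-elimination rounds until one team remains."""
    remaining = list(teams)
    while len(remaining) > 1:
        remaining = [
            play_knockout_match(remaining[i], remaining[i + 1], rng)
            for i in range(0, len(remaining), 2)
        ]
    return remaining[0]


def winning_probabilities(teams, n_runs=10_000, seed=0):
    """Estimate each team's title probability as its share of simulated wins."""
    rng = random.Random(seed)
    wins = Counter(simulate_tournament(teams, rng) for _ in range(n_runs))
    return {team: wins[team] / n_runs for team in teams}
```

Calling `winning_probabilities(["A", "B", "C", "D"])` returns one relative frequency per team. Survival probabilities for earlier stages could be estimated the same way by counting how often a team reaches each round.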
- Pub Date: June 2018
- Statistics - Applications
- First revised version: corrected a typo in the introduction when referring to the winning probabilities derived by Zeileis, Leitner, and Hornik (2018), which for Germany are 15.8% instead of 12.8%. Second revised version: slight changes in notation in Section 3.3.