Prediction Model For Wordle Game Results With High Robustness
Abstract
In this study, we analyze the dynamics of Wordle using data analysis and machine learning. We first examined the relationship between the date and the number of submitted results. Because the early data are skewed by the game's initial surge in popularity, we modeled only the stable portion of the series, using an ARIMAX model of order (p, d, q) = (9, 0, 2) with a weekday/weekend indicator as the exogenous variable. We found no significant relationship between word attributes and hard-mode results. To predict word difficulty, we employed a backpropagation neural network and mitigated overfitting through feature engineering. We also used K-means clustering, with five clusters found to be optimal, to categorize word difficulty numerically. Our models predict that around 12,884 results will be submitted on March 1st, 2023, and that the word "eerie" will average 4.8 attempts, placing it in the hardest difficulty cluster. We further examined the percentage of loyal players and their propensity to undertake daily challenges. Our models underwent rigorous sensitivity and diagnostic analyses, including the augmented Dickey-Fuller (ADF) test, ACF and PACF checks, and cross-validation, which support their robustness. Overall, our study provides a predictive framework for Wordle gameplay based on a date or a given five-letter word. Results have been summarized and submitted to the Puzzle Editor of the New York Times.
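The difficulty-clustering step can be illustrated with a minimal sketch. It assumes, for simplicity, that each word is described by a single feature (its average number of attempts); the abstract does not state which features the authors clustered on, so this is an illustration of K-means with five clusters rather than the paper's pipeline. The function name `kmeans_1d` and every word score except the 4.8 quoted for "eerie" are hypothetical placeholders.

```python
from statistics import mean


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalar values.

    Returns (centroids, labels) with clusters renumbered so that
    label 0 has the lowest centroid and label k-1 the highest.
    """
    distinct = sorted(set(values))
    if k < 1 or k > len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")

    # Deterministic initialisation (an assumption of this sketch):
    # spread the starting centroids evenly over the sorted distinct values.
    if k == 1:
        centroids = [distinct[len(distinct) // 2]]
    else:
        step = (len(distinct) - 1) / (k - 1)
        centroids = [distinct[round(i * step)] for i in range(k)]

    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: (v - centroids[j]) ** 2) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated

    # Renumber clusters by ascending centroid so labels read as difficulty ranks.
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {old: new for new, old in enumerate(order)}
    return [centroids[j] for j in order], [rank[lab] for lab in labels]


# Hypothetical average-attempt scores; only the 4.8 for "eerie" comes from the abstract.
avg_attempts = {
    "word_a": 3.2, "word_b": 3.3, "word_c": 3.7, "word_d": 3.8,
    "word_e": 4.0, "word_f": 4.1, "word_g": 4.3, "word_h": 4.4,
    "eerie": 4.8, "word_i": 4.9,
}
words = list(avg_attempts)
centroids, labels = kmeans_1d(list(avg_attempts.values()), k=5)
difficulty = dict(zip(words, labels))  # 0 = easiest cluster, 4 = hardest
```

With these placeholder scores, `difficulty["eerie"]` falls in the highest-numbered cluster, consistent with the abstract's statement that "eerie" belongs to the hardest group. A real analysis would more likely use a library implementation and a richer feature vector per word.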
- Publication: arXiv e-prints
- Pub Date: September 2023
- DOI:
- arXiv: arXiv:2309.14250
- Bibcode: 2023arXiv230914250W
- Keywords: Statistics - Applications; Computer Science - Artificial Intelligence; Mathematics - Statistics Theory
- E-Print: 25 Pages, 28 Figures