Censored expectation maximization algorithm for mixtures: Application to intertrade waiting times
Abstract
In a previous analysis, the problem of "zero-inflation" in time series data (caused by high-frequency trading in the electronic order book) was handled by left-truncating the waiting times between consecutive limit orders. We demonstrated, using rigorous statistical methods, that the truncated Weibull distribution describes the corresponding stochastic dynamics over the entire range of inter-arrival limit order waiting times, except for a region close to zero. However, because the truncated Weibull distribution could not describe the large "zero-inflated" probability mass in the neighbourhood of zero (approximately 50% of the data for limit orders), it became clear that the full probability distribution must be a mixture distribution of which the Weibull distribution is a significant part. To investigate this idea, we use a "censored expectation-maximization algorithm" to analyse the intertrade waiting time data for four selected stocks trading on the London Stock Exchange. Intertrade waiting times usually show a much lower percentage of zero inflation, typically around 2.5%. Using this new method and testing various mixture models, we show that the desired mixture consists of a Weibull distribution with the universal shape parameter β ≃ 0.57 plus an additional exponential distribution. This is the same value of the shape parameter found in our previous study. The "1 exponential + 1 Weibull" mixture describes the intertrade waiting times extremely well at all time scales. While the Weibull component dominates in the transition and tail regions, the exponential distribution explains the "zero-inflated" excess mass.
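To make the "1 exponential + 1 Weibull" mixture concrete, the following is a minimal sketch of a censored log-likelihood and the E-step responsibilities for such a model. It is not the paper's implementation: the abstract does not specify the censoring scheme, so the sketch assumes that waiting times at or below a hypothetical resolution threshold `c` are left-censored and contribute the cumulative probability up to `c`. The function names, the rate/scale parameterization (`lam`, `alpha`), and the threshold are all illustrative assumptions; only the shape value β ≃ 0.57 comes from the abstract.

```python
import numpy as np


def exp_pdf(t, lam):
    # Exponential density with rate lam (assumed parameterization).
    return lam * np.exp(-lam * t)


def exp_cdf(t, lam):
    return 1.0 - np.exp(-lam * t)


def weibull_pdf(t, alpha, beta):
    # Weibull density with scale alpha and shape beta (assumed parameterization).
    z = t / alpha
    return (beta / alpha) * z ** (beta - 1.0) * np.exp(-(z ** beta))


def weibull_cdf(t, alpha, beta):
    return 1.0 - np.exp(-((t / alpha) ** beta))


def censored_loglik_and_resp(t, c, w, lam, alpha, beta):
    """Return the censored log-likelihood and the exponential responsibilities.

    Observations with t <= c are treated as left-censored (assumption):
    they contribute the component CDF at c instead of the density.
    w is the mixture weight of the exponential component.
    """
    t = np.asarray(t, dtype=float)
    censored = t <= c
    # Evaluate densities at c for censored points so that the Weibull
    # density (which diverges at 0 for beta < 1) is never evaluated at 0.
    t_safe = np.where(censored, c, t)
    e = np.where(censored, exp_cdf(c, lam), exp_pdf(t_safe, lam))
    wb = np.where(censored, weibull_cdf(c, alpha, beta),
                  weibull_pdf(t_safe, alpha, beta))
    mix = w * e + (1.0 - w) * wb
    resp_exp = w * e / mix  # E-step: posterior weight of the exponential part
    return np.log(mix).sum(), resp_exp
```

In a full EM iteration, these responsibilities would then be used in an M-step to update `w`, `lam`, `alpha`, and `beta`; that step is omitted here because the Weibull update has no closed form and the abstract gives no detail on how it is handled.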
- Publication: Physica A: Statistical Mechanics and its Applications
- Pub Date: February 2022
- DOI:
- Bibcode: 2022PhyA..58726456K
- Keywords:
- Mixture distributions;
- Censored expectation maximization;
- Intertrade waiting times;
- Zero-inflated data;
- Information criteria;
- Model selection;
- Stochastic processes;
- Tick-by-tick data