Censored expectation maximization algorithm for mixtures: Application to intertrade waiting times
Abstract
In a previous analysis, the problem induced by "zero-inflation" in time series data (caused by high-frequency trading in the electronic order book) was handled by left-truncating the waiting times between consecutive limit orders. We demonstrated, using rigorous statistical methods, that the truncated Weibull distribution describes the corresponding stochastic dynamics over the entire range of interarrival limit order waiting times, except for a region close to zero. However, because the truncated Weibull distribution could not describe the prodigious "zero-inflated" probability mass in the neighbourhood of zero (approximately 50% of the data for limit orders), it became clear that the full probability distribution must be a mixture distribution of which the Weibull distribution is a significant part. To investigate this idea, we use a "censored expectation-maximization algorithm" to analyse the intertrade waiting times data for four selected stocks trading on the London Stock Exchange. Intertrade waiting times usually have a much lower percentage of zero inflation, typically around 2.5%. Using this new method and testing various mixture models, we show that the desired mixture consists of a Weibull distribution with the universal shape parameter β ≃ 0.57 plus an additional exponential distribution. This is the same value of the shape parameter found in our previous study. The "1 exponential + 1 Weibull" mixture describes the intertrade waiting times extremely well at all time scales. While the Weibull component dominates in the transition and tail regions, the exponential distribution explains the "zero-inflated" excess mass.
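The abstract does not spell out the form of the mixture or the censoring scheme, so the following is only a minimal sketch of what a "1 exponential + 1 Weibull" density and a left-censored log-likelihood could look like. The function names, the parameterisation (`weight`, `rate`, `scale`), the censoring threshold, and all numeric values except the shape β = 0.57 are illustrative assumptions, not the authors' implementation.

```python
import math


def mixture_pdf(t, weight, rate, shape, scale):
    """Density at t > 0 of weight * Exponential(rate) + (1 - weight) * Weibull(shape, scale)."""
    exp_part = rate * math.exp(-rate * t)
    z = t / scale
    weib_part = (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))
    return weight * exp_part + (1.0 - weight) * weib_part


def mixture_cdf(t, weight, rate, shape, scale):
    """Cumulative distribution function of the same two-component mixture."""
    exp_cdf = 1.0 - math.exp(-rate * t)
    weib_cdf = 1.0 - math.exp(-((t / scale) ** shape))
    return weight * exp_cdf + (1.0 - weight) * weib_cdf


def censored_loglik(observed, n_censored, threshold, weight, rate, shape, scale):
    """Log-likelihood when n_censored waiting times are only known to lie below threshold.

    Assumption: left-censoring at a hypothetical threshold; each censored point
    contributes log F(threshold) and each observed point contributes log f(t).
    """
    params = (weight, rate, shape, scale)
    ll = sum(math.log(mixture_pdf(t, *params)) for t in observed)
    if n_censored > 0:
        ll += n_censored * math.log(mixture_cdf(threshold, *params))
    return ll


if __name__ == "__main__":
    # Illustrative values only; shape = 0.57 is the value quoted in the abstract.
    params = dict(weight=0.1, rate=5.0, shape=0.57, scale=10.0)
    waits = [0.8, 2.5, 7.0, 31.0]  # hypothetical waiting times above the threshold
    print(censored_loglik(waits, n_censored=2, threshold=0.1, **params))
```

A censored expectation-maximization fit would maximise a likelihood of this kind iteratively over the mixture parameters; the sketch only evaluates it for fixed parameters.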
 Publication:

Physica A Statistical Mechanics and its Applications
 Pub Date:
 February 2022
 DOI:
 10.1016/j.physa.2021.126456
 Bibcode:
 2022PhyA..58726456K
 Keywords:

 Mixture distributions;
 Censored expectation maximization;
 Intertrade waiting times;
 Zero-inflated data;
 Information criteria;
 Model selection;
 Stochastic processes;
 Tick-by-tick data