A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant
The proper functioning of a wastewater treatment plant (WWTP) relies on maintaining a delicate balance between a multitude of competing microorganisms. A detailed understanding of the complex network of interactions among them is essential not only for maximising current operational efficiency, but also for designing effective new treatment technologies. Metagenomics offers insight into these dynamic systems through the analysis of the microbial DNA sequences present. Unique taxa are inferred through sequence clustering to form operational taxonomic units (OTUs), with per-taxon abundance estimates obtained from the corresponding sequence counts. The data in this study comprise weekly OTU counts from an activated sludge (AS) tank of a WWTP. To model the OTU dynamics, we develop a Bayesian hierarchical vector autoregressive model, which is a linear approximation to the commonly used generalised Lotka-Volterra (gLV) model. To tackle the high dimensionality and sparsity of the data, we first cluster them into 12 "bins" using a seasonal phase-based approach. The autoregressive coefficient matrix is assumed to be sparse, so we compare several shrinkage priors on simulated data sets before selecting the regularised horseshoe prior for the biological application. We find that ammonia and chemical oxygen demand have a positive relationship with several bins, and that pH has a positive relationship with one bin. These results are supported by findings in the biological literature. We identify several negative interactions, suggesting that OTUs in different bins may be competing for resources and that these relationships are complex. We also identify two positive interactions. Although simpler than a gLV model, our vector autoregression offers valuable insight into the microbial dynamics of the WWTP.
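To make the model structure concrete, the following is a minimal sketch of a sparse first-order vector autoregression with exogenous covariates, y_t = A y_{t-1} + B x_t + e_t, in Python with NumPy. It is an illustration under stated assumptions and not the authors' implementation: the function names, the noise scale, the sparsity level and the simulated covariates are all hypothetical, and ordinary least squares is used as a simple stand-in for the Bayesian hierarchical fit with a regularised horseshoe prior. Only the dimensions (12 bins, three covariates standing for ammonia, chemical oxygen demand and pH) are taken from the text.

```python
import numpy as np


def simulate_sparse_var1(n_bins=12, n_covariates=3, n_weeks=200,
                         sparsity=0.85, seed=0):
    """Simulate y_t = A y_{t-1} + B x_t + e_t with a sparse matrix A.

    All numerical settings are illustrative assumptions, not values
    from the study.
    """
    rng = np.random.default_rng(seed)

    # Interaction matrix: draw Gaussian entries, then zero most of them.
    A = rng.normal(0.0, 0.2, size=(n_bins, n_bins))
    A[rng.random((n_bins, n_bins)) < sparsity] = 0.0
    np.fill_diagonal(A, 0.5)  # each bin depends on its own past value

    # Shrink A if needed so its spectral radius stays below 1 (stable VAR).
    radius = np.max(np.abs(np.linalg.eigvals(A)))
    if radius >= 0.95:
        A *= 0.95 / radius

    # Covariate effects and simulated (standardised) covariates.
    B = rng.normal(0.0, 0.3, size=(n_bins, n_covariates))
    X = rng.normal(size=(n_weeks, n_covariates))

    Y = np.zeros((n_weeks, n_bins))
    for t in range(1, n_weeks):
        noise = rng.normal(0.0, 0.1, size=n_bins)
        Y[t] = A @ Y[t - 1] + B @ X[t] + noise
    return Y, X, A, B


def fit_var1_least_squares(Y, X):
    """Estimate A and B by ordinary least squares (no shrinkage)."""
    n_bins = Y.shape[1]
    # Regress y_t on the stacked predictors [y_{t-1}, x_t].
    Z = np.hstack([Y[:-1], X[1:]])
    coef, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)
    A_hat = coef[:n_bins].T
    B_hat = coef[n_bins:].T
    return A_hat, B_hat


if __name__ == "__main__":
    Y, X, A, B = simulate_sparse_var1()
    A_hat, B_hat = fit_var1_least_squares(Y, X)
    print("non-zero entries in true A:", int(np.count_nonzero(A)))
    print("max abs error in A_hat:", float(np.max(np.abs(A_hat - A))))
```

Least squares returns a dense estimate of A in which the true zeros appear as small non-zero values. A shrinkage prior such as the regularised horseshoe is intended to pull those entries towards zero while leaving large coefficients comparatively unshrunk.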