Predictive Coding Networks (PCNs) aim to learn a generative model of the world. Given observations, this generative model can then be inverted to infer the causes of those observations. However, when training PCNs, a noticeable pathology is often observed where inference accuracy peaks and then declines with further training. This cannot be explained by overfitting since both training and test accuracy decrease simultaneously. Here we provide a thorough investigation of this phenomenon and show that it is caused by an imbalance between the speeds at which the various layers of the PCN converge. We demonstrate that this can be prevented by regularising the weight matrices at each layer: by restricting the relative size of matrix singular values, we allow the weight matrix to change but restrict the overall impact which a layer can have on its neighbours. We also demonstrate that a similar effect can be achieved through a more biologically plausible and simple scheme of just capping the weights.
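The two regularisation schemes mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `limit_singular_value_ratio` and `cap_weights`, the parameters `max_ratio`, `eps` and `cap`, and in particular the reading of "restricting the relative size of matrix singular values" as bounding the ratio of the largest to the smallest singular value are all assumptions made for illustration.

```python
import numpy as np


def limit_singular_value_ratio(W, max_ratio=10.0, eps=1e-12):
    """Clip singular values so that largest / smallest <= max_ratio.

    Assumed interpretation of the abstract's singular-value regulariser;
    the paper itself may define the constraint differently.
    """
    # s is returned sorted in descending order, so s[-1] is the smallest.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Guard against division-like blow-ups for (near-)singular matrices.
    # Note: under this reading such a matrix is scaled down to almost zero.
    smallest = max(s[-1], eps)
    s_clipped = np.minimum(s, max_ratio * smallest)
    # Rebuild the matrix with the same singular vectors and clipped values.
    return (U * s_clipped) @ Vt


def cap_weights(W, cap=1.0):
    """Simpler scheme: clamp every individual weight to [-cap, cap]."""
    return np.clip(W, -cap, cap)


# Hypothetical usage after a weight update for one layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_svd = limit_singular_value_ratio(W, max_ratio=5.0)
W_cap = cap_weights(W, cap=0.5)
```

In this sketch the singular-value variant leaves the singular vectors of the layer untouched and only shrinks its dominant directions, whereas the capping variant acts element-wise and needs no decomposition, which is one way to see why it is the simpler of the two.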
- Pub Date: August 2022
- Computer Science - Artificial Intelligence; Quantitative Biology - Neurons and Cognition
- Preprint of IWAI 2022 conference paper. This version clarifies comments in the final paragraph of Section 3.