Is In-Context Universality Enough? MLPs are Also Universal In-Context
Abstract
The success of transformers is often linked to their ability to perform in-context learning. Recent work shows that transformers are universal in-context: they can approximate any real-valued continuous function of a context (a probability measure over $\mathcal{X}\subseteq \mathbb{R}^d$) and a query $x\in \mathcal{X}$. This raises the question of whether in-context universality explains their advantage over classical models. We answer this in the negative by proving that MLPs with trainable activation functions are also universal in-context. This suggests that the transformer's success is likely due to other factors, such as inductive bias or training stability.
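To make the objects in the abstract concrete, here is a minimal sketch of an "in-context function" $f(\mu, x)$. The context is a finitely supported probability measure $\mu$ (here the uniform empirical measure of the context points), and $x$ is a query. The Gaussian-kernel target, the mean-pooled MLP input, and the PReLU-style activation with a trainable negative slope are illustrative assumptions. They are not the construction used in the paper. The toy MLP only shows the signature $(\mu, x) \mapsto \mathbb{R}$ and makes no claim of universality.

```python
import math
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, ...]
Measure = Dict[Point, float]  # support point -> probability mass


def empirical_measure(context: Sequence[Point]) -> Measure:
    """Uniform empirical measure (1/n) * sum_i delta_{y_i}; repeated points accumulate mass."""
    counts = Counter(tuple(p) for p in context)
    n = len(context)
    return {p: c / n for p, c in counts.items()}


def integrate(mu: Measure, g: Callable[[Point], float]) -> float:
    """Integral of g with respect to a finitely supported measure mu."""
    return sum(w * g(y) for y, w in mu.items())


def kernel_in_context(mu: Measure, x: Point, bandwidth: float = 1.0) -> float:
    """Example continuous in-context function: f(mu, x) = ∫ exp(-||x - y||^2 / bandwidth) dmu(y)."""
    def k(y: Point) -> float:
        sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
        return math.exp(-sq / bandwidth)
    return integrate(mu, k)


def prelu(t: float, a: float) -> float:
    """Activation with a trainable slope `a` on the negative half-line (assumed, PReLU-style)."""
    return t if t >= 0 else a * t


def toy_mlp_in_context(
    mu: Measure,
    x: Point,
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    v: Sequence[float],
    a: float,
) -> float:
    """Hypothetical one-hidden-layer MLP whose input is [x, ∫ y dmu(y)] (query plus context mean)."""
    d = len(x)
    mean = [integrate(mu, lambda y, j=j: y[j]) for j in range(d)]
    z = list(x) + mean
    hidden = [prelu(sum(wij * zj for wij, zj in zip(row, z)) + bi, a) for row, bi in zip(W, b)]
    return sum(vi * hi for vi, hi in zip(v, hidden))


if __name__ == "__main__":
    ctx = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
    mu = empirical_measure(ctx)
    # Reordering the context leaves the measure, and hence f(mu, x), unchanged.
    print(mu == empirical_measure(list(reversed(ctx))))
    print(kernel_in_context(mu, (0.0, 0.0)))
```

Because both functions read the context only through $\mu$, they are automatically invariant to the order of the context points. Any model that takes a measure as input inherits this property.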
- Publication: arXiv e-prints
- Pub Date: February 2025
- arXiv: arXiv:2502.03327
- Bibcode: 2025arXiv250203327K
- Keywords: Statistics - Machine Learning; Computer Science - Machine Learning; Computer Science - Neural and Evolutionary Computing; Mathematics - Numerical Analysis; Mathematics - Probability