Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

doi:10.48550/arXiv.2210.10289

Zhang, Hao

Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in recent years. However, a theoretical framework for studying their relationships is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs. The linear dependency of LMs is defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD) to represent an LM as a linear combination of other LMs that serve as a basis, and we derive the closed-form solution. A goodness-of-fit metric for LMD, similar to the coefficient of determination, is defined and used to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly "correlated". To further advance SOTA, we need more diverse and novel LMs that are less dependent on existing LMs.
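
The idea of fitting one LM with a linear combination of others and scoring the fit can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes each LM's outputs on a shared set of inputs have been flattened into one list of numbers, that each basis LM gets a single scalar coefficient, and that the fit is ordinary least squares solved through the normal equations. The function names `solve_linear_system`, `decompose`, and `r_squared` are hypothetical.

```python
# Illustrative sketch only (assumed setup, not the paper's exact method):
# each LM is represented by a flat list of output values on shared inputs.

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("basis outputs are themselves linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def decompose(target, basis):
    """Coefficients c minimizing ||target - sum_i c[i] * basis[i]||^2.

    Closed form via the normal equations: (B B^T) c = B target,
    where the rows of B are the basis output vectors.
    """
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(p * t for p, t in zip(bi, target)) for bi in basis]
    return solve_linear_system(gram, rhs)


def r_squared(target, basis, coeffs):
    """Coefficient of determination of the fitted linear combination."""
    fitted = [sum(c * b[k] for c, b in zip(coeffs, basis))
              for k in range(len(target))]
    mean = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


# Toy data: the target is exactly 2 * basis[0] + 3 * basis[1].
basis = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
target = [2.0, 3.0, 2.0, 3.0]
coeffs = decompose(target, basis)
score = r_squared(target, basis, coeffs)
```

In this toy setting, a score near 1 means the target's outputs are almost fully explained by the basis, which is the sense in which a set of models could be called highly linearly dependent. Because the sketch fits no intercept, the score can fall below 0 for a poor fit.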

Publication:

arXiv e-prints

Pub Date:

October 2022

DOI:

10.48550/arXiv.2210.10289

arXiv:

arXiv:2210.10289

Bibcode:

2022arXiv221010289Z

Keywords:

Computer Science - Computation and Language;
Computer Science - Artificial Intelligence;
Computer Science - Machine Learning;
68T50 (Primary) 68T30, 68T07 (Secondary);
I.2.7

E-Print:

accepted by EMNLP 2022
