Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

doi:10.48550/arXiv.2009.08445

Self-supervised pre-training of transformer models has revolutionized NLP applications. Such pre-training with language modeling objectives provides a useful initial point for parameters that generalize well to new tasks with fine-tuning. However, fine-tuning is still data-inefficient: when there are few labeled examples, accuracy can be low. Data efficiency can be improved by optimizing pre-training directly for future fine-tuning with few examples; this can be treated as a meta-learning problem. However, standard meta-learning techniques require many training tasks in order to generalize, and finding a diverse set of such supervised tasks is usually difficult. This paper proposes a self-supervised approach to generate a large, rich meta-learning task distribution from unlabeled text. This is achieved using a cloze-style objective, but creating separate multi-class classification tasks by gathering tokens to be blanked from among only a handful of vocabulary terms. This yields as many unique meta-training tasks as the number of subsets of vocabulary terms. We meta-train a transformer model on this distribution of tasks using a recent meta-learning framework. On 17 NLP tasks, we show that this meta-training leads to better few-shot generalization than language-model pre-training followed by fine-tuning. Furthermore, we show how the self-supervised tasks can be combined with supervised tasks for meta-learning, providing substantial accuracy gains over previous supervised meta-learning.
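The task-generation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `make_cloze_task` and `count_tasks`, the `[MASK]` placeholder, whitespace tokenization, and the fixed `n_way`/`k_shot` parameters are all assumptions made for illustration. The sketch picks a small subset of vocabulary words, blanks each chosen word in sentences that contain it, and uses the index of the blanked word as the class label.

```python
import random
from math import comb

MASK = "[MASK]"  # assumed placeholder token, chosen for illustration


def make_cloze_task(sentences, vocab, n_way=2, k_shot=2, seed=0):
    """Build one n_way-class task from unlabeled sentences (illustrative only)."""
    rng = random.Random(seed)

    # Index tokenized sentences by the vocabulary words they contain.
    by_word = {w: [] for w in vocab}
    for sentence in sentences:
        tokens = sentence.split()  # naive whitespace tokenization (assumption)
        for w in set(tokens):
            if w in by_word:
                by_word[w].append(tokens)

    # Only words with at least k_shot sentences can serve as classes.
    eligible = sorted(w for w, sents in by_word.items() if len(sents) >= k_shot)
    if len(eligible) < n_way:
        raise ValueError("not enough vocabulary words with k_shot sentences")

    # The chosen subset of words defines the task; position in the list is the label.
    words = rng.sample(eligible, n_way)
    examples = []
    for label, word in enumerate(words):
        for tokens in rng.sample(by_word[word], k_shot):
            masked = " ".join(MASK if t == word else t for t in tokens)
            examples.append((masked, label))
    rng.shuffle(examples)
    return words, examples


def count_tasks(vocab_size, n_way):
    """Number of distinct word subsets of size n_way, i.e. C(vocab_size, n_way)."""
    return comb(vocab_size, n_way)
```

Under these assumptions, a classifier trained on such a task has to infer which of the chosen words was blanked from the surrounding context. If tasks are restricted to subsets of a fixed size, `count_tasks` gives the number of distinct subsets; for example, `count_tasks(10000, 4)` is already on the order of 10^14, which is why the resulting task distribution is large.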

Publication: arXiv e-prints
Pub Date: September 2020
DOI: 10.48550/arXiv.2009.08445
arXiv: arXiv:2009.08445
Bibcode: 2020arXiv200908445B
Keywords: Computer Science - Computation and Language; Computer Science - Machine Learning
E-Print: To appear in EMNLP 2020, camera-ready, link to code added
