SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels

doi:10.48550/arXiv.2212.02135

This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function, SoftCTC, which is an extension of CTC that allows multiple transcription variants to be considered at the same time. This makes it possible to omit the confidence-based filtering step, which is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely tuned filtering-based pipeline. We also evaluate SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a naïve CTC-based approach for training on multiple transcription variants, and we make our GPU implementation public.

Publication:

arXiv e-prints

Pub Date:

December 2022

DOI:

10.48550/arXiv.2212.02135

arXiv:

arXiv:2212.02135

Bibcode:

2022arXiv221202135K

Keywords:

Computer Science - Machine Learning;
Computer Science - Computer Vision and Pattern Recognition;
68T07;
68T10

E-Print:

21 pages, 8 figures, 6 tables, accepted to International Journal on Document Analysis and Recognition (IJDAR)

NASA/ADS
