Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording
Abstract
Relating speech to EEG holds considerable importance but remains challenging. In this study, a deep convolutional network was employed to extract spatiotemporal features from EEG data. Self-supervised speech representations and contextual text embeddings were used as speech features, and contrastive learning was used to relate the EEG features to the speech features. The experimental results demonstrate the benefit of using self-supervised speech representations and contextual text embeddings. Through feature fusion and model ensembling, an accuracy of 60.29% was achieved, ranking second in Task 1 of the Auditory EEG Challenge (ICASSP 2024). The code implementing our work is available on GitHub: https://github.com/bobwangPKU/EEG-Stimulus-Match-Mismatch.
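To make the match-mismatch setup concrete, the sketch below shows one way the abstract's pipeline could be wired up: a small convolutional encoder for EEG segments, a projection of pre-extracted speech features (e.g. self-supervised speech representations or contextual text embeddings) into a shared space, and a two-way contrastive objective that scores the matched speech segment against the mismatched one. All layer sizes, input shapes, and class names here are illustrative assumptions, not the authors' released implementation (which is at the GitHub link above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EEGEncoder(nn.Module):
    """Convolutional encoder mapping (batch, channels, time) EEG segments
    to a fixed-size embedding via temporal convolutions and mean pooling."""

    def __init__(self, in_channels: int = 64, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 128, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=9, padding=4),
            nn.ReLU(),
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x)            # (batch, 128, time)
        h = h.mean(dim=-1)          # temporal average pooling -> (batch, 128)
        return F.normalize(self.proj(h), dim=-1)


class SpeechProjector(nn.Module):
    """Projects pre-extracted speech features (frames x feat_dim) into the
    same embedding space as the EEG encoder."""

    def __init__(self, feat_dim: int = 768, embed_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        h = feats.mean(dim=1)       # pool over frames -> (batch, feat_dim)
        return F.normalize(self.proj(h), dim=-1)


def match_mismatch_logits(eeg_emb, matched_emb, mismatched_emb):
    """Cosine-similarity logits for the two candidate speech segments;
    cross-entropy with label 0 trains the matched segment to score higher,
    acting as a contrastive objective over the candidate pair."""
    sim_match = (eeg_emb * matched_emb).sum(dim=-1, keepdim=True)
    sim_mismatch = (eeg_emb * mismatched_emb).sum(dim=-1, keepdim=True)
    return torch.cat([sim_match, sim_mismatch], dim=-1)


if __name__ == "__main__":
    eeg = torch.randn(8, 64, 320)       # 8 EEG segments: 64 channels, 320 samples (assumed shape)
    speech_a = torch.randn(8, 50, 768)  # matched speech features: 50 frames x 768 dims (assumed)
    speech_b = torch.randn(8, 50, 768)  # mismatched speech features

    eeg_enc, sp_proj = EEGEncoder(), SpeechProjector()
    logits = match_mismatch_logits(eeg_enc(eeg), sp_proj(speech_a), sp_proj(speech_b))
    loss = F.cross_entropy(logits, torch.zeros(8, dtype=torch.long))
    print(logits.shape, loss.item())
```

In this reading, feature fusion would amount to combining scores (or embeddings) from separate branches driven by the self-supervised speech representation and the contextual text embedding, and the ensemble would average predictions across independently trained models; both steps are assumptions consistent with, but not specified by, the abstract.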
- Publication: arXiv e-prints
- Pub Date: January 2024
- DOI: 10.48550/arXiv.2401.04964
- arXiv: arXiv:2401.04964
- Bibcode: 2024arXiv240104964W
- Keywords: Electrical Engineering and Systems Science - Signal Processing; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
- E-Print: 2 pages, 2 figures, accepted by ICASSP 2024