ConvNeXt Based Neural Network for Audio Anti-Spoofing

doi:10.48550/arXiv.2209.06434


With the rapid development of speech conversion and speech synthesis algorithms, automatic speaker verification (ASV) systems have become vulnerable to spoofing attacks. In recent years, researchers have proposed a number of anti-spoofing methods based on hand-crafted features. However, using hand-crafted features rather than the raw waveform discards implicit information that is useful for anti-spoofing. Inspired by the promising performance of ConvNeXt in image classification tasks, we revise the ConvNeXt network architecture and propose a lightweight end-to-end anti-spoofing model. By integrating a channel attention block and using the focal loss function, the proposed model can focus on the most informative sub-bands of speech representations and on the difficult samples that are hard to classify. Experiments show that our proposed system achieves an equal error rate (EER) of 0.64% and a min-tDCF of 0.0187 on the ASVspoof 2019 LA evaluation dataset, outperforming state-of-the-art systems.
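The abstract names two well-defined quantities: the focal loss used for training and the equal error rate used for evaluation. The sketch below illustrates both in plain Python. It is not the authors' code. It uses the standard binary focal loss of Lin et al. (2017), FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with that paper's default gamma = 2 and alpha = 0.25. The gamma and alpha values used in this work are not stated here, so these defaults are assumptions. The EER routine is a simplified threshold sweep that assumes higher scores mean "more likely bona fide". It is not the official ASVspoof scoring script. min-tDCF is not sketched, because it also requires ASV system scores and cost parameters.

```python
import math
from typing import Sequence


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one binary prediction (Lin et al., 2017).

    p: predicted probability of class 1 (e.g. bona fide); y: true label, 0 or 1.
    gamma/alpha defaults are assumptions, not the paper's reported settings.
    """
    eps = 1e-12
    p_t = p if y == 1 else 1.0 - p             # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha # class-balancing weight
    p_t = min(max(p_t, eps), 1.0 - eps)        # keep log() finite
    # (1 - p_t)^gamma shrinks the loss of easy, confidently correct samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def equal_error_rate(bonafide: Sequence[float], spoof: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over all observed scores.

    Assumes higher score = more likely bona fide. Returns the mean of the
    false rejection and false acceptance rates where the two are closest.
    """
    thresholds = sorted(set(bonafide) | set(spoof))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(s < t for s in bonafide) / len(bonafide)  # bona fide rejected
        far = sum(s >= t for s in spoof) / len(spoof)       # spoofs accepted
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, eer = gap, (frr + far) / 2.0
    return eer


if __name__ == "__main__":
    # An easy sample (p_t = 0.9) is down-weighted far more than a hard one (p_t = 0.6).
    print(binary_focal_loss(0.9, 1), binary_focal_loss(0.6, 1))
    print(equal_error_rate([0.4, 0.6, 0.8, 0.9], [0.1, 0.2, 0.5, 0.7]))
```

Setting gamma = 0 reduces the focal loss to alpha-weighted cross-entropy. This shows that the (1 - p_t)^gamma factor is what shifts training toward the hard-to-classify samples described in the abstract.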

Publication:

arXiv e-prints

Pub Date:

September 2022

DOI:

10.48550/arXiv.2209.06434

arXiv:

arXiv:2209.06434

Bibcode:

2022arXiv220906434M

Keywords:

Computer Science - Sound;
Computer Science - Computation and Language;
Electrical Engineering and Systems Science - Audio and Speech Processing

E-Print:

6 pages

NASA/ADS
