HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection
Abstract
An individualised head-related transfer function (HRTF) is very important for creating realistic virtual reality (VR) and augmented reality (AR) environments. However, acoustically measuring high-quality HRTFs requires expensive equipment and an acoustic lab setting. To overcome these limitations and to make the measurement more efficient, HRTF upsampling has been used in the past: a high-resolution HRTF is created from a low-resolution one. This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling. We propose a novel approach that transforms the HRTF data for direct use with a convolutional super-resolution generative adversarial network (SRGAN). This new approach is benchmarked against three baselines: barycentric upsampling, spherical harmonic (SH) upsampling and an HRTF selection approach. Experimental results show that, when the input HRTF is sparse (fewer than 20 measured positions), the proposed method outperforms all three baselines in terms of log-spectral distortion (LSD) and in localisation performance as predicted by perceptual models.
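The abstract reports results in terms of log-spectral distortion (LSD) but does not define it. As a rough illustration, the sketch below computes LSD under a commonly used definition: the root-mean-square of the dB magnitude ratio over frequency bins, averaged over measurement positions. The function name, the array layout (positions by frequency bins, linear magnitudes) and the `eps` guard are assumptions made for this example; the exact formulation used in the paper may differ.

```python
import numpy as np


def log_spectral_distortion(h_ref, h_est, eps=1e-12):
    """Mean LSD in dB between reference and estimated HRTF magnitudes.

    h_ref, h_est: arrays of shape (n_positions, n_freq_bins) holding
    linear (not dB) magnitude responses. This layout is an assumption
    of the sketch, not something specified in the abstract.
    """
    h_ref = np.asarray(h_ref, dtype=float)
    h_est = np.asarray(h_est, dtype=float)
    # Magnitude ratio in dB per position and frequency bin;
    # eps avoids division by zero and log of zero.
    ratio_db = 20.0 * np.log10((h_ref + eps) / (h_est + eps))
    # RMS over frequency bins gives one LSD value per position.
    per_position = np.sqrt(np.mean(ratio_db ** 2, axis=-1))
    # Average over all positions to get a single score.
    return float(np.mean(per_position))


if __name__ == "__main__":
    ref = np.ones((4, 8))
    # An estimate that is uniformly 10x too large is 20 dB off everywhere.
    print(log_spectral_distortion(ref, 10.0 * ref))
```

A lower value means the upsampled HRTF is spectrally closer to the measured one; identical inputs give 0 dB.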
- Publication:
- arXiv e-prints
- Pub Date:
- June 2023
- DOI:
- 10.48550/arXiv.2306.05812
- arXiv:
- arXiv:2306.05812
- Bibcode:
- 2023arXiv230605812H
- Keywords:
- Electrical Engineering and Systems Science - Audio and Speech Processing;
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Human-Computer Interaction;
- Computer Science - Machine Learning;
- Computer Science - Sound;
- Electrical Engineering and Systems Science - Signal Processing
- E-Print:
- 15 pages, 9 figures, Preprint (Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing on 15 Feb 2024)