Delineating the associations between images and a vector of covariates is of central interest in medical imaging studies. To tackle this problem of image response regression, we propose a novel nonparametric approach in the framework of spatially varying coefficient models, where the spatially varying functions are estimated through deep neural networks. Compared to existing solutions, the proposed method explicitly accounts for spatial smoothness and subject heterogeneity, has straightforward interpretations, and is highly flexible and accurate in capturing complex association patterns. A key idea in our approach is to treat the image voxels as the effective samples, which not only alleviates the limited sample size issue that haunts the majority of medical imaging studies, but also leads to more robust and reproducible results. Focusing on a broad family of piecewise smooth functions, we establish the estimation and selection consistency, and derive the asymptotic error bounds. We demonstrate the efficacy of the method through intensive simulations, and further illustrate its advantages with analyses of two functional magnetic resonance imaging datasets.