Most COVID-19 studies report overall infection figures at the state or county level, aggregating the number of cases in a region at a given time. This aggregation tends to miss fine details of how the virus propagates. This paper is motivated by the analysis of a high-resolution COVID-19 dataset from Cali, Colombia, that records the exact location and time of every confirmed case, offering vital insights into the spatio-temporal interaction between individuals with respect to disease spread in a metropolis. We develop a non-stationary spatio-temporal point process, assuming that previously infected cases trigger newly confirmed ones, and introduce a neural network-based kernel to capture the spatially varying triggering effect. The kernel is carefully crafted to enhance expressiveness while keeping the results interpretable. We also incorporate exogenous influences imposed by city landmarks. Numerical results on real data demonstrate that our method achieves good predictive performance compared to the state of the art and yields interpretable findings.
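
For orientation, the assumption that earlier cases trigger later ones is commonly expressed through the conditional intensity of a self-exciting point process. The form below is a generic sketch rather than the exact specification used in this paper; the symbols $\mu$ (a background rate, assumed here to absorb any exogenous terms), $k$ (the triggering kernel), and $\mathcal{H}_t$ (the history of cases before time $t$) are illustrative notation.

\[
\lambda(t, s \mid \mathcal{H}_t) \;=\; \mu(s) \;+\; \sum_{i \,:\, t_i < t} k(t, t_i, s, s_i)
\]

Here $(t_i, s_i)$ denote the time and location of the $i$-th confirmed case. A non-stationary kernel depends on the locations $s$ and $s_i$ themselves rather than only on the displacement $s - s_i$, which is what allows the triggering effect to vary across space.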