(Context) Monte Carlo radiative transfer (MCRT) is a widely used technique to model the interaction between radiation and a medium, and plays an important role in astrophysical modelling and in comparing those models with observations. (Aims) In this work, we present a novel approach to MCRT that addresses the challenging memory access patterns of traditional MCRT algorithms, which prevent MCRT simulations from performing optimally on modern hardware with complex memory architectures. (Methods) We reformulate the MCRT photon packet life cycle as a task-based algorithm, in which the computation is broken down into small tasks that are executed concurrently. Photon packets are stored in intermediate buffers, and tasks propagate photon packets through small parts of the computational domain, moving them from one buffer to another in the process. (Results) Using the implementation of the new algorithm in the photoionization MCRT code CMacIonize 2.0, we show that decomposing the MCRT grid into small parts leads to a significant performance gain during the photon packet propagation phase, which constitutes the bulk of an MCRT algorithm, as a result of better usage of memory caches. Our new algorithm is a factor of 2 to 4 faster than an equivalent traditional algorithm and shows good strong scaling up to 30 threads. We briefly discuss how our new algorithm could be adjusted or extended to other astrophysical MCRT applications. (Conclusions) We show that optimising the memory access patterns of a memory-bound algorithm such as MCRT can yield significant performance gains.
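The buffer-and-task scheme summarised under Methods can be illustrated with a minimal sketch. It is not the CMacIonize 2.0 implementation: it assumes a 1D, purely absorbing medium with unit-sized cells and a single source, and it replaces the concurrent task execution with a serial queue. All names (`Packet`, `propagate_all`, `kappa`, and so on) are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    x: float         # current position along the 1D grid
    direction: int   # +1 or -1
    tau: float       # optical depth left before absorption


def propagate_all(num_subgrids=4, cells_per_subgrid=8, num_packets=1000,
                  kappa=0.5, seed=42):
    """Propagate packets subgrid by subgrid; return (absorbed counts, escaped)."""
    rng = random.Random(seed)
    width = float(cells_per_subgrid)  # assumption: every cell has unit size
    absorbed = [[0] * cells_per_subgrid for _ in range(num_subgrids)]
    buffers = [[] for _ in range(num_subgrids)]  # one input buffer per subgrid
    escaped = 0

    # Assumption: one source at the centre of a subgrid fills that buffer.
    src = num_subgrids // 2
    for _ in range(num_packets):
        buffers[src].append(Packet((src + 0.5) * width,
                                   rng.choice((-1, 1)),
                                   rng.expovariate(1.0)))

    # Each queue entry is a task: "propagate the buffer of subgrid s".
    # A real task-based code would hand these tasks to several threads.
    tasks = deque([src])
    queued = {src}
    while tasks:
        s = tasks.popleft()
        queued.discard(s)
        packets, buffers[s] = buffers[s], []
        lo, hi = s * width, (s + 1) * width
        for p in packets:
            to_edge = hi - p.x if p.direction > 0 else p.x - lo
            to_absorption = p.tau / kappa if kappa > 0 else math.inf
            if to_absorption < to_edge:
                # Absorbed inside this subgrid: only local cells are touched.
                x_abs = p.x + p.direction * to_absorption
                cell = min(int(x_abs - lo), cells_per_subgrid - 1)
                absorbed[s][cell] += 1
                continue
            # Reached the subgrid edge: move the packet to the next buffer.
            p.tau -= kappa * to_edge
            p.x = hi if p.direction > 0 else lo
            neighbour = s + p.direction
            if not 0 <= neighbour < num_subgrids:
                escaped += 1  # left the computational domain
                continue
            buffers[neighbour].append(p)
            if neighbour not in queued:
                tasks.append(neighbour)
                queued.add(neighbour)
    return absorbed, escaped
```

In this sketch each task reads and writes only the cells of one subgrid plus its neighbours' buffers, which is the locality property the abstract credits for the improved cache usage. Every packet ends up either absorbed or escaped, so the two counts always sum to `num_packets`.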