We present an extension of the multi-band galaxy fitting method scarlet which allows the joint modeling of astronomical images from different instruments, by performing simultaneous resampling and convolution. We introduce a fast and formally accurate linear projection operation that maps a pixelated model at a given resolution onto an observation frame with a different point spread function and pixel scale. We test our implementation against the well-tested resampling and convolution method in galsim on simulated images mimicking observations with the Euclid space telescope and the Vera C. Rubin Observatory, and find that it reduces interpolation errors by an order of magnitude or more compared to galsim default settings. Tests with a wide range of levels of blending show more accurate galaxy models from joint modeling of Euclid and Rubin images compared to separate modeling of each survey by up to an order of magnitude. Our results demonstrate, for the first time, the feasibility and utility of performing non-parametric pixel-level data fusion of overlapping imaging surveys. All results can be reproduced with the specific versions of the codes and notebooks used in this study.