Natural images are often superpositions of parts with different geometric characteristics. For instance, an image might be a mixture of cartoon and texture structures. In addition, images are often given with missing data. In this paper, we develop a method for simultaneously decomposing an image into its two underlying parts and inpainting the missing data. Our separation-inpainting method is based on an $\ell_1$ minimization approach using two dictionaries, each sparsifying one of the image parts but not the other. We introduce a comprehensive convergence analysis of our method, in a general setting, utilizing the concepts of joint concentration, clustered sparsity, and cluster coherence. As the main application of our theory, we consider the problem of separating an image into cartoon and texture parts while inpainting its missing data.
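
To make the $\ell_1$ approach concrete, one common way to pose such a joint separation and inpainting problem is sketched below. The notation is an assumption for illustration and is not taken from the text above: $x = x_1 + x_2$ denotes the image with cartoon part $x_1$ and texture part $x_2$, $\Phi_1$ and $\Phi_2$ denote the two dictionaries, and $P_K$ denotes the projection onto the known (non-missing) part of the image. The formulation analyzed in the paper may differ, for example by using synthesis rather than analysis coefficients.

$$
(\hat{x}_1, \hat{x}_2) \in \operatorname*{arg\,min}_{(z_1, z_2)} \; \|\Phi_1^{*} z_1\|_1 + \|\Phi_2^{*} z_2\|_1 \quad \text{subject to} \quad P_K (z_1 + z_2) = P_K x .
$$

In this sketch, the constraint enforces agreement with the observed data only, so the minimizer fills in the missing region, while the two $\ell_1$ terms favor assigning each component to the dictionary that represents it sparsely.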