Image data are often composed of two or more geometrically distinct constituents; in galaxy catalogs, for instance, one sees a mixture of pointlike structures (galaxy superclusters) and curvelike structures (filaments). It would be ideal to process a single image and extract two geometrically ``pure'' images, each one containing features from only one of the two geometric constituents. This seems to be a seriously underdetermined problem, but recent empirical work has achieved highly persuasive separations. We present a theoretical analysis showing that accurate geometric separation of point and curve singularities can be achieved by minimizing the $\ell_1$ norm of the representing coefficients in two geometrically complementary frames: wavelets and curvelets. Driving our analysis is a specific property of the ideal (but unachievable) representation in which each content type is expanded in the frame best adapted to it. This ideal representation has the property that the important coefficients are clustered geometrically in phase space, and that at fine scales there is very little coherence between a cluster of elements in one frame expansion and individual elements in the complementary frame. We formally introduce notions of cluster coherence and clustered sparsity and use this machinery to show that such underdetermined systems of linear equations can be stably solved by $\ell_1$ minimization; microlocal phase space helps organize the calculations that cluster coherence requires.
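The separation principle described above can be stated concretely in one common analysis-side formulation (the notation below is ours, introduced for illustration, not taken from the text): given an observed image $f$ containing both point and curve content, one seeks a decomposition $f = f_1 + f_2$ whose wavelet and curvelet analysis coefficients are jointly as sparse as possible in the $\ell_1$ sense,
\[
(\hat{f}_1, \hat{f}_2) \;=\; \operatorname*{arg\,min}_{f_1 + f_2 = f} \;\|\Phi_1^{*} f_1\|_1 + \|\Phi_2^{*} f_2\|_1,
\]
where $\Phi_1^{*}$ denotes the analysis operator of a wavelet frame (adapted to pointlike singularities) and $\Phi_2^{*}$ that of a curvelet frame (adapted to curvelike singularities). This is a sketch of the general strategy only; the precise functional and asymptotic setting analyzed may differ.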