Face photo-sketch synthesis aims at generating a facial sketch/photo conditioned on a given photo/sketch. It is of wide applications including digital entertainment and law enforcement. Precisely depicting face photos/sketches remains challenging due to the restrictions on structural realism and textural consistency. While existing methods achieve compelling results, they mostly yield blurred effects and great deformation over various facial components, leading to the unrealistic feeling of synthesized images. To tackle this challenge, in this work, we propose to use the facial composition information to help the synthesis of face sketch/photo. Specially, we propose a novel composition-aided generative adversarial network (CA-GAN) for face photo-sketch synthesis. In CA-GAN, we utilize paired inputs including a face photo/sketch and the corresponding pixel-wise face labels for generating a sketch/photo. In addition, to focus training on hard-generated components and delicate facial structures, we propose a compositional reconstruction loss. Finally, we use stacked CA-GANs (SCA-GAN) to further rectify defects and add compelling details. Experimental results show that our method is capable of generating both visually comfortable and identity-preserving face sketches/photos over a wide range of challenging data. Our method achieves the state-of-the-art quality, reducing best previous Frechet Inception distance (FID) by a large margin. Besides, we demonstrate that the proposed method is of considerable generalization ability. We have made our code and results publicly available: https://fei-hdu.github.io/ca-gan/.