We propose a hierarchical interference mitigation scheme for massive MIMO cellular networks. The MIMO precoder at each base station (BS) is partitioned into an inner precoder and an outer precoder. The inner precoder controls the intra-cell interference and adapts to the local channel state information (CSI) available at each BS (CSIT). The outer precoder controls the inter-cell interference and adapts to the channel statistics. This hierarchical precoding structure reduces the number of pilot symbols required for CSI estimation in the massive MIMO downlink and is robust to backhaul latency. We study the joint optimization of the outer precoders, the user selection, and the power allocation to maximize a general concave utility that has no closed-form expression. We first apply random matrix theory to obtain an approximate problem with a closed-form objective, and we show that its solution is asymptotically optimal for the original problem as the number of antennas per BS grows large. Then, exploiting the hidden convexity of the approximate problem, we propose an iterative algorithm that finds its optimal solution. We also obtain a low-complexity algorithm with provable convergence. Simulations show that the proposed design achieves significant gains over various state-of-the-art baselines.
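The two-stage structure described above can be illustrated with a small numerical sketch. It is not the optimized design studied in this work: the outer precoder here is simply a null-space projection computed from an assumed low-rank cross-link covariance, and the inner precoder is plain zero-forcing on the resulting effective channel. All names and dimensions (M, K, r, S, Theta_cross, W, G, F) are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: M BS antennas, K in-cell users,
# r = rank of the cross-link covariance, S = outer precoder width.
M, K, r, S = 16, 4, 4, 8


def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


# Assumed channel statistics: covariance of the channel from this BS
# to users in a neighboring cell, with rank r < M.
A = crandn(M, r)
Theta_cross = A @ A.conj().T

# Outer precoder (statistics only): eigh returns eigenvalues in ascending
# order, so the first S <= M - r eigenvectors span part of the null space
# of Theta_cross. W is semi-unitary (W^H W = I).
_, vecs = np.linalg.eigh(Theta_cross)
W = vecs[:, :S]

# Instantaneous channels (rows are users).
H = crandn(K, M)                         # in-cell users
H_cross = crandn(K, r) @ A.conj().T      # neighbor-cell users, covariance Theta_cross

# Inner precoder (local instantaneous CSI): zero-forcing on the
# K x S effective channel H W, with unit-norm columns.
H_eff = H @ W
G = H_eff.conj().T @ np.linalg.inv(H_eff @ H_eff.conj().T)
G /= np.linalg.norm(G, axis=0, keepdims=True)

# Composite precoder seen by the antennas.
F = W @ G

intra = H @ F          # diagonal up to rounding: no intra-cell interference
leak = H_cross @ F     # zero up to rounding: no inter-cell leakage

print("max off-diagonal intra-cell term:", np.abs(intra - np.diag(np.diag(intra))).max())
print("max inter-cell leakage term:", np.abs(leak).max())
```

In this sketch the inner stage only ever uses the K x S effective channel H W rather than the full K x M channel, which is one way to see why a hierarchical structure can lower the pilot burden, and the outer stage depends only on the slowly varying covariance, which need not be refreshed at the rate of the instantaneous CSI.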