Uplink channel estimation in massive MIMO is well known to require excessive pilot overhead, and the problem becomes more severe for wideband (OFDM) transmissions. In the search for channel estimators that are both efficient and require low training overhead, compressive sensing (CS) approaches, which exploit the sparse nature of the physical channel, have become increasingly popular. However, no analytical insights are available regarding the overhead required for reliable channel estimation in wideband massive MIMO. By observing that the wideband massive MIMO channel can be represented by a vector that is not simply sparse but has well-defined structural properties, referred to as hierarchical sparsity, we propose low-complexity channel estimators for the multiuser scenario that take this property into account. Using the framework of the hierarchical restricted isometry property, we provide rigorous performance guarantees for these algorithms, which suggest concrete design goals for the user pilot sequences. For a specific design, we analytically characterize how the required pilot overhead scales with an increasing number of antennas and bandwidth, revealing that, as long as the number of antennas is sufficiently large, the overhead is independent of the per-user channel sparsity level as well as the number of active users. Hence, surprisingly, and in contrast to the classical setting, pilot overhead can be shifted into the spatial dimensions without affecting crucial bandwidth constraints, thereby increasing the overall system capacity. These analytical insights are verified by simulation results, which also demonstrate the superiority of the proposed algorithm over conventional CS algorithms that ignore the hierarchical sparsity property.
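To make the notion of hierarchical sparsity concrete, the following is a minimal sketch, not the estimator proposed here. It assumes the common two-level model in which a vector is split into equal-sized blocks, at most `s` blocks are nonzero, and each nonzero block has at most `sigma` nonzero entries. The function name `hierarchical_threshold` and all parameter names are hypothetical. The function maps an arbitrary vector to a vector of this form by first pruning each block and then pruning whole blocks, which is the kind of step a structure-aware thresholding algorithm could use in place of plain entry-wise thresholding.

```python
def hierarchical_threshold(x, num_blocks, block_size, s, sigma):
    """Map x to an (s, sigma)-hierarchically sparse vector (assumed model).

    x is a flat list of num_blocks * block_size real or complex numbers.
    At most s blocks survive, each with at most sigma nonzero entries.
    """
    if len(x) != num_blocks * block_size:
        raise ValueError("len(x) must equal num_blocks * block_size")

    pruned_blocks = []
    for b in range(num_blocks):
        block = x[b * block_size:(b + 1) * block_size]
        # Step 1: inside each block, keep the sigma largest-magnitude entries.
        order = sorted(range(block_size), key=lambda i: abs(block[i]), reverse=True)
        keep = set(order[:sigma])
        pruned = [block[i] if i in keep else 0 for i in range(block_size)]
        # Energy of the pruned block, used to rank blocks in step 2.
        energy = sum(abs(v) ** 2 for v in pruned)
        pruned_blocks.append((energy, b, pruned))

    # Step 2: keep the s pruned blocks with the largest energy.
    ranked = sorted(pruned_blocks, key=lambda t: t[0], reverse=True)
    surviving = {b for _, b, _ in ranked[:s]}

    out = []
    for _, b, pruned in pruned_blocks:
        out.extend(pruned if b in surviving else [0] * block_size)
    return out


if __name__ == "__main__":
    # Three blocks of three entries; keep 2 blocks with 2 entries each.
    x = [5, 1, 0.5, 0.1, 0.2, 0.3, 3, 4, 0.2]
    print(hierarchical_threshold(x, num_blocks=3, block_size=3, s=2, sigma=2))
    # prints [5, 1, 0, 0, 0, 0, 3, 4, 0]
```

In the example, the weak middle block is removed entirely and each surviving block loses its smallest entry. Plain thresholding to the four largest entries would give the same result for this input, but in general it does not enforce the per-block limits.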