How can we accurately compare different community detection algorithms? These algorithms cluster nodes in a given network, and their performance is often validated on benchmark networks with explicit ground-truth communities. Because real-world networks generally lack cluster labels, accurate evaluation of these algorithms requires a model that generates realistic networks. In this paper, we present a simple, intuitive, and flexible benchmark generator that produces intrinsically modular networks for community validation. We show that the generated networks closely comply with the characteristics observed in real networks, and that these characteristics can be directly controlled to match a wide range of real-world networks. We further show that common community detection algorithms rank differently when evaluated on these benchmarks than on currently available alternatives.