While depth sensors are becoming increasingly popular, their spatial resolution often remains limited. Depth super-resolution therefore emerged as a solution to this problem. Despite much progress, state-of-the-art techniques suffer from two drawbacks: (i) they rely on the assumption that intensity edges coincide with depth discontinuities, which, unfortunately, is only true in controlled environments; and (ii) they typically exploit the availability of high-resolution training depth maps, which can often not be acquired in practice due to the sensors' limitations. By contrast, here, we introduce an approach to performing depth super-resolution in more challenging conditions, such as in outdoor scenes. To this end, we first propose to exploit semantic information to better constrain the super-resolution process. In particular, we design a co-sparse analysis model that learns filters from joint intensity, depth and semantic information. Furthermore, we show how low-resolution training depth maps can be employed in our learning strategy. We demonstrate the benefits of our approach over state-of-the-art depth super-resolution methods on two outdoor scene datasets.