An increasingly urgent task in analysis of networks is to develop statistical models that include contextual information in the form of covariates while respecting degree heterogeneity and sparsity. In this paper, we propose a new parameter-sparse random graph model for density-sparse directed networks, with parameters to explicitly account for all these features. The resulting objective function of our model is akin to that of the high-dimensional logistic regression, with the key difference that the probabilities are allowed to go to zero at a certain rate to accommodate sparse networks. We show that under appropriate conditions, an estimator obtained by the familiar penalized likelihood with an $\ell_1$ penalty to achieve parameter sparsity can alleviate the curse of dimensionality, and crucially is selection and rate consistent. Interestingly, inference on the covariate parameter can be conducted straightforwardly after the model fitting, without the need of the kind of debiasing commonly employed in $\ell_1$ penalized likelihood estimation. Simulation and data analysis corroborate our theoretical findings. In developing our model, we provide the first result highlighting the fallacy of what we call data-selective inference, a common practice of artificially truncating the sample by throwing away nodes based on their connections, by examining the estimation bias in the Erdös-Rényi model theoretically and in the stochastic block model empirically.