The skew-normal and the skew-$t$ distributions are parametric families which are currently under intense investigation since they provide a more flexible formulation compared to the classical normal and $t$ distributions by introducing a parameter which regulates their skewness. While these families enjoy attractive formal properties from the probability viewpoint, a practical problem with their usage in applications is the possibility that the maximum likelihood estimate of the parameter which regulates skewness diverges. This situation has vanishing probability for increasing sample size, but for finite samples it occurs with non-negligible probability, and its occurrence has unpleasant effects on the inferential process. Methods for overcoming this problem have been put forward both in the classical and in the Bayesian formulation, but their applicability is restricted to simple situations. We formulate a proposal based on the idea of penalized likelihood, which has connections with some of the existing methods, but it applies more generally, including in the multivariate case.