A deep-learning-based hybrid strategy for short-term load forecasting is presented. The strategy introduces a novel tree-based ensemble method, Warm-start Gradient Tree Boosting (WGTB). Existing strategies either ensemble submodels of a single type, which fails to exploit the statistical strengths of different inference models, or simply sum the outputs of completely different inference models, which does not realize the full potential of ensembling. WGTB is therefore tailored to the large disparities among inference models in accuracy, volatility, and linearity. The complete strategy integrates four inference models, both linear and nonlinear: auto-regressive integrated moving average, nu support vector regression, extreme learning machine, and long short-term memory neural network. WGTB then ensembles their outputs by hybridizing the linear estimator ElasticNet and the nonlinear estimator ExtraTree via a boosting algorithm. The strategy is validated on real historical data, at hourly resolution, from a grid of the State Grid Corporation of China. The results demonstrate the effectiveness of the proposed strategy, which hybridizes the statistical strengths of both linear and nonlinear inference models.
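To make the ensembling idea concrete, the following is a minimal sketch of one way a linear estimator and tree estimators could be combined by boosting. It is not the WGTB algorithm itself, whose details the abstract does not give. It assumes that "warm start" means the boosting sequence is initialized with an ElasticNet blend of the submodel forecasts, and that ExtraTree regressors are then fitted successively to the remaining residuals under a squared-error loss. The function names, hyperparameter values, and synthetic data are all hypothetical and chosen for illustration only; scikit-learn and NumPy are assumed to be available.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.tree import ExtraTreeRegressor


def fit_warm_start_boosting(X, y, n_trees=50, learning_rate=0.1, seed=0):
    """Fit a linear warm start plus boosted trees (illustrative sketch).

    X: array of shape (n_samples, n_submodels) holding submodel forecasts.
    y: array of shape (n_samples,) holding the observed load.
    """
    # Assumed warm start: a regularized linear blend of the submodel outputs.
    linear = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X, y)
    pred = linear.predict(X)

    trees = []
    for i in range(n_trees):
        # For squared-error loss the negative gradient is the residual.
        residual = y - pred
        tree = ExtraTreeRegressor(max_depth=3, random_state=seed + i)
        tree.fit(X, residual)
        # Shrink each tree's correction by the learning rate.
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return linear, trees, learning_rate


def predict_warm_start_boosting(model, X):
    """Sum the linear warm start and the shrunken tree corrections."""
    linear, trees, learning_rate = model
    pred = linear.predict(X)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred


def make_synthetic_forecasts(n=500, seed=0):
    """Synthetic stand-in data: a daily-periodic load and four noisy forecasts."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    y = 100.0 + 20.0 * np.sin(2.0 * np.pi * t / 24.0) + rng.normal(0.0, 2.0, n)
    # Each column mimics one submodel with a different noise level.
    X = np.column_stack([y + rng.normal(0.0, s, n) for s in (3.0, 5.0, 8.0, 4.0)])
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_forecasts()
    model = fit_warm_start_boosting(X, y)
    pred = predict_warm_start_boosting(model, X)
    print("training MSE:", float(np.mean((y - pred) ** 2)))
```

In this sketch the linear stage captures the part of the target that is a weighted sum of the submodel forecasts, and the trees only have to model what the linear blend leaves unexplained. The error reported here is computed on the training data; a held-out split would be needed to judge forecasting accuracy.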