This paper studies deep learning methods for portfolio optimization in the US equity market. We present a novel residual switching network that automatically senses changes in market regime and switches between momentum and reversal predictors accordingly. The architecture combines two separate residual networks (ResNets): a switching module that learns stock market conditions and a main module that learns momentum and reversal predictors. We demonstrate that overfitting to noisy financial data can be controlled with stacked residual blocks, and that additionally incorporating an attention mechanism further improves predictive performance. Over the period from 2008 to the first half of 2017, the residual switching network (Switching-ResNet) strategy delivered superior out-of-sample performance, with an average annual Sharpe ratio of 2.22, compared with 0.81 for the ANN-based strategy and 0.69 for the linear model.
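
For readers unfamiliar with the headline metric, the following is a minimal sketch of one common way an average annual Sharpe ratio can be computed. The abstract does not state the exact convention used, so the return frequency (monthly here), the zero risk-free rate, the per-calendar-year averaging, and the function names `annualized_sharpe` and `average_annual_sharpe` are all illustrative assumptions rather than the authors' procedure.

```python
import numpy as np


def annualized_sharpe(returns, periods_per_year=12, risk_free=0.0):
    """Annualized Sharpe ratio of a series of periodic returns.

    Assumptions (not taken from the paper): returns are simple periodic
    returns, risk_free is a per-period rate, and the sample standard
    deviation (ddof=1) is used.
    """
    excess = np.asarray(returns, dtype=float) - risk_free
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))


def average_annual_sharpe(returns, years, periods_per_year=12):
    """Mean of the Sharpe ratios computed separately for each calendar year.

    `years` holds the calendar year of each return observation. Averaging
    per-year ratios is one plausible reading of "average annual Sharpe
    ratio"; other conventions exist.
    """
    returns = np.asarray(returns, dtype=float)
    years = np.asarray(years)
    per_year = [
        annualized_sharpe(returns[years == y], periods_per_year)
        for y in np.unique(years)
    ]
    return float(np.mean(per_year))


# Hypothetical monthly strategy returns for two years (illustrative data only).
monthly = [0.01, 0.03] * 12
labels = [2008] * 12 + [2009] * 12
print(average_annual_sharpe(monthly, labels))
```

Under this convention, a strategy is compared with its benchmarks year by year, so a single exceptional year cannot dominate the reported figure as strongly as it would in a ratio computed over the pooled sample.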