Dynamical quantum simulation may be one of the first applications to see quantum advantage. However, the circuit depth required by standard Trotterization methods can rapidly exceed what the coherence time of noisy quantum computers allows. This has led to recent proposals for variational approaches to dynamical simulation. In this work, we aim to make variational dynamical simulation even more practical for near-term hardware. We propose a new algorithm called Variational Hamiltonian Diagonalization (VHD), which approximately transforms a given Hamiltonian into a diagonal form that can be easily exponentiated. VHD allows for fast forwarding, i.e., simulation beyond the coherence time of the quantum computer with a fixed-depth quantum circuit. It also removes Trotterization error and allows simulation of the entire Hilbert space. We prove that the VHD cost function has an operational meaning in terms of the average simulation fidelity. Moreover, we prove that the VHD cost function does not exhibit a shallow-depth barren plateau, i.e., its gradient does not vanish exponentially in the number of qubits. Our proof relies on the locality of the Hamiltonian, and hence we connect locality to trainability. Our numerical simulations verify that VHD can be used for fast-forwarding dynamics.
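The fast-forwarding idea can be illustrated classically with a small NumPy sketch. It relies on the identity that if H = W D W†, then e^{-iHt} = W e^{-iDt} W†. This means the same W and D work for every t, and only the diagonal phases are rescaled. By contrast, a Trotter product needs a number of steps that grows with t.

This is a sketch under loud assumptions, not the VHD algorithm itself. Exact diagonalization with `numpy.linalg.eigh` stands in for the trained parametrized circuits W and D, and the VHD cost function and its training are not reproduced. The transverse-field Ising chain, its couplings `J` and `h`, and helper names such as `diagonalized_evolution` are hypothetical choices made for illustration. The standard average gate fidelity between two unitaries, (d + |Tr(U†V)|²) / (d(d + 1)), is used here only as a figure of merit.

```python
import numpy as np

# Single-qubit Pauli matrices and identity.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def kron_all(ops):
    """Tensor product of a list of single-qubit operators."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out


def site_op(op, site, n):
    """Embed a single-qubit operator at position `site` of an n-qubit register."""
    return kron_all([op if k == site else I2 for k in range(n)])


def ising_hamiltonian(n, J=1.0, h=0.5):
    """Hypothetical example: open-chain transverse-field Ising model (a local Hamiltonian)."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for k in range(n - 1):
        H -= J * site_op(Z, k, n) @ site_op(Z, k + 1, n)
    for k in range(n):
        H -= h * site_op(X, k, n)
    return H


def diagonalized_evolution(W, d, t):
    """U(t) = W exp(-i D t) W^dagger; W and D are the same for every t."""
    return W @ np.diag(np.exp(-1j * d * t)) @ W.conj().T


def trotter_evolution(H_terms, t, steps):
    """First-order Trotter product; each term is exponentiated exactly via eigh."""
    dt = t / steps
    step = np.eye(H_terms[0].shape[0], dtype=complex)
    for term in H_terms:
        e, V = np.linalg.eigh(term)
        step = step @ (V @ np.diag(np.exp(-1j * e * dt)) @ V.conj().T)
    return np.linalg.matrix_power(step, steps)


def average_fidelity(U, V):
    """Average gate fidelity between two unitaries: (d + |Tr(U^dag V)|^2) / (d (d + 1))."""
    dim = U.shape[0]
    overlap = abs(np.trace(U.conj().T @ V)) ** 2
    return (dim + overlap) / (dim * (dim + 1))


n, t = 3, 10.0
J, h = 1.0, 0.5
H = ising_hamiltonian(n, J, h)

# Exact diagonalization stands in for the trained circuits W and D (assumption).
d, W = np.linalg.eigh(H)
U_diag = diagonalized_evolution(W, d, t)

# Split H into two non-commuting groups of terms for the Trotter comparison.
H_zz = ising_hamiltonian(n, J, 0.0)
H_x = ising_hamiltonian(n, 0.0, h)
for steps in (10, 100, 1000):
    U_trot = trotter_evolution([H_zz, H_x], t, steps)
    print(steps, average_fidelity(U_diag, U_trot))
```

The loop prints the Trotter average fidelity for increasing step counts. The diagonalized evolution has no Trotter error by construction, so its cost does not grow with t. In a variational setting, the residual error would instead come from imperfect training of W and D, which this sketch does not model.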