Monocular scene reconstruction is essential for modern applications such as robotics or autonomous driving. Although stereo methods usually result in better accuracy than monocular methods, they are more expensive and more difficult to calibrate. In this work, we present a novel second order optimal minimum energy filter that jointly estimates the camera motion, the disparity map and also higher order kinematics recursively on a product Lie group containing a novel disparity group. This mathematical framework enables to cope with non-Euclidean state spaces, non-linear observations and high dimensions which is infeasible for most classical filters. To be robust against outliers, we use a generalized Charbonnier energy function in this framework rather than a quadratic energy function as proposed in related work. Experiments confirm that our method enables accurate reconstructions on-par with state-of-the-art.