Many problems in geometric optics or convex geometry can be recast as optimal transport problems; examples include the far-field reflector problem and Alexandrov's curvature prescription problem. A popular way to solve these problems numerically is to assume that the source probability measure is absolutely continuous while the target measure is finitely supported. We refer to this setting as semi-discrete optimal transport. Among the several algorithms proposed to solve semi-discrete optimal transport problems, one currently needs to choose between algorithms that are slow but come with a convergence rate analysis (e.g. Oliker-Prussner) and algorithms that are much faster in practice but come with no convergence guarantees. Algorithms of the first kind rely on coordinate-wise increments, and the number of iterations required to reach the solution up to an error of $\epsilon$ is of order $N^3/\epsilon$, where $N$ is the number of Dirac masses in the target measure. Algorithms of the second kind, on the other hand, typically rely on the formulation of the semi-discrete optimal transport problem as an unconstrained convex optimization problem, which is solved using a Newton or quasi-Newton method. The purpose of this article is to bridge this gap between theory and practice by introducing a damped Newton algorithm which is experimentally efficient, and by proving the global convergence of this algorithm with optimal rates. The main assumptions are that the cost function satisfies a condition that appears in the regularity theory for optimal transport (the Ma-Trudinger-Wang condition) and that the support of the source density is connected in a quantitative way (it must satisfy a weighted Poincaré-Wirtinger inequality).
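To make the idea of a damped Newton iteration concrete, the following is a minimal one-dimensional sketch, not the implementation studied in the article. Everything specific in it is an assumption made for illustration: the quadratic cost $(x-y)^2$, the source density $\rho(x) = 0.5 + x$ on $[0,1]$, the convention that the cell of $y_i$ is the set where $(x-y_i)^2 + \psi_i$ is minimal, the particular backtracking rule, and the hypothetical function names `cell_masses`, `jacobian` and `damped_newton`. The target points are assumed sorted and distinct. The unknown is a vector of weights $\psi$, and the goal is for the mass of each cell to match the prescribed mass $\nu_i$.

```python
import numpy as np

def cell_masses(psi, y):
    """Masses of the 1D cells for cost (x - y)^2 and the assumed density 0.5 + x on [0, 1]."""
    # Breakpoint between consecutive cells i and i+1 (exact when all cells are non-empty).
    b = 0.5 * (y[:-1] + y[1:]) + (psi[1:] - psi[:-1]) / (2.0 * (y[1:] - y[:-1]))
    edges = np.concatenate(([0.0], np.clip(b, 0.0, 1.0), [1.0]))
    cdf = 0.5 * edges + 0.5 * edges**2  # cumulative distribution of the assumed density
    # A non-positive entry signals an empty cell; such trial points are rejected below.
    return np.diff(cdf), edges

def jacobian(edges, y):
    """Derivative of the cell masses with respect to psi (a weighted graph Laplacian, up to sign)."""
    n = len(y)
    w = (0.5 + edges[1:-1]) / (2.0 * (y[1:] - y[:-1]))  # density at breakpoint / (2 * gap)
    J = np.zeros((n, n))
    idx = np.arange(n - 1)
    J[idx, idx + 1] = w
    J[idx + 1, idx] = w
    J[np.arange(n), np.arange(n)] = -J.sum(axis=1)  # rows sum to zero
    return J

def damped_newton(y, nu, tol=1e-10, max_iter=50):
    psi = np.zeros(len(y))
    g, edges = cell_masses(psi, y)
    # Assumed safeguard: every cell must keep at least this much mass.
    eps = 0.5 * min(g.min(), nu.min())
    for _ in range(max_iter):
        err = np.linalg.norm(g - nu)
        if err <= tol:
            break
        # J is singular (constants are in its kernel); lstsq returns the minimum-norm direction.
        d = np.linalg.lstsq(jacobian(edges, y), nu - g, rcond=None)[0]
        step = 1.0
        for _ in range(60):
            psi_new = psi + step * d
            g_new, edges_new = cell_masses(psi_new, y)
            ok_mass = g_new.min() >= eps
            ok_decrease = np.linalg.norm(g_new - nu) <= (1.0 - 0.5 * step) * err
            if ok_mass and ok_decrease:
                break
            step *= 0.5  # damping: halve the step until both tests pass
        else:
            raise RuntimeError("backtracking failed to find an admissible step")
        psi, g, edges = psi_new, g_new, edges_new
    return psi, g
```

For instance, calling `damped_newton(np.array([0.1, 0.4, 0.6, 0.9]), np.array([0.4, 0.3, 0.2, 0.1]))` returns weights whose cell masses match the prescribed ones up to the tolerance. The damping matters because a full Newton step may empty a cell, at which point the Jacobian degenerates; keeping every cell mass above a positive threshold is one way to avoid this.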