The article is about algorithms for learning Bayesian hierarchical models, the computational complexity of which scales linearly with the number of observations and the number of parameters in the model. It focuses on crossed random effect and nested multilevel models, which are used ubiquitously in applied sciences, and illustrates the methodology on two challenging real data analyses on predicting electoral results and real estate prices respectively. The posterior dependence in both classes is sparse: in crossed random effects models it resembles a random graph, whereas in nested multilevel models it is tree-structured. For each class we develop a framework for scalable computation. We provide a number of negative (for crossed) and positive (for nested) results for the scalability (or lack thereof) of methods based on sparse linear algebra, which are relevant also to Laplace approximation methods for such models. Our numerical experiments compare with off-the-shelf variational approximations and Hamiltonian Monte Carlo. Our theoretical results, although partial, are useful in suggesting interesting methodologies and lead to conclusions that our numerics suggest to hold well beyond the scope of the underlying assumptions.