Size bounds and query plans for relational joins
Abstract
Relational joins are at the core of relational algebra, which in turn is the core of the standard database query language SQL. As their evaluation is expensive and very often dominated by the output size, it is an important task for database query optimisers to compute estimates on the size of joins and to find good execution plans for sequences of joins. We study these problems from a theoretical perspective, both in the worst-case model, and in an average-case model where the database is chosen according to a known probability distribution. In the former case, our first key observation is that the worst-case size of a query is characterised by the fractional edge cover number of its underlying hypergraph, a combinatorial parameter previously known to provide an upper bound. We complete the picture by proving a matching lower bound, and by showing that there exist queries for which the join-project plan suggested by the fractional edge cover approach may be substantially better than any join plan that does not use intermediate projections. On the other hand, we show that in the average-case model, every join-project plan can be turned into a plan containing no projections in such a way that the expected time to evaluate the plan increases only by a constant factor independent of the size of the database. Not surprisingly, the key combinatorial parameter in this context is the maximum density of the underlying hypergraph. We show how to make effective use of this parameter to eliminate the projections.
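As an illustration of the worst-case bound described above, the following sketch works through the triangle query R(a,b), S(b,c), T(a,c). It is not taken from the paper: the function names, the choice of query, and the test database are assumptions made for this example. Giving each relation weight 1/2 yields a fractional edge cover of total weight 3/2, so the output size is at most |R|^(1/2) |S|^(1/2) |T|^(1/2), and a database in which every relation is a full k x k grid attains that bound.

```python
from fractions import Fraction
from itertools import product
from math import prod


def is_fractional_edge_cover(edges, weights):
    """Check that every attribute gets total weight >= 1 from the edges containing it."""
    attributes = set().union(*edges.values())
    return all(
        sum(weights[name] for name, attrs in edges.items() if a in attrs) >= 1
        for a in attributes
    )


def size_bound(sizes, weights):
    """Product of |R_e| ** w_e over all relations (the fractional edge cover bound)."""
    return prod(sizes[name] ** float(weights[name]) for name in sizes)


def triangle_join(R, S, T):
    """Evaluate R(a,b) JOIN S(b,c) JOIN T(a,c) with a simple hash-based loop."""
    S_by_b = {}
    for b, c in S:
        S_by_b.setdefault(b, []).append(c)
    T_set = set(T)
    return {
        (a, b, c)
        for a, b in R
        for c in S_by_b.get(b, [])
        if (a, c) in T_set
    }


# Hypergraph of the triangle query: one hyperedge (attribute set) per relation.
edges = {"R": {"a", "b"}, "S": {"b", "c"}, "T": {"a", "c"}}
weights = {name: Fraction(1, 2) for name in edges}

# Hypothetical test database: every relation is the full k x k grid.
k = 4
R = S = T = list(product(range(k), repeat=2))
sizes = {"R": len(R), "S": len(S), "T": len(T)}

bound = size_bound(sizes, weights)
output = triangle_join(R, S, T)
print(is_fractional_edge_cover(edges, weights), bound, len(output))
```

With k = 4 each relation has N = 16 tuples, and the join returns k^3 = 64 = N^(3/2) tuples, matching the bound exactly. Weights of 1/3 per relation would not form a cover, since each attribute would receive only 2/3.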
Publication: arXiv e-prints
Pub Date: November 2017
DOI: 10.48550/arXiv.1711.03860
arXiv: arXiv:1711.03860
Bibcode: 2017arXiv171103860A
Keywords: Computer Science - Databases
E-Print: Conference version in FOCS 2008