Optimization of Logical Query Plans: Additional notes

This page lists various comments on solutions provided by the students to the Optimization of Logical Query Plans exercises. The goal is to provide hints at common mistakes, or on how to improve the solutions.

Excerpts of the originally submitted and incorrect solutions are marked as follows:

Original solution
etc.

Relational Algebra and Fields Naming Convention

Exercise 4.1
The relations have the following schema:
S(C), T(D, E), U(F, G)

$Q_1(x,y) \leftarrow S(x), T(x,3), U(x,y)$

SPJ: $\pi_{\text{C, G}} \sigma_{\text{S.C} = \text{T.D} \wedge \text{S.C} = \text{U.F} \wedge \text{T.E} = 3}
( S \times T \times U )$

In the relational algebra, without any renaming, the fields do not have a relation name as a prefix. For instance, given the relations $A(X,Y)$ and $B(Z)$, the following are valid expressions:

$\pi_Z ( \sigma_{X = Z} ( A \times B ) )$
$\pi_{B.Z} ( \sigma_{A.X = B.Z} ( \rho_{A}(A) \times \rho_{B}(B) ) )$

But not:

$\pi_{B.Z} ( \sigma_{A.X = B.Z} ( A \times B ) )$

In the submitted solution, the SPJ should not mention relation names in $\sigma$.

Conjunctive Queries

Matchings

A matching of a query into a database is a mapping of each variable in the query to a value in the database, in such a way that each atom of the query corresponds to an element of the database. For instance, consider the following queries:

Q₁(x) ← A(x, y)
Q₂(w,z) ← A(w, x), B(x, y), B(y, z)

and the database D that contains the following relations:

A: { (1, 2) }
B: { (2, 3), (3, 4), (2, 6) (6, 7) }

We can see that m₁ = { x: 1, y: 2 } is a matching of Q₁ into D. In particular A(x, y) for x = 1 and y = 2 occurs in the database D. Similarly, m₂ = { w: 1, x: 2, y: 3, z: 4 } is a matching of Q₂ into D, and in particular A(1, 2) (corresponding to A(w,x) ) occurs in D, as well as B(2, 3), and B(3, 4).

In general many different matchings exist for a same query into a same database. For instance, m₃ = { w: 1, x: 2, y: 6, z: 7 } is also a matching of Q₂ into D.

For a given matching, an individual result of the query on the database is obtained by replacing the variables in the head of the query by the corresponding constants. For instance, m₁ leads to the tuple (x) for x = 1 as a result of Q₁ on D. Similarly, m₂ leads to the tuple (w, z) for w = 1, and z = 4 as a result of Q₂ on D.

The result of the query on the database is the set of all individual results of the query. In particular, for Q₂ on D, we have as the result the set { (1, 4), (1, 7) }.

Building Matchings

Building matchings by hands can be tedious, as you may need to try and associate each atom of the query with each element of the database. The following is a list of tricks that can help speeding up the process. The essence of these is to construct a mapping by considering each atom of the query in turn, and checking the equalities they must imply.

To test whether a tuple pertains to the result of the query, you can start by mapping the variables of the head of the query to the constants in the tuple.

If a relation contains just one element, then all the atoms of the queries that consider this relation must be mapped to that element.

If the current mapping makes it so that a given atom can only map to one element in the database, then the remaining variables that the atom refers to can be mapped to the corresponding constants in that element.

These rules of thumb ensure that a deterministic choice is made whenever it can be. However, in many circumstances an atom can be bound to many different elements of the database; then all possible associations of atoms and elements must be considered, one at a time.

Also note that for some queries, no matching can be built. This can be detected, for instance, if building the matching would require a variable to take two different values or if no value is available at all for a given variable.

As an example, let us building all matchings of Q₂ into the database D from the example above. Since the relation A only contains the pair (1, 2) any matching of Q₂ into D must contain { w: 1, x : 2 }. We now consider B(x, y), with the constraint that x = 2. There are two elements of the database satisfying this constraint: (2, 3) and (2, 6). We must consider both of them in turn.

For (2, 3) the matching would contain { w: 1, x: 2, y : 3}. We now

consider B(y, z) with y = 3. There is only one element of the database satisfying this condition, which is (3, 4). The corresponding matching is { w: 1, x: 2, y: 3, z: 4 } and the individual result is (1, 4).

For (2, 6) the matching would contain { w: 1, x: 2, y : 6}. We now

consider B(y, z) with y = 6. There is only one element of the database satisfying this condition, which is (6, 7). The corresponding matching is { w: 1, x: 2, y: 6, z: 7 } and the individual result is (1, 7).

Common Mistakes

Exercise 5.

$Q_2(x,y) \leftarrow Q(x,a), Q(a,b), Q(b,c), Q(c,y)$
$Q_3(x,y) \leftarrow Q(x,a), Q(a,1), Q(1,b), Q(b,y)$

$Q_2 \subseteq Q_3$ ?

We build the following matching from $Q_3$ into the canonical database $D(Q_2)$:
$\{ x \rightarrow x_c,
a \rightarrow a_c,
1 \rightarrow b_c,
b \rightarrow c_c,
y \rightarrow y_c \}$
and hence we have $Q_2 \subseteq Q_3$

…
Exercise 6.

$Q_1(x,z) \leftarrow R(x,y), R(y,w), R(y,z)$
$Q_1'(x,z) \leftarrow R(x,y), R(y,z)$
$Q_1' \subseteq Q_1$ ?

We build the following matching from $Q_1$ into the canonical database $D(Q_1')$:
$\{ x \rightarrow x_c, y \rightarrow y_c, w \rightarrow z_c \}$
and hence we have $Q_1' \subseteq Q_1$

When an atom of the query contains a constant, that constant must appear as it is in the corresponding element of the database. For instance, it is wrong to map constant 1 to some other constant.

No variable of the query can be left unmapped (here, the variable $z$ is left unmapped, but can be mapped to $z_c$).

Conjunctive Queries Inclusion Test

Exercise 6.

$Q_1(x,z) \leftarrow R(x,y), R(y,w), R(y,z)$
$Q_1'(x,z) \leftarrow R(x,y), R(y,z)$
$Q_1' \subseteq Q_1$ ?

We build the following matching from $Q_1$ into the canonical database $D(Q_1')$:
$\{ x \rightarrow x_c, y \rightarrow y_c, w \rightarrow z_c, z \rightarrow z_c \}$

To test the inclusion of a conjunctive query Q into Q' you need to:

1. Construct the canonical database D of Q.

2. Try to find a matching of Q' into D such that the corresponding individual result is the header of Q where variables are seen as constants.

When writing down exercises solutions, it is recommended that you indicate the reason for not being able to build a matching. For instance you could indicate that a matching would need to associate a variable with two different values.

In the provided solution, the answer should also mention that $(x_c, z_c)$ is in the result of $Q_1$ on $D(Q_1')$.

Optimizing Conjunctive Queries

Exercise 6.2

$Q_2(x,y) \leftarrow R(x,z), R(y,w), R(a, w), R(x, y)$

Let $Q_2'(x,y) \leftarrow R(y,w), R(a, w), R(x, y)$, and let us check whether
$Q_2' \subseteq Q_2$.

Then $\{ x \rightarrow x_c, y \rightarrow y_c, z \rightarrow y_c, w \rightarrow w_c, a \rightarrow a_c \}$ is a matching of $Q_2$ into $D(Q_2')$.
Hence $Q_2' \equiv Q_2$.

Let $Q_2^{\prime\prime}(x,y) \leftarrow R(a, w), R(x, y)$, and let us check
whether $Q_2^{\prime\prime} \subseteq Q_2$.

Then for any matching, the variable $y$ must refer at the same time
to the constants $y_c$ (from the head) and $a_c$ (from $R(y, w)$ and
$R(a,w)$). Therefore, $(x_c,y_c)$ is not part of the result of $Q_2$ on
$D(Q_2^{\prime\prime})$, and $Q_2^{\prime\prime} \not\equiv Q_2$.

When optimizing conjunctive queries, you can reuse the optimized expression when optimizing further: for instance, if you have optimized $Q$ into $Q'$ and you want to check whether $Q^{\prime\prime}$ is even more optimal, you can test $Q^{\prime\prime}$ against $Q'$. Indeed, while $Q'$ is smaller that $Q$, it is strictly equivalent. However, since it is smaller, constructing a matching is usually easier.