Relational algebra

In the theory of databases is meant by a relational algebra or a relational algebra is a formal language, which can be formulate queries over a relational schema. It allows you to link relations to each other or to reduce complex information and derive from it.

The relational algebra defines operations that can be applied to a set of relations. This allows, for example, link, filter, or rename relations. The results of all operations are also relations. For this reason, some sort of relational algebra as completed.

Their importance has the relational algebra as a theoretical foundation for query languages ​​in relational databases. Here the operations of relational algebra in so-called database operators can be implemented. If each operation of the relational algebra in the query language (at least) an expression can be implemented by, it is called relationally complete; the term may here associate multiple database operators. If each operation (just ) a database operator can be implemented by, it is called strictly relationally complete; so it must always exactly one database operator be included in one and the same -converting expression. If the condition of strict relational completeness also applies in the other direction, so it to any database operator is a corresponding operation of the relational algebra, then that means the query language equivalent to the relational algebra, in short: relational equivalent.

As there are for the relational algebra ( more ) minimal amounts of operations from which all other operations can be composed, it is sufficient for the ( strictly ) from relational completeness to compare the query language with these " basic operations ". This follows from the fact that the relational algebra itself is trivially equivalent and a minimum of system operations already fully described (in terms of operations). A common minimal system of operations consists of the following six operations: projection, selection, cross product, union, difference and renaming.

The relational algebra is often used because of its theoretical clarity as benchmarks for the cardinality or expressive power of query languages ​​, including by means of Vergleichsbegrifflichkeiten just described. However, one must not close their greater power of the greater proximity of a query language for relational algebra. Query languages ​​that are relationally complete or even strictly relationally complete, often have a much wider range of functions than would be possible by the sole implementation of relations algebra operations. For example, in the relational algebra is the possibility of forming the transitive closure of a relation, what is interesting about when referral relations, not given. From the strict relational completeness of a query language can be rather a Mindestfunktionaliät, of the relational equivalence rather close to a maximum functionality, while the non- strict relational completeness provides the fewest concrete information on the query language.

In contrast to the relational algebra the calculi is safe, ie it provides in finite time a finite result. A relational algebra is also an example of a procedural language; in contrast to calculi that are usually formalized as descriptive languages.

The invention of the relational algebra by Edgar F. Codd has revolutionized in the 1970s, the database world whose standard data models were the hierarchical model and the network database model at that time. This possibly was due to the work of Alfred Tarski algebra of binary relations. The database language SEQUEL, a precursor of today's SQL, QBE and QUEL was next to one of the first implementations of the ideas of the relational model, and thus the relational algebra.

  • 3.1 Null Values
  • 3.2 Grouping operator and aggregate functions
  • 3.3 NF ²
  • 3.4 multiset semantics

Operations

Set Operations

To be able to perform set operations on the relations R and S, both of which must be compatible with each other. The type compatibility of two relations is given when

  • R and S have the same degree ( Tupelelementanzahl )
  • The range of values ​​of the attributes of R and S is the same as

The type compatibility is also called union compatibility.

Union

The union R ∪ S all tuples of relation R with all tuples of relation S are combined into a single relation. Prerequisite for this is that R and S have the same relation schema. That is, they have the same attributes, and attribute types. Duplicates are deleted when the association.

Definition

Example

Prerequisite

  • Association compatibility of R and S

Intersection ( Intersection )

The result of the average operation R ∩ S are all the tuples that can be found both in R and in S. The average sales volume can also be expressed by the set difference: R ∩ S = R \ (R \ S)

Definition

Example

Prerequisite

  • Association compatibility of R and S

Difference

In operation R \ S or R - S all tuples are removed from the first relation R, which are also present in the second relation S. The difference ( as well as the symmetric difference ) is not a monotonic operation, therefore, the relational algebra (e.g., data log ) is not monotonous relative to other declarative query languages ​​.

Definition

Example

Prerequisite

  • Association compatibility of R and S

Symmetric difference

In the symmetric difference △ R S is the set of all tuples that are contained in either R or S, but not in both simultaneously.

The operation can be derived from the basic operations:

Example

Prerequisite

  • Association compatibility of R and S

Cartesian product ( cross product )

The Cartesian product R × S is an operation very similar to the Cartesian product of the set theory.

The result of the Cartesian product of the set of all combinations of tuples from R and S, that is, each row of a table is combined with any other row of the table. If all of the features (columns) are different, the result table comprises the sum of the characteristics of the original tables. The number of tuples ( rows) in the result table is the result of multiplying the numbers of rows of the output tables.

Definition

Any two relations and are given. The Cartesian product is defined by

Example

Projection

The projection can also be called attribute restriction. You extracts individual attributes from the original attribute set, and is thus to be understood as a kind of selection at the column level, that is, the projection displayed columns. If β is the attribute list to write πβ (R) or in the linear notation R [ β ]. β is also called projection list. Duplicates in the result relation are eliminated.

Definition

Let R be a relation over { A1, ..., Ak } and β ⊆ { A1, ..., Ak }.

The tβ: = ( β ), that is, the tuple is only the attributes from the attribute list β.

Example

Prerequisite

  • The columns specified must be contained in R.

Selection ( restriction )

The selection can be defined with a comparison expression ( predicate ), which tuples to be included in the result set. There are hidden so tuple ( " lines "). One writes or in the linear notation R [ expression]. Expression is then called selection condition.

Definition

Let R be a relation.

Expression denotes a formula. This can consist of:

  • Constant attribute selections θ constant, θ, a conventional ( appropriate ) comparison operator.
  • Attribute selections attribute attribute θ
  • A link of a formula with logical predicates ∧, ∨, ¬ ( brackets as usual).

Example

Prerequisite

  • Each element of the specified column must be comparable over the conditional operator with the comparison value.

Join

A join (in German composite ) refers to the two consecutively executed operations Cartesian product and selection. The selection condition usually is a comparison of attributes A θ B, where θ is a fitting comparison operator. It refers to the general composite therefore also called composite θ ( " theta network" ). A special case of the general association is the equi-join ( see below).

Definition

For two relations and the outcome of the general association with a formula expression as a selection condition

The derivation is:

Example

Joinverfälschung

In Joinverfälschung the table is first splitted up on a column. The two tables are then gejoint on the common column.

Let and, then:

Example

Equi-join

At Equi- Join ( also equal composite ) is formed as the first, the Cartesian product. Then the selection is made on the condition that the content of certain columns must be the same. The equi-join is a general composite with a formula of the form A = B.

Definition

For the relations R, S and associated attributes A and B (attribute of R is ) ( is an attribute of S ) is the equi-join

Example

Here:

Natural join

The natural join is composed of the equi-join and an additional suppression of the same column (projection). The join occurs on the attributes (columns) that have the same name in both relations. If there are no common attributes, the result of the natural bond is the Cartesian product. The natural compound is commutative and associative, that is, it applies as well, which plays a role in the optimization of queries. The number of attributes of the result relation is the sum of the numbers of the two output relationships minus the number of network attributes.

Definition

For two relations and the result of the natural bond is

Example

Semi Join

The Semi Join calculated the proportion of a natural joins, which is left after a reduction to the left relation.

Definition

For two relations and the result of half a natural composite is

Example

Outer Join

In contrast to the equi-join, the tuples of the left ( left outer join ) or the right ( right outer join ) table in the result relation with recorded that can not find a join partner in the outer join. The non- existing attributes of the join relation are filled with null values. The combination of Left and Right outer join is called outer join or full outer join. In this case, all tuples are added to the result relation and filled those attributes of a tuple with null values ​​, which have found no join partners in the other relation.

The outer join can be used with or without (natural outer join ) join condition.

Example

Renaming

By this operation, attributes and relations can be renamed. This operation is important to

  • To allow joins of different named relations,
  • To allow Cartesian products, where there is the same attribute names, especially with the same relation
  • To enable set operations between relations with different attributes.

The notation is R linear [ old → new].

Definition

We construct a new set of tuples t ' from the old:

Example

Division

Definition

Since the division is a derived operation, we define it using the other operations of the relational algebra. Let R, S and the relations to R as well as related to S attribute sets. .

The division is then defined by:

So put it clearly contains attributes from those which occur in any combination with the attributes of in.

Example

Given a relation R, the fathers and mothers, including their children and the age of the children. In addition, a relation S is given, the number of children and their ages includes: Maria ( 4) and Sabine (2). Dividing R by S, we obtain as a result of a relation that contains only those couples who have both a daughter Maria to age 4 and a daughter Sabine with Age 2:

The division will be used when the question "for all" includes. For our example, the question is: "Select all parents (father, mother), who have a child with the name Maria and the age of 4 and a child with the name Sabine and age 2 (the relation S ). "

Minimality and completeness

A minimum set of operations, that is, a lot of operations, which is at least necessary to make all the terms of relational algebra can comprises

  • Projection
  • Selection
  • Cross product
  • Union
  • Difference
  • Renaming

All other operations (such as joins) can be reproduced by these basic operations.

Any other set of operations is relationally complete if they have the same cardinality as the operations mentioned above.

Extensions of relational algebra

To another query languages ​​to be able to specifically SQL complete image in the relational algebra, relational algebra is not powerful enough. For example, there 's no way that SQL operators GROUP BY / HAVING, aggregate functions and null values ​​in relational algebra to translate. Here we consider some extensions ( partially similar effect ), which allow a complete mapping to the relational algebra, and hence a complete theoretical consideration of these query languages ​​.

Zero values

SQL allows the use of null values ​​, which can be queried with the special IS NULL predicate. This is particularly important in the formation of external users that generate a relation containing all the values ​​of a relationship, and all of the other values ​​, for which the composite condition is true, otherwise provide NULL values. This may not be as shown with the relational algebra.

One possibility is the definition of zero values ​​as in SQL with a three-valued logic, that is, the Boolean operators are extended by means of truth tables so that is fixed, how to proceed when an operand is NULL.

Selection conditions or composites that are applied to zero values ​​, NULL result. One difficulty with it ( ie with the SQL -like treatment of null values ​​) is that the results of queries with subqueries that arise NULL, not necessarily match the user's intention. This type of zero treatment is not orthogonal, ie the behavior at one level ( Boolean operators, three - valued logic ) is not corresponding to another ( composites, three - valued logic is 2 -valued pictured).

Another possibility is to distinguish two different types of null values ​​, each mean "any" or "undefined".

Grouping operator and aggregate functions

The grouping used functions on the same attributes in a relation. The operator γ is given a list of features and an attribute list. The functions are then applied to the attributes of the tuple for the attribute list are the same. The output is a new relation consisting of the attribute list and a new attribute that contains the results of the function list.

The functions are then count the usual aggregate functions, sum, max, avg ....

Definition

Let R be a relation and A = { A1, ..., An } attributes from R. F ( X ) be a list of functions f1 ( x1 ), ..., fn (xn). The grouping is then

Attribute set for a blank (ie γF (x ) } { (...)) an additional attribute is generated, which contains the value of the functional application of the entire relation. This is exploited to decompose the relation with the selection part in relations with the same attributes, which are then assembled with the function application again.

Next is that a grouping with an empty function list has no effect.

NF ²

An extension of relational database model is the NF ² model. The name stands for non -first- normal -form ( NFNF ), which is intended to indicate that the condition is an atomic attribute values ​​of the first normal form is broken. Consequently, sets of attributes and sets of sets are allowed, which means that an attribute of a relationship may be a relationship again. The domain (range ) of a combined attribute is the cross product of attribute domains involved.

NF ² extends the relational algebra in that besides the usual ( appropriately adapted ) operations of relational algebra operations are added two taken, the nest is a relation ( Nestung ν ) and interleave ( Entnestung μ ). The Nestung summarizes a set of attributes in a sub- relation, which is given a new attribute name. The Entnestung removes nesting. These operations serve NF ² relations in the first normal form transform and vice versa. The operations are not bijective in general.

NF ² required for the above reasons no foreign keys.

Multiset semantics

SQL returns the result of inquiries a multiset back, so a lot that can contain multiple elements. This was handled for performance reasons, to save the additional step of duplicate elimination. It can therefore, strictly speaking, only queries in the relational algebra are translated, which are indicated with DISTINCT.

For the relational algebra can then specify additional function bag- to- set, which removes duplicates from a multiset, and thus generates a lot, and then just the basic operations as a multi-set | { { specify t ...} }. Caution but you have to be exercised in the definition of derived operations.

Examples

As a relation schemas for the examples we take the classic example database consisting of Schemes customer, supplier and goods. The schemes are:

Basic operations of relational algebra are then used as:

Since the results of relational algebra are relations again ( RA orthogonal ), the operations may be re-applied to the results of operations. This allows for complex queries. For ease of notation we assume that the cross product of an implicit renaming of attributes makes so that the new attribute names are qualified with the relation name, ie from Supplier No. from the relation WARE will WARE.Lieferantennr:

240867
de