Definition: Let *T1, T2, ..., Tn* (*n* ≥= 0) be type names, not all
necessarily distinct. Associate with each type *Ti* a distinct attribute name, *Ai*; each of the *n*
__attribute name–type name__ pairs that results is an *attribute*. Associate with each attribute a value
*vi* of type *Ti*; each of the *n* __attribute–value__ combinations that results is a *component*.
The set of all *n *components thus defined, *t* say, is a *tuple value *(or just *tuple* for short) over the attributes *A1,
A2, ..., An*. The value *n* is the *degree* of *t*; a tuple of degree one is *unary*, a tuple of degree two is *binary*,
a tuple of degree three is *ternary*, ...., and more generally a tuple of degree *n * is *n-ary*. The set of all attributes is the
*heading* of *t*.

To summarize, a component is an attribute plus its value. The set of all n components is a *tuple* or *tuple value*. The set of all *n* attributes
is the *Heading*. *n* is the *degree* of the tuple.

In TUTORIAL D a *tuple type* name is written *TUPLE {H}*, where *{H} *is the heading. If we have, for example, two attributes
[STATUS: INTEGER] and [SUPPLIER_NAME: NAME], then the tuple type would be written as

TUPLE { STATUS INTEGER, SUPPLIER_NAME NAME }

__Notice__, we are putting the attribute's name first followed by its type. This is the practice throughout. A sample tuple (tuple value) might be

{ 20, NAME("General Electric Corporation") }

An attribute's type does not have to be a SQL type (such as VARCHAR). In an attribute such as [EMPLOYEE: NAME], where NAME is the type and EMPLOYEE is the attribute name, NAME would no doubt be implemented in our SQL database as two VARCHAR columns, first_name and last_name, and NAME equality would involve comparing both first_name and last_name.

The value of an attribute should be atomic: it should contain only one useful piece of information and must never require further parsing.

A tuple value is returned or instantiated from a tuple* selector* invocation:

TUPLE { STATUS INTEGER, SUPPLIER_NAME NAME} // type declaration TUPLE { STATUS 20, SUPPLIER_NAME NAME('Smith') } // tuple selector instantiates the tuple

The attribute type is omitted in the selector because it can always be inferred from the type of the expression denoting the attribute value.

The keyword TUPLE does double duty in Tutorial D. It's used both to declare tuple *type names* and in tuple selector invocations.

In Tutorial D the syntax from extracting an attribute value from a tuple is *attribute-name FROM tuple*. For example,

TUPLE { STATUS 20, SUPPLIER_NAME NAME('Smith') } t; // selector invocation STATUS FROM t; // extraction of tuple attribute

A tuple with an empty heading has type TUPLE {} (it has no components). It is called the
*0-tuple* to emphasize that it is of degree zero. It is also sometimes called an *empty tuple*.

*Definition*: Tuples *t1 *and *t2* are *equal* if and only if they have the same attributes *A1,
A2, ..., An*. That is, they both are of the same tuple type. And for all *i (i = 1, 2, ..., n)*, the value *vi*
of *Ai *in *t1 *is equal to the value in *t2*.

*Definition: * Let *{H}* be a tuple heading and let *t1, t2, ..., tm* be distinct tuples with heading *{H}*. The
combination, *r* say, of *{H}* and the set of tuples *{t1, t2, ..., tm} *is a *relation value* (or just a *relation*
for short) over the attributes *A1, A2, ..., An, *where *A1, A2, ..., An* are the attributes in *{H}*; the *heading *of
*r* is *{H}*; *r *has the same attributes (and hence the same attribute names, and types) and the same degree as that heading does.
The *body *of *r *is the set of tuples *{ t1, t2, ...,tm}.* The value *m *is the *cardinality *of *r.*

The relational model is so called because it deals with certain abstractions that we can think of as "tables" but are known, formally, as relations in mathematics.

You define a relation's type in **TUTORIAL D **as **RELATION {H}**, where H is the heading. A *relation value*
(or *relation* for short) is its set of unique tuples. An example of the definition of a relation type

RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR}

Every relation value is returned by some relation selector invocation; for example, the relation type above might have this value.

RELATION { TUPLE {SNO SNO('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London'}, TUPLE {SNO SNO('S2'), SNAME NAME('Jones'), STATUS 15, CITY 'Miami'}, ... }

Relations by definition never contain duplicate tuples, and tuples never contain NULLs. Relation are by definition in first normal form (1NF) because a tuple can have only one value for each of its attribute, which is the definition of 1NF. Both relations and their tuples contain no sense of ordering.

TABLE_DUM and TABLE_DEE are both relations of degree 0, meaning they have no attributes. They are of type RELATION {}. TABLE_DUM has no tuples. TABLE_DEE has one tuple, the 0-tuple or empty tuple.

A relation is shorthand for *relation value* in the same way that integer is shorthand for *integer value*. Relvar is
shorthand for *relation variable*. A relvar is a holder for a representation of a relation value. Relvars are analgous to variables in
a strongly typed language like, say, C++, and relations are likewise analogous to the values of those
variables at a given point in time. Relvars can be characterised as *base relvars* or *derived relvars* also know as views.
Here is the definition of a base relvars in Tutorial D.

VAR S BASE RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR } Key { SNO };

Unlike SQL, operations like INSERT, DELETE and UPDATE are, in the relational algebra, set-level operations not tuple-level opertaions: one
relation value, say *r1,* is always entirely replaced by another relation value, say, *r2.*

*Definition*: A subset of the heading of a **relvar** is a candidate key K if it satisfies two conditions:

- The uniqueness property: no possible value of the relvar ever has two tuples with the same value for K.
- The irreducibility property: no proper subset of the the candidate key has this uniqueness property.

Keys are defined on relvars not relations. Keys imply certain integrity contraints, a uniqueness constraint in particular,
and integrity contraints apply to variables not values. Key values are tuples. *which means what????*<--

Given this relation, which represents marriages

RELATION { SPOUSE_A NAME, SPOUSE_B NAME, DATE_OF_MARRIAGE DATE }if we assume no polygamy and that no couples marry each other more than once, then this relvar

VAR MARRIAGE BASE RELATION { SPOUSE_A NAME, SPOUSE_B NAME, DATE_OF_MARRIAGE DATE }

would have three candidate keys:

KEY {SPOUSE_A, SPOUSE_B} KEY {SPOUSE_A, DATE_OF_MARRIAGE} KEY {SPOUSE_B, DATE_OF_MARRIAGE}

So if this relvar contained a TUPLE of

TUPLE { SPOUSE_A NAME("Betty Smith"), SPOUSE_B NAME("John Smith") DATE_OF_MARRIAGE DATE("09-11-1955") }

the value for the key {SPOUSE_A, SPOUSE_B } would be

TUPLE { SPOUSE_A NAME("Betty Smith"), SPOUSE_B NAME("John Smith") }

Any one of the candidate keys can be chosen as the **primary
key**.
Usually, however, the simplest candidate key is chosen. See pp. 63-64.

*Definition*: Let *R1* and *R2* be relvars, not necessarily distinct, and let *K* be a key for *R1*. Let *FK* be a
subset of *R2* that, possibly after some attribute renaming, involves exactly the same attributes as *K*. Then *FK* is a
*foreign key* if and only if, at all times, every tuple in *R2* has an *FK* value that is equal to the *K* value in some
(necessarily unique) tuple in *R1* at the time in question.

Using this example of two relvars BOOKS and BOOK_CHAPTERS

VAR BOOKS BASE RELATION { BOOK_ID ISBN, BOOK_TITLE CHAR, BOOK_PRICE FLOAT} KEY { BOOK_ID };

VAR BOOK_CHAPTERS BASE RELATION { CHAPTER_ID INTEGER, CHAPTER_NUMBER INTEGER, BOOK_ID ISBN, CHAPTER_NAME CHAR } KEY { CHAPTER_ID } FOREIGN KEY { BOOK_ID} REFERENCES BOOKS;

**BOOK_ID** in BOOK_CHAPTERS is a foreign key. While the same BOOK_ID value may
occur in multiple tuples of BOOK_CHAPTERS, there is one and only one tuple in BOOKS with that same
BOOK_ID value. The foreign key need not have the same name in the referencing relvar as in the referenced relvar. In
Tutorial D this fact would be shown using the RENAME operator. For example

VAR BOOKS BASE RELATION { BOOK_ID ISBN, BOOK_TITLE CHAR, BOOK_PRICE FLOAT } KEY { BOOK_ID };

VAR BOOK_CHAPTERS BASE RELATION { CHAPTER_ID INTEGER, CHAPTER_NUMBER INTEGER, BOOK_ISBN ISBN, CHAPTER_NAME CHAR } KEY { CHAPTER_ID } FOREIGN KEY { RENAME (BOOK_ISBN AS BOOK_ID) } REFERENCES BOOKS;

The foreign key can even reference the same relvar. That is, *R1* and *R2* in the foreign key definition can be the same relvar. For example,

VAR EMP BASE RELATION { ENO ENO, ...., MNO ENO, ...} KEY { ENO } FOREIGN KEY { RENAME ( MNO AS ENO ) } REFERENCES EMP ;

Here attribue MNO denotes the employee number of the manager of the employee identified by ENO; thus the referencing relvar and the referenced relvar are the same. In order for the tuple equality comparison to be valid we rename the MNO attribute, so that it is of same type, since an attribute's name, like ENO, is part of its type.

SQL supports certain referential actions such as CASCADE, which, though very useful, even essential, are not strictly part of the relational algebra.

A base relation is a relation of some type with an initial value.

*Definition:* A *view* or *virtual relvar V* is relvar whose value at time *t* is derived by evaluting a certain relational
expression at that time *t*. The expression is specified when *V* is defined and must mention at least one relvar.

Virtual relvars can be used like base relvars because of *closure*. We can define further views on top of them.

Tutorial D | SQL |
---|---|

VAR LS VIRTUAL (S WHERE CITY = 'London'); | CREATE VIEW LS AS (SELECT S.* FROM S WHERE S,CITY = 'London'); |

Views can have candidate keys just like base relvars. They can have integrity constraints. We can update views (although SQL's support is weak for this). You can discover a MySQL VIEW's underlying SELECT statement with this command:

mysql> CREATE VIEW LS AS (SELECT S.* FROM S WHERE S,CITY = 'London'); mysql> SELECT LS, is_updatable, view_definition FROM INFORMATION_SCHEMA.VIEWS;

A snapshot is not a view because it has its own separate copy of the data.

The heading of a relvar is meant to represent a certain predicate, a generic statement about the real world.
It is the intended interpretation, the meaning, of the relvar, also called the *intention*.

A predicate is a truth-valued function that returns TRUE or FALSE. This if relvar *R* has predicate *P*, then every tuple *t* in
*R* at some given time can be regarded as a certain proposition derived by invoking *P* with the attribute values from *t* as arguments.
Every proposistion *p* obtained by substituting a tuples's values always evaluates to TRUE.

So a given relvar contains, at any given time, all and only the tuples that represent true propositions (instantiations that return TRUE) of the predicate.

A database should be thought of as a collection of facts or true propositions, in which the heading of a relvar represents a description of something
going on in the real world. This intended intrepretation, the real meaning of the relvar, is called the relvar's *Predicate*. The Predicate can
be though of as a truth-value function that when given specific values returns TRUE or FALSE. So the heading of every relation should have an associated
Predicate that describes the relation's real meaning. The Predicate is a function whose arguments are the names of the attributes in the heading of the
relation, and a relvar's contents, its tuples, then become true propositions (at a given point in time). While the tuples may change over time, the
Predicate does not.

*Predicate Definition:*

Every relvar *R* has an associated Predicate *P*. *P* is the *intended interpretation* or
*intension* for *R*. It does not change over time. When the values of a tuple *p* are substitued in the revlar's predicate
*P*, the resulting proposition is always true. A relvar, at any given time, contains *all* and *only* the tuples that represent true
propostions, true instantions of the predicate.

*Extension of a Predicate*

If *P* is the predicate of some relvar *R* whose value at some given time is *r*, then the body of *r* constitutes the
*extension* of *P* at that time. The extension varies over time, but the intention, the Predicate, does not.

Types are things in the real world we can talk about; relations are true statements about those things. Types are to relations as nouns are to sentences. If we take, as an example, the relvar for the suppliers relation (on pp. 11, 61 in Date's book)

VAR S BASE RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR } KEY {SNO };

the Predicate would be the following sentence (recalling that the attribute's name comes first in the declaration above, then its type) :

*SNO* is a **supplier part number** for **supplier name** *SNAME* with **status** of *STATUS*,
which means ..., and is located in *CITY* **city**

Here is another example of the predicate for an email contacts relvar.

VAR EMAIL_CONTACTS BASE RELATION { CONTACT_ID INTEGER, NAME NAME, EMAIL EMAIL, OPT_OUT BOOLEAN } KEY {CONTACTID};

The Predicate, the sentence describing the meaning of each attribute, would be

*CONTACT_ID* is the **unique identifier** of someone **named** * NAME* whose **email address**
is *EMAIL* and whose **opt status** is *OPT_OUT*.

The current value of EMAIL_CONTACTS consists of all tuples that currently satisfies this Predicate.

Database Constraints should not be easy to violate because duplicate information, dulicate tuples or subsets of tuples which duplicate part of a larger tuple, could result. See the principle of orthogonal design

The relational algebra operators are defined on relations and not strictly speaking on relvars, although their
application to relvars is no different. Relation operators are *generic*: they apply, in effect, to all possible relations.
They are also *read-only*. They read their operations and return a result. INSERT, UPDATE and DELETE (and relation assignment),
while they are relational operators, aren't technically part of the relational algebra.

- When a relational algebra operations, like UNION or JOIN, requires a correspondence between operand attributes to be established, Tutorial D requires the attributes to have the same name.
- Tutorial D sometimes needs to rename attributes to avoid naming clashes or mismatches. SQL usually doesn't
- SQL requires most queries to conform to SELECT — FROM — WHERE template. Tutorial D has no analogous requirement.

Tutorial D RENAME operator takes one relation as input and returns a relation identical to the input except that one of the attributes has a different name. For example

Tutorial D RENAME | SQL equivalent |
---|---|

S RENAME (CITY AS SCITY) | SELECT S.SNO , S.NAME, S.STATUS, S.CITY AS SCITY FROM S |

Let *bx* be a boolean expression involving zero or more attribute names such that all of the attributes mentioned
are atrributes of the same relation *r*. Then the *restriction* of *r* according to *bx*:

rWHEREbx

is the relation with a heading the sameas that of *r* and a body consisting of all typles of *r*for which *bx*
evaluates to TRUE. For example,

S WHERE CITY = 'Paris' | SELECT S.* FROM S WHERE S.CITY = 'Paris' |

will not contain any duplicate tuples.

Let relation *r* have attributex X, Y,...Z (and possibly others). Then the *projection* of *r*
on *X, Y,...., Z*:

r { X, Y, ..., Z }

is a relation with (a) a heading derived from the heading of *r* by removing all attributes not mentioned in the set *{X, Y,
..., Z}* and (b) a body consistiting of all typles *{X x, Y y, ..., Z z} * such that a tuple appears in *r*
with *X* values *x*, *Y* values *y*, ..., and *Z* value *z*. For example,

Tutorial D | MySQL |
---|---|

S { SNAME, CITY, STATUS} | SELECT DISTINCT S.SNAME, S.CITY, S.STATUS FROM S |

DISTINCT is needed in the SQL formulation to insure no duplicates occurs. The Tutorial D forumlation can also be expressed in terms of the attributes to be discarded. Thus

S { SNAME, CITY, STATUS }

is equivalent to

S { ALL BUT CITY }

PROJECT has a higher precedence than other (all?) operators, so

S JOIN P { PNO, CITY }

is equivalent to

S JOIN ( P { PNO, CITY } )

*Definition:* Let relations *r* and *s* jave attributes *X1, X2, ..., Xm, Yq, Y2, ..., Yn, Z1, Z2, ..., Zp, * respectively;
in other words, the *Y's* (*n* of them) are the common attributes, the *X's* (*m* of them) are the other attributes of *r*,
and the *Z's* (*p* of them) are the other attributes of *s*. We can assume without loss of generality that none of the *X's*
has the same name as any of the *Z's*, thanks to the availability of RENAME. Now let the *X's* taken together be denoted just *X*,
and similarly for the *Y's* and the *Z's*. Then the *natural join*(*join* for short) of *r* and *s*:

r JOIN s

is a relation with (a) a heading that is the (set-theoretic) union of the heading of *r* and *s*
and (b) a body consisting of of the set of all tuples *t* such that *t* is the (set -theoretic) union of a tuple appearing in *r*
and a tuple appearing in *s*. In other words the heading is *(X, Y, Z)* and the body consists of all tuples *(X x, Y y, Z z)* such
that a tuple appears in *r* with *X* value *x* and *Y* value *y* and a tuple appears in *s* with *Y* value *y*
and *Z* value *z*.

Less formally, JOIN first locates tuples with the same values for the shared attribues. It then concatenates, joins, the unshared attribute values of those same tuples to produce the result.

The closest SQL equivalent to the Tutorial D expression P JOIN S would be SELECT * FROM P NATURAL JOIN S, though not all SQL products support this syntax; otherwise, one would have to list all the attributes

SELECT P.PNO, P.PNAME, P.COLOR, P.WIEGHT, P.CITY, S.SNO, S.SNAME, S.STATUS FROM P, S WHERE P.CITY = S.CITY;

*Intersection* and *cartesian product* are special cases of JOIN. If *m = p = 0* (meaning there are no *X's* and not *Z's*,
and *r* and *s* are thus of the same type), then *r JOIN s* degenerates to *r INTERSECT s*. If *n = 0*, meaning there are
no *Y's* and *r* and *s* thus have no common attributes, *r JOIN s* degenerates to *r TIMES s*.

Tutorial D JOIN | SQL equivalent | SQL equivalent |
---|---|---|

S JOIN SP | SELECT * FROM S NATURAL JOIN SP; | SELECT DISTINCT P.PNO, P.NAME, P.COLOR, P.WEIGHT, P.CTY, S.SNO, S.SNAME,
S.STATUS FROM S,SP WHERE S.SNO = SP.SNO; |

NATURAL JOIN does not require the use of DISTINCT because NATURAL JOIN will only show one SNO column, and since { SNO, PNO } is the primary key of SP, we know the tuples will be distinct, and for a given SNO value, the PNO values will always differ.

SQL supports several types of join: NATURAL JOIN, JOIN, INNER JOIN, LEFT (OUTER) JOIN, and RIGHT (OUTER) JOIN. They are explained at SQL JOINS. The MySQL JOIN syntax is explained at MySQL JOIN SYNTAX. Here are some sample MySQL joins queries.

SELECT * FROM S NATURAL JOIN SP;

SNO | SNAME | STATUS | CITY | PNO | QTY |
---|---|---|---|---|---|

S1 | Smith | 20 | London | P1 | 300 |

S1 | Smith | 20 | London | P2 | 200 |

S1 | Smith | 20 | London | P3 | 400 |

S1 | Smith | 20 | London | P4 | 200 |

S1 | Smith | 20 | London | P5 | 100 |

S1 | Smith | 20 | London | P6 | 100 |

S2 | Jones | 10 | Paris | P1 | 300 |

S2 | Jones | 10 | Paris | P2 | 400 |

S3 | Blake | 30 | Paris | P2 | 200 |

S4 | Clark | 20 | London | P2 | 200 |

S4 | Clark | 20 | London | P4 | 300 |

S4 | Clark | 20 | London | P5 | 400 |

SELECT DISTINCT S.* FROM S NATURAL JOIN SP;

SNO | SNAME | STATUS | CITY |
---|---|---|---|

S1 | Smith | 20 | London |

S2 | Jones | 10 | Paris |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

SELECT * FROM S LEFT JOIN SP USING(SNO);

SNO | SNAME | STATUS | CITY | PNO | QTY |
---|---|---|---|---|---|

S1 | Smith | 20 | London | P1 | 300 |

S1 | Smith | 20 | London | P2 | 200 |

S1 | Smith | 20 | London | P3 | 400 |

S1 | Smith | 20 | London | P4 | 200 |

S1 | Smith | 20 | London | P5 | 100 |

S1 | Smith | 20 | London | P6 | 100 |

S2 | Jones | 10 | Paris | P1 | 300 |

S2 | Jones | 10 | Paris | P2 | 400 |

S3 | Blake | 30 | Paris | P2 | 200 |

S4 | Clark | 20 | London | P2 | 200 |

S4 | Clark | 20 | London | P4 | 300 |

S4 | Clark | 20 | London | P5 | 400 |

S5 | Admans | 30 | Athens | NULL | NULL |

SELECT DISTINCT S.* FROM S LEFT JOIN SP USING(SNO);

SNO | SNAME | STATUS | CITY |
---|---|---|---|

S1 | Smith | 20 | London |

S2 | Jones | 10 | Paris |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

S5 | Admans | 30 | Athens |

SELECT DISTINCT S.* FROM S LEFT JOIN SP USING(SNO) WHERE SP.SNO IS NOT NULL;

SNO | SNAME | STATUS | CITY |
---|---|---|---|

S1 | Smith | 20 | London |

S2 | Jones | 10 | Paris |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

SELECT DISTINCT S.* FROM S LEFT JOIN SP USING(SNO);

SNO | SNAME | STATUS | CITY |
---|---|---|---|

S1 | Smith | 20 | London |

S2 | Jones | 10 | Paris |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

S5 | Admans | 30 | Athens |

Many queries that require JOIN really require an extended form of the operator called SEMIJOIN. The *semijoin* of *r* and *s* is the join of *r* and *s*, projected back on the attributes of *r*. So,
for example, S SEMIJOIN SP is equivalent to (S JOIN SP) { SNO, SNAME, STATUS, CITY }. An application
of SEMIJOIN would be the query "Get suppliers who supply at least one part".

Tutorial D SEMIJOIN | SQL equivalents | |

S SEMIJOIN SP | SELECT DISTINCT S.* FROM S NATURAL JOIN SP; | SELECT DISTINCT S.* FROM S,SP WHERE S.SNO = SP.SNO; |

The output would look like this.

SNO | SANME | STATUS | CITY |
---|---|---|---|

S1 | Smith | 20 | London |

S2 | Jones | 10 | Paris |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

A more user-friendly spelling of SEMIJOIN is MATCHING: S MATCHING SP. Finally, observe that if the heading of *s* is a subset of *r* (that is, *p = 0*. See the definition of JOIN), then *r* JOIN *s*
degenerates to *r* MATCHING *s*. Likewise, if *m = 0*, *r* JOIN *s* degenerates to *s* MATCHING *r*.

The concept pf JOIN can be extended to join several relvars, which would be written JOIN { r, s, ..., w} and means r JOIN s JOIN ... JOIN w.
Thus *r* JOIN *s* can also be written JOIN { r, s }.

The JOIN of no relations, written JOIN {}, is TABLE_DEE. TABLE_DEE is of type RELATION {}; i.e., it has no attributes; and it
contains one tuple, the *0-tuple* or empty tuple. The join of any relation *r* with TABLE_DEE is simply *r*. As a consequence, the
join of no relations is TABLE_DEE.

Also *r* JOIN TABLE_DEE is the same as TABLE_DEE JOIN *r* because the result is simply the cartesian product.

Interest requires the relations or relvars to be of the same type.

Tutorial D | MySQL |
---|---|

S { CITY } INTERSECT P { CITY } | SELECT DISTINCT S.CITY FROM S INTERSECT SELECT DISTINCT P.CITY FROM P |

Note, DISTINCT is not strictly needed above. INTERSECT will remove duplicates.

The types of the operands must be the same.

S { CITY } UNION P { CITY } | SELECT DISTINCT S.CITY FROM S UNION DISTINCT SELECT DISTINCT P.CITY FROM P |

Since the default for UNION is UNION DISTINCT, DISTINCT is not strictly needed. As a consequence, the DISTINCTs following the two SELECTs aren't needed (note: the default for SELECT is ALL rather than DISTINCT).

Definition: If *r* and *s* are of the same type, then *r* MINUS *s* consists of all typles that appear in *r*
but not in *s*.

Tutorial D MINUS | SQL equivalent |
---|---|

S { CITY } MINUS P { CITY } | SELECT S.CITY FROM S EXCEPT SELECT P.CITY FROM P |

EXCEPT is the SQL equivalent of the relational algebra MINUS operator; however, not all SQL implementations support EXCEPT, including MySQL. An MySQL equivalent expression using a subquery is

Tutorial D MINUS | MySQL equivalent |
---|---|

S { CITY } MINUS P { CITY } | SELECT S.CITY FROM S WHERE S.CITY NOT IN (SELECT DISTINCT P.CITY FROM P) |

We extract those cities from the supplier relvar that are not in the parts relvar. MINUS can be done more efficiently in MySQL using LEFT JOIN, as Doing Intersect and Minus in MySQL explains. Using LEFT JOIN, the MySQL analogue of MINUS would be

MySQL equivalent of MINUS using LEFT JOIN |
---|

SELECT DISTINCT S.* FROM S LEFT JOIN SP USING(SNO) WHERE SP.SNO IS NULL |

MINUS is just a special case of the more general SEMIMINUS. SEMIMINUS does not require both operands to be of the same type. The definition of
*s* SEMIMINUS *r* is "*r* MINUS (*r* SEMIJOIN *s*)" which is the same as "*r* MINUS (*r* MATCHING
*s*)". Consider the query "Get suppliers who supply no parts at all":

Tutorial D SEMIMINUS | SQL equivalent |
---|---|

S SEMIMINUS SP | SELECT S.* FROM S EXCEPT SELECT S.* FROM S, SP WHERE S.SNO = SP.SNO |

Again, MySQL does not support EXCEPT. The MySQL analogue for SEMIMINUS would be

MySQL analogues for SEMIMINUS | |
---|---|

SELECT DISTINCT S.* FROM S LEFT JOIN SP USING(SNO) WHERE SP.SNO IS NULL | SELECT DISTINCT S.* FROM S WHERE S.SNO NOT IN (SELECT DISTINCT S.SNO FROM S NATURAL JOIN SP) |

Tutorial D does provide the alternative, more user-friendly syntax of "S NOT MATCHING SP". Thinking in terms of "NOT MATCHING" makes the MySQL expressions above perhaps more understandable.

In relational algebra, the divisor is another relation S, whose heading must be a subset of the heading of R. The division is over the common attribute(s), and the set of values used as the actual divisor are the values found in S.

DIVIDE is perhaps easier to first illustrate. The figure below shows a simple example of dividing a binary relation R1 by a unary relation R2. The division is over the shared attribute I2. The divisor is the set {1,2,3}, these being the values found in the shared attribute in R2. Inspecting the tuples of R1, the value 'a' occurs in tuples such that their I2 values (that is, set of values for R1.I2) match the divisor (the set of values in the divisor). So 'a' is included in the result, but 'b' is not because there is no tuple with '3' as the I2 value and 'b' as the I1 value.

*Definition*: Let *r* and *s* be such that the heading of *s* is a subset of the heading of *r*. Then
the *division* of *r* by *s*, *r* DIVIDEDBY *s* is shorthand for the following:

r{X} MINUS ( (r{X} TIMESs) MINUSr) {X}

where X is the set-theoretic difference between the heading of *r* and that of *s*. Let's use a example from the part and
suppliers database. Let's apply this definition to this example.

SP { SNO, PNO } DIVIDEBY P { PNO }

Using the definition, this would be re-written as

SP { SNO } MINUS ( ( SP { SNO } TIMES P { PNO } ) MINUS SP { SNO, PNO } ) { SNO }

If look at these steps one-by-one, we would see the following.

SP { SNO } **TIMES** P { PNO }

SP **{ SNO, PNO }**

( SP { SNO } TIMES P { PNO } ) **MINUS** SP { SNO, PNO }

SNO | PNO |
---|---|

s1 | p1 |

s1 | p2 |

s1 | p3 |

s1 | p4 |

s1 | p5 |

s1 | p6 |

s2 | p1 |

s2 | p2 |

s2 | p3 |

s2 | p4 |

s2 | p5 |

s2 | p6 |

s3 | p1 |

s3 | p2 |

s3 | p3 |

s3 | p4 |

s3 | p5 |

s3 | p6 |

s4 | p1 |

s4 | p2 |

s4 | p3 |

s4 | p4 |

s4 | p5 |

s4 | p6 |

SNO | PNO |
---|---|

s1 | p1 |

s1 | p2 |

s1 | p3 |

s1 | p4 |

s1 | p5 |

s1 | p6 |

s2 | p1 |

s2 | p2 |

s3 | p3 |

s4 | p2 |

s4 | p4 |

s4 | p5 |

SNO | PNO |
---|---|

s1 | p1 |

s1 | p2 |

s1 | p3 |

s1 | p4 |

s1 | p5 |

s1 | p6 |

s2 | p1 |

s2 | p2 |

s2 | p3 |

s2 | p4 |

s2 | p5 |

s2 | p6 |

s3 | p1 |

s3 | p2 |

s3 | p3 |

s3 | p4 |

s3 | p5 |

s3 | p6 |

s4 | p1 |

s4 | p2 |

s4 | p3 |

s4 | p4 |

s4 | p5 |

s4 | p6 |

<==>

SNO | PNO |
---|---|

s2 | p4 |

s2 | p5 |

s2 | p6 |

s3 | p1 |

s3 | p3 |

s3 | p4 |

s3 | p5 |

s3 | p6 |

s4 | p1 |

s4 | p3 |

s4 | p6 |

First, the cartesian product pairs the divisor P { PNO } — the set of unique PNO values from P — with each unique value of SNO in the dividend SP. Next MINUS finds potential attributes values that are in the final result. We can see this by noting that the set of tuples whose SNO value is 's1' have (collectively) a set of PNO values that matches the set of PNO values in the divisor. MINUS removes these 's1'-tuples completely. Next we project this intermediate result on SNO, which gives us.

( ( SP { SNO } TIMES P { PNO } ) MINUS SP { SNO, PNO } ) **{ SNO }**

SNO |
---|

s2 |

s3 |

s4 |

These are the set of SNO values whose PNO values *do not* match those of the divisor. Finally, we substract these values from SP { SNO }, which gives us back the 's1' value in SP.

SP { SNO } **MINUS** ( ( SP { SNO } TIMES P { PNO } ) MINUS SP { SNO, PNO } ) { SNO }

SNO |
---|

s1 |

So we see that SP { SNO, PNO } DIVIDEBY P { PNO } is:

SNO |
---|

s1 |

The result is loosely "supplier numbers for suppliers who supply all parts." This can be expressed SQL as

SELECT DISTINCT SPX.SNO FROM SP AS SPX WHERE NOT EXISTS ( SELECT P.PNO FROM P WHERE NOT EXITS ( SELECT SPY.SNO FROM SP AS SPY WHERE SPY.SNO = SPX.SNO AND SPY.PNO = P.PNO ) )

We claim that this query finds suppliers who supply (ship) all parts. Notice we want suppliers who have shipped parts, i.e., those SP.SNO
values *v*, such that there does not exist even one part, one P.PNO value, that is not found in those tuples of SP whose SP.SNO value
is *v*.

On the other hand, this query is loosely "supplier numbers for suppliers who do not supply all parts."

SELECT DISTINCT SPX.SNO FROM SP AS SPX WHERE EXISTS ( SELECT P.PNO FROM P WHERE NOT EXITS ( SELECT SPY.SNO FROM SP AS SPY WHERE SPY.SNO = SPX.SNO AND SPY.PNO = P.PNO ) )

Notice here we want suppliers who have shipped parts, i.e., those SP.SNO values *v*, such that there does exist at least one part,
one P.PNO value, that is not found in those tuples of SP whose SP.SNO value is *v*.

Note, any query that uses DIVIDEBY can be reformulated using relational comparison, which has the advantage of making the query clearer. A query using relational comparison to find "supplier numbers for suppliers who supply all parts" would be

WITH (SP RENAME ( SNO AS X ) ) AS R: S WHERE ( R WHERE X = SNO ) { PNO } = P { PNO };

WITH AS is explained below. How relational comparison, using WITH AS,is more straightforward (than the SQL analogue for) DIVIDEBY is explained.

Loosely, *extend* supports computation across tuples and *summarize* supports computation down tuples.

*Definition*: The *extension* of relation *r*, written

EXTENDrADD (expASX)

is a relation with a heading equal to *r* extended with the attribute *X* and a body consisting of all tuples *t* such
that *t* is a typle of *r* extended with a value for attribute *X* that is computed by evaluation *exp* on that tuple of
*r*. *r* must not have an attribute name *X* and *exp* must not refer to *X*.

For example, there are 454 grams to a pound. So we can extend a relation with a weight ( in lbs. ) attribute.

Tutorial D | SQL |
---|---|

EXTEND P ADD { WEIGHT * 454 AS GMWT } | SELECT P.*, ( P.WEIGHT * 454 ) AS GMWT FROM P |

ADD is not addition. It means extend the heading of the relvar with an additional attribute, which follows AS. The value of the extended attribute is computed by evaluating the expression before AS. Some other examples using SQL.

SELECT P.*, ( p.WEIGHT * 454 ) AS GMWT

FROM P

WHERE GMWT > 7000.0

Such a query can also be written closer in style to Tutorial D.

SELECT TEMP.PNO, TEMPGMWT

FROM ( SELECT P.PNO, ( P.WEIGHT * 454 ) AS GMWT FROM P ) AS TEMP WHERE TEMP.GMWT > 7000.0

*Definition*: Let *r* and *s* be realtions such that *s* is of the same type as some projection of *r*, and
let the attributes of *s* be *A1, A2, A3, ...*. Then the *summarization*

SUMMARIZErPER {s} ADD {summaryASX}

is a relation whose heading is equal to the heading of *s* extended with the attribute *X*, and body consisting of all tuples
*t* such that *t* is a tuple of *s* extended with a value for attribute *X*, where *X* is computed by evaluating
*summary* over all tuples of *r* that have the same value for attributes *A1, A2, ..., An* as tuples *t* does. *X* cannot
be an attribute of *s* and *summary* must not refer to *X*. The cardinality of the resulting relation is equal to that of *s*,
and the degree is equal to that of *s* plus one.

As an example take

SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT() AS P_COUNT)

whose result would be

SNO | P_COUNT |
---|---|

S1 | 6 |

S2 | 2 |

S3 | 1 |

S4 | 3 |

S5 | 0 |

While S { SNO } is not a projection of SP, it is of the same type as the projection SP { SNO }. The result will contain the same tuples as S { SNO }, the set of all suppliers, plus the attribute P_COUNT, where P_COUNT is the total number of suppliers in SP for that tuple's SNO value.

SUMMARIZE thus goes down 'columns' doing *summary* per tuple value in *s*. Notice that this analogous SQL statement

SELECT SP.PNO, COUNT(*) AS P_COUNT FROM SP GROUP BY SP.SNO

will only return tuples for suppliers S1, S2, S3 and S4, so it is not equivalent to the SUMMARIZE expression. This SQL expression is equivalent

SELECT SP.SNO, TEMP.PRODUCT_COUNT FROM S, LATERAL ( SELECT COUNT(*) AS PRODUCT_COUNT FROM SP WHERE SP.SNO = S.SNO ) AS TEMP

does not contain a row ( tuple ) for S5, since S5 is not in SP.

If *s* is a projection of *r* (not just "of the same type as" some projection
of *r*) then the expression can be simplied slightly; instead of

SUMMARIZE SPPER ( SP { SNO } )ADD ( MAX { QTY } AS MAXQ, MIN ( QTY ) AS MINQ )

you can write

SUMMARIZE SPBY ( SNO )ADD ( MAX { QTY } AS MAXQ, MIN ( QTY ) AS MINQ )

Various types of summaries are supported in Tutorial D: COUNT, SUM, AVG, MAX, MIN, COUNTD, SUMD, AVGD ( where "D" stands for "eliminate redundant duplicate values before summarizing" ).

SUMMARIZE SP PER ( SP { SNO } ) ADD ( MAX ( QTY ) AS MAXQ, MIN ( QTY ) AS MINQ )

In this example SUMMARIZE has no PER specification.

SUMMARIZE ( S WHERE CITY = 'London' ) ADD ( COUNT ( ) AS N )

Since this summarize has no PER specification, the summarizing is done per TABLE_DEE, i.e., it is shorthand for

SUMMARIZE ( S WHERE CITY = 'London' ) PER ( TABLE_DEE ) ADD ( COUNT ( ) AS N )

Recall TABLE_DEE is a relation with no attributes and one tupe (the 0-tuple). TABLE_DEE fits the definitino of SUMMARIZE because it is a projection of the relation in question, S, on the empty set of attributes. The output of this SUMMARIZE therefore has on attribute and one type.

Finally, there is a difference between

VAR N INTEGER; N := COUNT ( S WHERE CITY = 'London' );

SUMMARIZE returns a relation, but the aggregate operator above returns a scalar. It's true that it might be thought of as "returning" one scalar value for each tuple in the PER relation, but that scalr value is then appended to that tuple to produce a tuple in the overall SUMMARIZE result.

SQL does have something analogous to the BY form of SUMMARIZE, but not the more general PER form. The SQL analog of

SUMMARIZE SP BY { SNO } ADD ( SUM ( QTY ) AS TQ )

is

SELECT SP.SNO, SUM ( SP.QTU ) AS TQ

Finally, here is an example of EXTEND expression that is logically equivalent to SUMMARIZE. The summarization expression of

SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT ( ) AS NP )

is equivalent to this EXTEND expression, which uses the WITH AS operator introduced next.

WITH ( SP RENAME ( SNO AS X ) AS R : EXTEND ( S { SNO } ) ADD ( COUNT ( R WHERE X = SNO ) AS NP )

On p. 104 the Tutorial D **WITH AS** operator's use is illustrated to "get suppliers who supply all parts."

WITH (SP RENAME ( SNO AS X ) ) AS R: S WHERE ( R WHERE X = SNO ) { PNO } = P { PNO };

This is a restriction, S WHERE *exp1* = *exp2*, where *exp1* is, ( R WHERE X = SNO ) { PNO } and *exp2* is, P { PNO }. We examine each
supplier, say, Sx in relvar S, comparing Sx's associated SNO value to the SNO values in SP. That is the meaning of

( R WHERE X = SNO )

For example, for the tuple in S with SNO value of 'S1', after doing

( R WHERE X = SNO )

the resulting set of PNO values from SP corresponding to the 'S1' value from S would be: { P1, P2, P3, P4, P5, P6 }. Next we project this result on PNO, which yields the same set of values, the set of unique part numbers supplied by 'S1'. Next we compare this result to P { PNO }, the set of all unique part numbers.

( R WHERE X = SNO ) { PNO } = P { PNO }

Continuing the example using 'S1', the expression ( R WHERE X = SNO ) { PNO } = P { PNO } would evaluate to TRUE. So the tuple of S containing the SNO value of 'S1' would be in the final result. Thus, the resulting relation will be those tuples of S that represent suppliers who supply (have shipped parts to customers) all parts. In the sample database there is only one such suppliers. Thus, the result is

RELATION { TUPLE { SNO SNO('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London' } }

So the forgoing query found "suppliers who supply all parts". And it is simplier than the earlier DIVIDEBY query, which was actually a
relational formulation of the query, "Get supplier numbers for supplier who *supply at least one part and in fact* supply all parts."

Here is another example a query using WITH AS that answers: "Get paris of supplier numbers, *Sx* and *Sy* say, such that
*Sx* and *Sy* supply exactly the same set of parts."

WITH ( S RENAME ( SNO AS SX ) { SX } AS RX, ( S RENAME ( SNO AS SY ) { SY } AS RY : ( RX JOIN RY ) WHERE ( SP WHERE SNO = SX ) { PNO } = ( SP WHERE SNO = SY ) { PNO }

Again, this is a restriction of the result of a JOIN: the outermost operation is ( RX JOIN RY ) WHERE
*exp1* = *expr2*.

Appending "SX < SY" to the WHERE clause here would produce a slightly tidier result: it would eliminate pairs of the form (Sx, Sx) and ensure that the pairs (Sx, Sy) and (Sy, Sx) don't appear (since JOIN is effectively the cartesian product).

Distributive, associate and commutative laws apply to various operations as mentioned on pp. 100-101. This allows an optimizer to rewrite queries in such a way the performance is optimal.

Strictly speaking relational algreba has no updateing (assignment) or comparison operators and no notion relvars. Operations like UPDATE, INSERT and DELETE are shorthand for algebraic assigment operations. For example, given

VAR PQ BASE RELATION { PNO PNO, QTY QTY } KEY { PNO };

Inserting into PQ

INSERT PQ ( SUMMARIZE SP PER ( P { PNO } ) ADD ( SUM ( QTY ) AS QTY ) );

is equivalent to the longhand assigment of:

PQ := PQ UNION ( SUMMARIZE SP PER ( P { PNO } ) ADD ( SUM ( QTY ) AS QTY ) );

A DELETE of

DELETE S WHERE CITY = 'Athens';

is equivalent to

S := S WHERE NOT CITY = 'Athens';

and an UPDATE of

UPDATE P WHERE CITY = 'London' ( WEIGHT := 2 * WEIGHT, CITY := 'Oslo' );

has the longhand equivalent:

P := WITH ( P WHERE CITY = 'London' ) AS R1, ( EXTEND R1 ADD ( 2 * WEIGHT AS NEW_WEIGHT, 'Oslo' AS NEW_CITY ) ) AS R2, R2 { ALL BUT WEIGHT, CITY } AS R3, R3 RENAME { NEW_WEIGHT AS WEIGHT, NEW_CITY AS CITY } AS R4, P MINUS R1 AS R5 : R5 UNION R4;

First, R1 is the set of tuples to be updated, extended with the new weight and new city as R2. Then we throw away from R2 everything but the weight and the city. This is R3. Then we rename the new weight and city with the proper attribute names of WEIGHT and CITY. This is R4. Then we identify those tuples not to be updated (but to be retained). This is R5. Finally, the result is the union of R5 and R4.

A SQL expression can be thought of as being implemented (at least conceptually) in three steps.

- The FROM clause ...
- Next, the WHERE clause ...

To be done later...(See Simple SQL).

These are examples of Tutorial D and SQL analogues of various queries

Get supplier numbers for suppliers who supply part P1.

(SP WHERE SNO = PNO('P1')) { SNO } | SELECT DISTINCT SP.SNO FROM SP WHERE SP.PNO = PNO('P1'); |

Get suppliers with status in the range 15 to 25 inclusive.

S WHERE STATUS ≥ 15 AND STATUS ≤ 25 | SELECT DISTINCT S.STATUS FROM S WHERE S.STATUS ≥ 15 AND STATUS ≤ 25 |

Get part numbers for parts supplied by a supplier in London.

(SP MATCHING ( S WHERE CITY = 'London' )) { PNO } | SELECT DISTINCT SP.PNO FROM SP, S WHERE S.CITY = CITY('London') AND SP.PNO = S.PNO; |

Comments: SP MATCHING ( S WHERE CITY = 'London' ) is SP JOIN S, with the result project back onto the attributes of SP. Since we only want PNO, JOIN would have worked just as well.

Get part numbers for parts not supplied by any supplier in London.

(SP NOT MATCHING ( S WHERE CITY = 'London' )) { PNO } | SELECT P.PNO FROM P EXCEPT SELECT SP.PNO FROM SP, S WHERE SP.SNO = S.SNO AND S.CITY = 'London'; |

Comments: NOT MATCHING is equivalent to SP SEMIMINUS (SP MATCHING (S WHERE CITY = 'London')), which is SP MINUS (SP MATCHING S). This query could also be expresse, if Tutorial D supports "not equal", as SP MATCHING (S WHERE CITY NOT = CITY('London').

Get city names for suppliers in which at least two two suppliers are located.

(SUMMARIZE S BY CITY ADD (COUNT() AS CNUM)) WHERE CNUM > 1 | SELECT TEMP.CITY FROM (SELECT S.CITY, COUNT(*) AS CNUM FROM S GROUP BY S.CITY) AS TEMP WHERE TEMP.CNUM > 1 |

Comment: For the Tutorial D, we do a restrict on the result of summarize. For the SQL analogue, we also do COUNT(*) per city. GROUP BY specifies how the count should be done; withtout it, we would simply get a total count of cities in each row of the result.

Get all pairs of part numbers such that some supplier supplies both of the indicated parts.

WITH SP { SNO, PNO } AS Z: ( (Z RENAME (PNO AS X)) JOIN (Z RENAME (PNO AS Y)) ) { X, Y } | SELECT XX.PNO AS X, YY.PNO AS Y FROM SP AS XX, SP AS YY WHERE XX.SNO = YY.SNO; |

Comments: From looking at relation SP it is clear that supplier S1 supplies every part listed in P; consequently, for any pairing of part numbers (Px, Py) there will always exist a supplier, namely S1, who supplies both part numbers. Now how does the query ensure this? The SQL query select pairs of part numbers, PNO, such that their suppliers are the same, XX.SNO = YY.SNO, i.e., these parts are both supplied by the same supplier. Note, DISTINCT is not needed because distinct values X and Y are being returned.

In the Tutorial D query, we, in essence, join two copies of SP, SP1 JOIN SP2, where the PNO attribute has been renamed in each copy, so that the join occurs on SNO, the only remaining common attribute. The other two "PNO" columns are both included, as X and Y, giving pairs of part numbers (Px, Py) such that they share a common supplier value, a common SNO value. So for, say the SNO value of 'S2', we would have the pairs (P1, P1), (P1, P2), (P2, P1) and (P2, P2). For 'S4' we would have the pairs: (P2, P2), (P2, P4), (P4, P2) and (P4, P4). For SNO value of 'S1', we would have all possible combinations of P1 through P6. Finally, we take the join result and project it onto { X, Y }. This eliminates duplicate tuples, duplicate pairs.

Get the total number of parts supplied by supplier S1.

WITH (SP WHERE SNO = SNO('S1')) AS R: SUMMARIZE SP PER R { SNO } ADD (COUNT() AS TOTAL) | SELECT SP.SNO, TEMP.TOTAL FROM SP, LATERAL (SELECT COUNT(*) AS TOTAL FROM SP WHERE SP.SNO = SNO('S1')) AS TEMP |

or with the count by itselfi

EXTEND TABLE_DEE ADD (COUNT(SP WHERE SNO = 'S1') AS TOTAL) | SELECT COUNT(*) AS TOTAL FROM SP WHERE SNO = SNO('S1'); |

Comments: Tutorial D also allows for the later expression to be written

COUNT (SP WHERE SNO = SNO('S1'))

Get supplier numbers for suppliers with a STATUS lower than that of supplier S1.

(S WHERE STATUS < STATUS FROM ( TUPLE FROM (S WHERE SNO = SNO('S1') )) { SNO } | SELECT S.SNO FROM S WHERE STATUS <
(SELECT S.STATUS FROM S WHERE S.SNO = SNO('S1') ) )) |

Comments: The Tutorial D "TUPLE FROM" operator extracts a tuple from the restriction. Then the FROM operator extracts the STATUS attribute from the tuple. So we restrict S to the tuples where STATUS is always less than the STATUS of any of the tuples whose SNO value is S1.Integrity Constraints

An integrity constraint, or just constraint, is just a formal name for *business rules*. Contraints guarantee consistentency. They cannot quarantee
correctness, which involves insuring that every relvar's predicate, it's fundamental meaning, has never been violated.

Type constraints are simply the set of legal values for the type. They are checked when the type's selector is invoked. In Tutorial D a type contraint looks like this

TYPE POINT POSSREP CARTESIAN { X NUMERIC, Y NUMERIC CONSTRAINT SQRT (X ** 2, Y ** 2) ≤ 100.0 };

POSSREP here stands for "possible representation". If omitted, the POSSREP defaults to the same name as the type. The constraint is that points must lie within circle whose radius is 100. SQL does not support type constraints.

The THE_operators (there is one for each component) return the components of the type. Given this POINT

POINT P(NUMERIC(5), NUMERIC(5));

THE_X(P) returns the X component's value, and THE_Y(P) returns the Y component's value.

Database constraints are constraints on the values that can appear in a given database. They should be checked at the end of any statement that assigns a value to some relvar in the database: in SQL at any INSERT or UPDATE. Attributes have a built-in type constaint because the attribute is always of a certain type. Together with type constraints they form the business rules of an application. A relvar constraint that involves just one relvar is a single-relvar constraint. In SQL database constaints are expressed by means of CREATE ASSERTION statements. This single-relvar constaints checks that "status values must be in the range 1 to 100 inclusive":

Tutorial D | SQL |
---|---|

CONSTRAINT C1 IS_EMPTY (S WHERE STATUS < 1 OR STATUS > 100) | CREATE ASSERTION C1 CHECK (NOT EXISTS (SELECT S.* FROM S WHERE S.STATUS < 1 OR S.STATUS > 100)); |

This constraint involves just a single attribute of a single relvar. The constraint "supplies in London must have status 20" would be written

Tutorial D | SQL |
---|---|

CONSTRAINT C1 IS_EMPTY (S WHERE CITY = 'London' AND STATUS ≠ 20); | CREATE ASSERTION C1 CHECK (NOT EXISTS (SELECT S.* FROM S WHERE S.CITY = 'London' AND S.STATUS <> 20)); |

Although it involves two attributes, this is still a single-relvar constraint. Here is constraint that involves two revlars, a multi-relvar constraint: "no supplier with status less than 20 can supply part P6."

Tutorial D | SQL |
---|---|

CONSTRAINT C1 IS_EMPTY ((S JOIN SP) WHERE STATUS < 20 AND PNO = PNO('P6')); | CREATE ASSERTION C1 CHECK (NOT EXISTS (SELECT S.* FROM S, SP WHERE S.SNO = SP.SNO AND S.STATUS < 20 AND SP.PNO = PNO('P6'))); |

All the constraints (AND'ed together) involving a relvar R is the "relvar constraint" for R. The databse constraint for a given database DB is the AND of all of the relvar constraints for the relvars in DB.

All red parts must weight less than 50 lbs.

Tutorial D | SQL |
---|---|

CONSTRAINT C1 IS_EMPTY (P WHERE COLOR = COLOR('Red') AND WEIGHT ≥ WEIGHT(50.0)); | CREATE ASSERTION C1 CHECK (NOT EXISTS
(SELECT P.* FROM P WHERE P.COLOR = COLOR('Red') AND P.WEIGHT ≤ 50)); |

Every London supplier must supply part P2.

Tutorial D | SQL |
---|---|

CONSTRAINT C1 IS_EMPTY (WITH (SP RENAME (SNO AS X)) AS R: S WHERE CITY = 'London' AND TUPLE {PNO PNO('P2')} ∉ (R WHERE X = SNO) {PNO} ); | CREATE ASSERTION C1 CHECK (NOT EXISTS (SELECT * FROM S WHERE S.CITY = 'London' AND NOT EXISTS (SELECT * FROM SP WHERE SP.SNO = S.SNO AND SP.PNO = PNO('P2')) ); |

For further examples, see the answers to 6-16.

The intent of DB design theory is to reduce redundancy so that the database never is in an inconsistent state. While database design is ultimately a subjective endeavor, there are certain formal principles to keep in mind.

A functional dependency is a special type of single-relvar integrity constraint important in database normalization. If an attribute's value depends on the value of another attribute, it is functionaly dependent on the other attribute. The formal definition is:

*Definition:* Let *A* and *B* be subsets of the heading of relvar *R*, then *R* satisfies the functional dependency
(*FD*) *A* → *B* (read "*B* is functionally dependent on *A*" or "*A* functionaly determines *B*") if (and only
if), in every relation that is a legal value for *R*, whenever two tulpes have the same value for *A*, they also have the same value for *B*.

*A* and(or) *B* can be sets of attributes. As an example of a functional dependency integrity constraint suppose we require that if two
suppliers are in the same city, then they must have the same status, or { CITY } →
{ STATUS }.

SNO | SNAME | STATUS | CITY |
---|---|---|---|

S1 | Smith | 20 | London |

S2 | Jones | 30 | Paris |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

S5 | Admans | 30 | Athens |

In Tutorial D, this FD constraint can be expressed as:

CONSTRAINT C1 COUNT(S { CITY }) = COUNT(S { CITY, STATUS });

That is, whenever two cities are identical, the pair <city, status> will always identical.

A functional dependency (FD) clearly continues to remains valid if you add attributes to *A* (to the left side of the functional dependency) or
substract them from *B* (the right side of the functional dependency). Obviously, every relvar candidate key represents a functional dependency
constraint (from the key to the set of all the attributes of *R*, as well as from the key to every subset of attributes). As just noted, this
functional dependency remains valid if we add elements to the key. Adding attributes to a key creates what is called a *superkey*. A *superkey*
has the uniqueness property (as only one distinct tuple contains the superkey) that all keys must have, but it does not have the irreducibility
property that a key must also have.

*Definition:* Relvar *R* is in BCNF if and only if, for every non-trivial functional dependency *A* → *B* satisfied by *R*,
*A* is a superkey for *R*.

Trivial functional dependencies are the obvious sorts of FDs, in which the attributes on the left are a superset of the attributes on the right, like these:

{ CITY } → { CITY }

{ SNO } → { SNO }

{ CITY, STATUS } → { CITY}

The functional dependencies of a relvar in Boyce/Codd Normal Form (besides the trivial functional dependencies) are always *from* superkeys:
a key or a set of attributes containing a key. To decompose a relvar *R* into BCNF, we decompose it into smaller relvars with fewer
attributes. To ensure that no information is lost (which ultimatly means the predicate remains valid), we must restrict the decomposition
to projections of *R* that when joined return the original relvar. So we use project as the decomposition operator for achieving Boyce-Codd NormalForm,
and join as the recomposition operator.

We can decomposed RS, with its FD { CITY } → { STATUS }, into the following two projections:

SNO | SNAME | STATUS | CITY |
---|---|---|---|

S1 | Smith | 20 | London |

S2 | Jones | 30 | Paris |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

S5 | Admans | 30 | Athens |

=>

SNO | SNAME | CITY |
---|---|---|

S1 | Smith | London |

S2 | Jones | Paris |

S3 | Blake | Paris |

S4 | Clark | London |

S3 | Blake | Paris |

S4 | Clark | London |

S5 | Admans | Athens |

CITY | STATUS |
---|---|

London | 20 |

Paris | 30 |

Athens | 30 |

SNC still has SNO as a key. Its only functional dependencies are from its key. The FD { CITY } → { STATUS } makes CITY the candidate key for CS. So both SNC and CS are in BCNF. We have eliminated the redundant STATUS values from RS, and we have not lost any information: RS = SNC JOIN CS.

As another example, suppose we started with a suppliers relvar that included quanity:

SNO | STATUS | PNO | QTY |
---|---|---|---|

S1 | 20 | P1 | 300 |

S1 | 20 | P2 | 200 |

S1 | 20 | P3 | 400 |

S1 | 20 | P4 | 200 |

S1 | 20 | P5 | 100 |

S1 | 20 | P6 | 100 |

S2 | 10 | P1 | 300 |

S2 | 10 | P2 | 400 |

S3 | 30 | P2 | 200 |

S4 | 20 | P2 | 200 |

S4 | 20 | P4 | 300 |

S4 | 20 | P5 | 400 |

Its key is still { SNO, PNO }. But this new suppliers relvar satisfies a FD of { SNO } → { STATUS }, and therefore it has duplicate STATUS values. To eliminate these redundant status values we decompose it into these two projections:

SNO | STATUS | PNO | QTY |
---|---|---|---|

S1 | 20 | P1 | 300 |

S1 | 20 | P2 | 200 |

S1 | 20 | P3 | 400 |

S1 | 20 | P4 | 200 |

S1 | 20 | P5 | 100 |

S1 | 20 | P6 | 100 |

S2 | 10 | P1 | 300 |

S2 | 10 | P2 | 400 |

S3 | 30 | P2 | 200 |

S4 | 20 | P2 | 200 |

S4 | 20 | P4 | 300 |

S4 | 20 | P5 | 400 |

=>

SNO | PNO | QTY |
---|---|---|

S1 | P1 | 300 |

S1 | P2 | 200 |

S1 | P3 | 400 |

S1 | P4 | 200 |

S1 | P5 | 100 |

S1 | P6 | 100 |

S2 | P1 | 100 |

S2 | P2 | 400 |

S3 | P2 | 200 |

S4 | P2 | 200 |

S4 | P4 | 300 |

S4 | P5 | 400 |

SNO | STATUS |
---|---|

S1 | 20 |

S2 | 10 |

S3 | 30 |

S4 | 20 |

STP was not in BCNF because the FD { SNO } → { STATUS } was not from a superkey. In relvar SS the functional dependency is now from a superkey (in fact, a key), so SS is in BCNF, as is SP because { SNO, PNO } is the key. The decomposition is lossless: STP = JOIN (SP, SS).

Decomposition is, as shown above, the process of taking projections of the original relvar. We must ensure, though, that information is not lost: the join of the resulting relvars should be the original relvar. For example, consider this decomposition of RS which loses information:

SNO | SNAME | STATUS | CITY |
---|---|---|---|

S1 | Smith | 20 | London |

S2 | Jones | 30 | Paris |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

S5 | Admans | 30 | Athens |

=>

SNO | SNAME | STATUS |
---|---|---|

S1 | Smith | 20 |

S2 | Jones | 30 |

S3 | Blake | 30 |

S4 | Clark | 20 |

S5 | Admans | 30 |

CITY | STATUS |
---|---|

London | 20 |

Paris | 30 |

Athens | 30 |

We have lost information: RS ≠ JOIN(SNS, CS). We no longer know the city of the supplier. Is S2 located in Paris or
Athens? Just like the first example we decomposed the orginal relvar using project. What condition(s) *must* be met in order for our projections to
always be equal to their join, ensuring our decompositions lossless.

The Heath Theorem provides a sufficient condition for lossless decomposition. It tells us, if we meet the theorem's conditions, the decomposition will be lossless. The theorem states:

*Heath Theorem*: If we take subsets of the heading of *R*, say, *A*, *B*, and *C* such that their union is equal to the heading, and if
*R* satisfies the functional dependency constraint *A* → *B*, then *R* is equal to the join of its projections on
*AB* and *AC*, where *AB* is the union of *A* and *B* and *AC* is the union of *A* and *C*.

If you think about it, this theorem is just a formal statement of something obvious: if you decompose a relvar into two projections such that both
of the projections contain the "left side" of the functional dependency (the *A* in *A* → *B*), and one of the projections also contains
the "right side" (the *B* in *A* → *B*) and the other also contains the remaining attributes, those on neither side of the FD, then
join always gives you back the original relvar.

The Heath theorem gives us a sufficient condition to ensure the decomposition is lossles. If the conditions of the Heath Theorem are met, we know that our decomposition is lossless. It does not give a necessary condition: it does not tell us however if other decompositions will be lossey or lossless. What the necessary condition is for lossless decomposition is discussed in the answers section of chapter 7.

The RS and STP decomposition examples above are both instances where Heath's theorem tells us the decomposition will be lossless; we will not lose
information. In the case of relvar STP, for example, STP satisfies the FD { SNO } → { STATUS }. { SNO } corresponds to *A* in the
theorem, { STATUS } corresponds to B, and { PNO, QTY } to C. The union of A, B and C is clearly the heading of STP. { SNO, STATUS } corresponds to AB,
the union of A and B. { SNO, PNO, QTY } corresponds to AC, the union of A and B. Finally, we applying the theorem, we see that

JOIN(STP { SNO, STATUS }, STP { SNO, PNO, QTY }) = STP;

Likewise, in the case of relvar RS, we have the FD { CITY } → { STATUS }. { CITY } is A, { STATUS } is B, { STATUS, CITY } is AB, and { SNO, SNAME, CITY } is AC. Again, we see that

JOIN(RS { SNO, SNAME, CITY }, RS { STATUS, CITY }) = RS;

Here is another decomposition example. We have a relvar in first normal form: all values are atomic. But we have a lot of redundancy.

questionID | answertype | answer | answercode | id |
---|---|---|---|---|

experience1 | radio | Very experienced | very | 1 |

experience1 | radio | Somewhat experienced | somewhat | 1 |

experience1 | radio | Little experience | little | 1 |

experience1 | radio | No experience | none | 1 |

experience2 | radio | Very experienced | very | 2 |

experience2 | radio | Somewhat experienced | somewhat | 2 |

experience2 | radio | Little experience | little | 2 |

experience2 | radio | No experience | none | 2 |

experience3 | radio | Very experienced | very | 3 |

experience3 | radio | Somewhat experienced | somewhat | 3 |

experience3 | radio | Little experience | little | 3 |

experience3 | radio | No experience | none | 3 |

easeofuse1 | radio | Very simple to use | vsimple | 6 |

easeofuse1 | radio | Somewhat simple to use | simple | 6 |

easeofuse1 | radio | Somewhat difficult to use | diff | 6 |

usefulness1 | radio | Very useful | very | 7 |

usefulness1 | radio | Somewhat useful | somewhat | 7 |

usefulness1 | radio | Not that useful | little | 7 |

usefulness1 | radio | Not useful at all | none | 7 |

relevance1 | radio | Very relevant | very | 8 |

relevance1 | radio | Somewhat relevant | somewhat | 8 |

relevance1 | radio | A little relevant | little | 8 |

relevance1 | radio | Not relevant at all | none | 8 |

understand1 | radio | Yes | yes | 9 |

understand1 | radio | No | no | 9 |

recommend1 | radio | Yes | yes | 10 |

recommend1 | radio | No | no | 10 |

compare1 | radio | More useful | more | 11 |

compare1 | radio | Equally useful | equal | 11 |

compare1 | radio | Less useful | less | 11 |

review1 | textarea | NULL | 12 | |

useremotelabs | radio | Yes | yes | 4 |

useremotelabs | radio | No | no | 4 |

whynotremotelabs | textarea | NULL | 5 |

We first notice that this relvar has the following functional dependencies:

{ id } → { questionID }.

{ questionID } → { answertype }.

{ id } → { answertype }.

The last dependency is due to the transitivity of these first two dependencies. We can normalize survey_answers by splitting it into two relations: survey_questions and survey_answers.

questionID | answertype | id |
---|---|---|

experience1 | radio | 1 |

experience2 | radio | 2 |

experience3 | radio | 3 |

easeofuse1 | radio | 6 |

usefulness1 | radio | 7 |

relevance1 | radio | 8 |

understand1 | radio | 9 |

recommend1 | radio | 10 |

compare1 | radio | 11 |

review1 | textarea | 12 |

useremotelabs | radio | 4 |

whynotremotelabs | textarea | 5 |

answer | answercode | id |
---|---|---|

Very experienced | very | 1 |

Somewhat experienced | somewhat | 1 |

Little experience | little | 1 |

No experience | none | 1 |

Very experienced | very | 2 |

Somewhat experienced | somewhat | 2 |

Little experience | little | 2 |

No experience | none | 2 |

Very experienced | very | 3 |

Somewhat experienced | somewhat | 3 |

Little experience | little | 3 |

No experience | none | 3 |

Very simple to use | vsimple | 6 |

Somewhat simple to use | simple | 6 |

Somewhat difficult to use | diff | 6 |

Very useful | very | 7 |

Somewhat useful | somewhat | 7 |

Not that useful | little | 7 |

Not useful at all | none | 7 |

Very relevant | very | 8 |

Somewhat relevant | somewhat | 8 |

A little relevant | little | 8 |

Not relevant at all | none | 8 |

Yes | yes | 9 |

No | no | 9 |

Yes | yes | 10 |

No | no | 10 |

More useful | more | 11 |

Equally useful | equal | 11 |

Less useful | less | 11 |

NULL | NULL | 12 |

Yes | yes | 4 |

No | no | 4 |

NULL | NULL | 5 |

The functional dependencies for the new survery_questions revlar are still the same:

{ id } → { questionID }.

{ questionID } → { answertype }.

{ id } → { answertype }.

And since the candidate keys for the new survey_questions relvar are { questionID } and {id}, this means survery_questions is in BCNF: all its
functional dependencies are *from* attributes that contain candidate keys. Functional dependencies can be from sets of attributes (more than one), but here
the FDs are from sets containing only one attribute.

What about the relvar survey_answers? First, we notice there is an entity integrity problem with id values of 12 and 5, but we can just delete those tuples because id is not a candidate key here. It is a foreign key referring to the candidate key for survey_questions. As long as there is an existing id in survey_questions for every id in survey_answers referential integrety will not be violated. So there is no danger in deleting these two tuples.

The new survey_answers appears to have the function dependency { answer } → { answercode}. Boyce Codd Normal Form requires all functional
dependencies to be *from* supersets, but { answer } contains neither one of the two candidate keys for survey_answers, {answer, id} and
{answercode, id}, meaning our new survery_answers is still not in BCNF. We need to decompose this survey_answers relvar further. Consider this
decomposition of survey_answers into survey_answer and answer_choices:

answercode | id |
---|---|

very | 1 |

somewhat | 1 |

little | 1 |

none | 1 |

very | 2 |

somewhat | 2 |

little | 2 |

none | 2 |

very | 3 |

somewhat | 3 |

little | 3 |

none | 3 |

vsimple | 6 |

simple | 6 |

diff | 6 |

very | 7 |

somewhat | 7 |

little | 7 |

none | 7 |

very | 8 |

somewhat | 8 |

little | 8 |

none | 8 |

yes | 9 |

no | 9 |

yes | 10 |

no | 10 |

more | 11 |

equal | 11 |

less | 11 |

yes | 4 |

no | 4 |

answer | answercode |
---|---|

Very experienced | very |

Somewhat experienced | somewhat |

Little experience | little |

No experience | none |

Very simple to use | vsimple |

Somewhat simple to use | simple |

Somewhat difficult to use | diff |

Somewhat useful | somewhat |

Not that useful | little |

Not useful at all | none |

Very relevant | very |

Somewhat relevant | somewhat |

A little relevant | little |

Not relevant at all | none |

Yes | yes |

No | No |

More useful | more |

Equally useful | equal |

Less useful | less |

Both relvars appear to be in BCNF. The only candidate key for survery_answers is its entire heading { answercode, id }. But since survery_answers
only has trivial functional dependencies, it is in BCNF. The only candidate key for answer_choices is { answer }. Since answer_choices has the FD
{ answer } → { answercode}, which is a dependency *from* a key, it also is in BCNF. But we now have a different soft of problem. If we tried
to join these two relations together on the common foreign key answercode, we would have no idea whether id 1 corresponds to answer "Very experienced"
or answer "Very relevant". We have lost information, making this a lossy decomposition.

So we try yet again.

answer | id |
---|---|

Very experienced | 1 |

Somewhat experienced | 1 |

Little experience | 1 |

No experience | 1 |

Very experienced | 2 |

Somewhat experienced | 2 |

Little experience | 2 |

No experience | 2 |

Very experienced | 3 |

Somewhat experienced | 3 |

Little experience | 3 |

No experience | 3 |

Very simple to use | 6 |

Somewhat simple to use | 6 |

Somewhat difficult to use | 6 |

Very useful | 7 |

Somewhat useful | 7 |

Not that useful | 7 |

Not useful at all | 7 |

Very relevant | 8 |

Somewhat relevant | 8 |

A little relevant | 8 |

Not relevant at all | 8 |

Yes | 9 |

No | 9 |

Yes | 10 |

No | 10 |

More useful | 11 |

Equally useful | 11 |

Less useful | 11 |

Yes | 4 |

No | 4 |

answer | answercode |
---|---|

Very experienced | very |

Somewhat experienced | somewhat |

Little experience | little |

No experience | none |

Very simple to use | vsimple |

Somewhat simple to use | simple |

Somewhat difficult to use | diff |

Somewhat useful | somewhat |

Not that useful | little |

Not useful at all | none |

Very relevant | very |

Somewhat relevant | somewhat |

A little relevant | little |

Not relevant at all | none |

Yes | yes |

No | no |

More useful | more |

Equally useful | equal |

Less useful | less |

This time, not only are both relations in BCNF, but if you join them together on answer, you get back the exact relation we had before. This is a nonloss decomposition, and we note that this decomposition complies with the criteria of Heath's theorem. To confirm that both the new survey_answers and answer_choices are in BCNF, we see that

A join dependency or **JD** is a set of projections on a relation which, when joined together, returns the original relation. The
formal definition of a join dependency is:

*Definition*: Let *A*, *B*, *C*, ..., *Z* be subsets of the heading of a relvar *R*. Then *R* satisfies the
*join dependency* (JD)

∗ { A, B, ..., Z }if and only if every relation that is a valid value for

JOIN ( R { A }, R { B }, ..., R { Z } ) = R;

To say a relvar *R* can be losslessly decomposed into certain projections on *A*, *B*, *C*, ..., *Z* is to
say that it satisfies the JD ∗ { A, B, ..., Z }. Join dependencies are just another way of describing nonlossless
decompositions of a relvar.

It also follows from the the Heath theorem that every FD is a JD. Take the revlar RS above. It has the FD { CITY } → { STATUS }. We first decomposed it into the relvars SNC and CS, with attributes of { SNO, SNAME, CITY } and { CITY, STATUS }, respectively. RS has thus satisfied the join dependency ∗ { { SNO, SNAME, CITY }, { CITY, STATUS } }. Likewise, when survey_answers had three attributes—{id, answer, answercode}— it really had only one join dependency: ∗ {{id, answer}, {answer, answer_code}}.

The existence of FDs and multi–valued dependencies (MVDs) implies that certain decompositions have lossless join. That is, FDs and MVDs imply corresponding JDs. JDs, therefore, constitute a generalization of FDs and MVDs. As such, while every FD and MVD can be stated as an equivalent JD, there can be JDs which are not FDs or MVDs.

∗ { {id, answer}, {answer,answercode} }

This means that { id, answer } and { answer, answercode } were the only two subsets in the original survey_answers that could be nonlossly decomposed into two separate relations. So, in order to satisfy Boyce/Codd Normal Form, that's just what we did.

Superkeys always imply certain functional dependencies (from the superkeys). Since every FD is also a JD (according to the Heath Theorem), superkeys always imply certain join dependencies. Take our suppliers relvar SS for example. { SNO } is a superkey (actually a key). SS satisfies the JD of

∗ { {SNO, SNAME}, { SNO, STATUS }, { SNO, CITY } }

The definition of 5th Normal Form (5NF) is:

*Definition*: Relvar *R* is in 5NF if and only if every non-trivial join dependency satisfied by *R* is implied by the superkeys of
*R*

A trivial join dependency is a join dependency in which one of the subsets of the heading is the entire heading. Obviously every relvar has trivial join dependencies because a subset consisting of the entire header of a relvar can be joined with any other subset of the relvar. Also the same subset consisting of all the attributes of the header can be joined with TABLE_DEE.

Most relvars that are in BCNF are also in 5NF. The exceptions are small. The following theorem explains that the only time when a BCNF relvar could possibly not be in 5NF, could possibly have a nontrivial JD that does not consist entirely of superkeys, is when there are keys with more than one attribute.

*Theorem*: Let *R* be a relvar in BCNF (or even just a 3NF relvar) and let *R* have no composite keys (that is, keys consisting of two or
more attributes), then *R* is in 5NF.

So if you can get to BCNF (which is easy enough) and there aren't any composite keys in your BCNF relvar, you don't have to worry about the complexities of general JDs and 5NF: you know without have to think about the matter any further that the relvar is simply in 5NF.

If a relvar is in 5NF, the only nontrivial JDs are those implied by superkeys. The only lossless projection decompostions are ones in which every
projection is on the attributes of some superkeys; each such projection includes some key of *R*. As a consequence, the corresponding
"recomposition" joins are all **one-to-one (from one tuple to one tuple)**, and no redunancies are or can be eliminated by the decompostions
("joins are all one-to-one" means: "the recomposition join is a bijection. In
other words A JOIN B means A and B have some superkey in common (not necessarily just one attribute) and each tuple in A corresponds to exactly one
tuple in B and vice versa." See Dbforums reply:)

5NF does not necessarily mean that all possible redundancy has been eliminated, however. It says that further nonloss decomposition of a 5NF relvar *R*
into projections, while it might be possible, won't eliminate any redundancies. But this does not mean that *R* is redundancy free. Certain types
of redundancies cannot be reduced through projections. Consider the following 5NF relvar SPJ which suffers from redundancy. Its predicate is: Supplier
SNO supplies part PNO to project JNO in quanity QTY. The sole key is { SNO, PNO, JNO }.

SNO | PNO | JNO | QTY |
---|---|---|---|

S1 | P1 | J1 | 200 |

S1 | P3 | J4 | 700 |

S2 | P3 | J1 | 400 |

S2 | P3 | J2 | 200 |

S2 | P3 | J3 | 200 |

S2 | P3 | J4 | 500 |

S2 | P3 | J5 | 600 |

S2 | P3 | J6 | 400 |

S2 | P3 | J7 | 800 |

S2 | P5 | J1 | 100 |

We see the fact that supplier S2 supplies part P3 repeated several times. There are other another redundancies. The fact that part P3 is supplied to project J4 (JNO stands for project number) is repeated. So is the fact that J1 is supplied by supplier S2. The only nontrivial functional dependency satisfied by this relvar is this functional dependency:

{ SNO, PNO, JNO } → { QTY }which is an arrow out of a superkey. In other words, QTY depends on all three of SNO, PNO and JNO, and it can't appear in a relvar with anything less than all three. Hence, there is no nonloss decomposition that can remove the redundancies. The only JD the relvar satisfies is the trivial JD ∗ { {SNO, PNO, JNO, QTY} }.

It is always possible to decompose a non-5NF. Because 5NF is the final normal form with respect to projection as the decomposition operator,
it is sometimes called *projection/join* normal form to stress that the point so long as we limit ourselves to projection as the decomposion
operator and join as the recomposition operator.

Finally, keep in mind to say that a revlar is in BCNF does not mean that it is not in 5NF (though it usually is).

Join dependency means that an table, after it has been decomposed into two or more smaller tables, must be capable of being joined again on common keys to form the original table. Stated another way, 5NF indicates when an entity cannot be further decomposed using projections.

If we examine survery_questions again, which was in BCNF

questionID | answertype | id |
---|---|---|

experience1 | radio | 1 |

experience2 | radio | 2 |

experience3 | radio | 3 |

easeofuse1 | radio | 6 |

usefulness1 | radio | 7 |

relevance1 | radio | 8 |

understand1 | radio | 9 |

recommend1 | radio | 10 |

compare1 | radio | 11 |

review1 | textarea | 12 |

useremotelabs | radio | 4 |

whynotremotelabs | textarea | 5 |

we notice that it has these join dependencies:

∗{ {id, questionID}, {questionID,answertype} }

∗{ {questionID, id}, {id,answertype} }

survey_questions could easily be further nonloss decomposed into two separate relations—in two different ways! So why don't we need to decompose it? Well, not only is survey_questions already in BCNF, but it's also in 5th Normal Form. This means that each join dependency is satisfied by superkeys: {id, questionID} is a superkey, {questionID,answertype} is also a superkey, and so are the two attribute sets in the second JD.

What does a JD really mean intuitively? Take this simplify version of SPJ without the QTY attribute? The predicate for SPJ is: Supplier SNO supplies part PNO to project JNO. The sole key is { SNO, PNO, JNO }.

SNO | PNO | JNO |
---|---|---|

S1 | P1 | J2 |

S1 | P2 | J1 |

S2 | P1 | J1 |

S1 | P1 | J1 |

Let's attach a real-world meaning to SPJ. Suppose SPJ tells us that all three of the following are true propostions:

- Smith supplies monkey wrenches to some project (think of S1 as Smith and P1 as monkey wrenches).
- Somebody supplies monkey wrenches to the Manhattan project (think of P1 as monkey wrenches and J1 as the Manhattan project).
- Something is supplied to the Manhattan project by Smith (think of J1 as the Manhattan project and S1 as Smith)

Then is the following proposition true?

- Smith supplies monkey wrenches to the Manhattan project (S1 supplies P1 to project J1).

Normally, propositions 1, 2 and 3 would not imply proposition 4. We know that Smith supplies wrenches to *some* project (say,
project *z*) from proposition 1, that some supplier (say, supplier *x*) supplies monkey wrenches to the Manhattan project (from proposition 2),
and that Smith supplies some part (say, part *y*) to the Manhattan project—but we cannot validly infer, just on the basis of the three propositions,
that *x* is Smith or *y* is monkey wrenches or *z* is the Manhattan project.

Here is how a join dependency helps supply the answer. SPJ satisfies the join dependency ∗ { { SNO, PNO }, { PNO, JNO }, { SNO, JNO } }, which we will abbreviate as ∗ { SP, PJ, SJ }. This JD tell us that SPJ satisfies the following obvious constraint:

IF <s,p> ∈ SP AND <p,j> ∈ PJ AND <s,j> ∈ SJ THEN <s,p,j> ∈ SPJ

For example, the tuples <S1, P1>, <P1, J1> and <S1, J1> appear in SP, PJ, and SJ, respectively, therefore the tuple <S1, P1, J1>
will obviously appear in their join SPJ. If we look at it from a different slightly different angle, we see that the tuple <s, p> appears in SP if
(and only if) there exists some value *z* such that <s, p, z> appears in SPJ. Likewise, the tuple <p, j> appears in PJ if (and only if)
there exists some value *x* such that <x, p, j> appears in PJ, and the tuple <s, j> appears in SJ if (and only if) <s, y, j>
appears in SJ for some *y*. So we see that the earlier constraint is logically equivalent to this one:

IF for some x, y, z <s,p,z> ∈ SPJ AND <x,p,j> ∈ SPJ AND <s,y,j> ∈ SPJ THEN <s,p,j> ∈ SPJ

Given SPJ above, tuples <**S1**,**P1**,J2>, <S2,**P1**,**J1**>
and <**S1**,P2,**J1**> all appear in SPJ, and therefore so does <**S1**,**P1**,**J1
**>. The join dependecy ∗ { SP, PJ, SJ } tells us that, given propositions 1, 2 and 3, proposition 4 is also true. We can validly
infer proposition 4 from propositions 1, 2, and 3.

This constraint has a cyclic nature (IF *s* is connect to *p* and *p* is connected to *j* and *j* is connected back to *s* again,
THEN *s* and *p* and *j* must all be directly connected, in the sense that they must all appear together in the same tuple). Such cyclic
constraints are instances when we might have a relvar that is in BCNF but not in 5NF. Such cyclic constraints are, though, quite rare in practice.

Well, unless we get a data set into First Normal Form, we aren't dealing with relations at all—so it must go first. And if a relation is in 5th Normal Form, it's also in BCNF, so theoretically we could try to skip BCNF altogether. However, generally it's much easier to deal with functional dependencies than join dependencies (can you imagine figuring out all JDs for the original survey_answers?), so going for Boyce/Codd Normal Form is the natural progression.

And once relations are in BCNF, most of the time they will also be in 5th Normal Form—in fact, if a BCNF relation has a primary key which contains only one attribute, you can bet on it. But still, it's good to use 5th Normal Form as a final check.

Normalization theory formalizes, explicitly describes, what are certain commonsense principles of design.

The objective of reducing redundancy can conflict with the objective of dependency preservation. Consider the following relvar:

ADDR { STREET, CITY, STATE, ZIP }

Assume ADDR has these FDs:

{ STREET, CITY, STATE } → { CITY } { ZIP } → { CITY, STATE }

Since ZIP is not a key ADDR is not in BNCF. If we apply Heath's theorem and decompose it into BCNF projections:

ZCS { ZIP, CITY, STATE } KEY { ZIP } ZS { ZIP, STREET } KEY { ZIP, STREET }

The original FD of { STREET, CITY, STATE } → { CITY } is gone. It is satisfied by the join of ZCS and ZS, but not by either of those projections. As a result, ZCS and CS cannot be independently updated. For example, if ZCS and CS have these values:

ZIP | CITY | STATE |
---|---|---|

10003 | New York | NY |

10111 | New York | NY |

ZIP | STREET |
---|---|

10003 | Broadway |

If we attempt to insert the tuple <10111, Broadway> into ZS, we will violate the "missing" FD, which says there can only be on zip code
for any given street in the same city and state. However, this fact can't be determined without examining projection ZCS as well as projection
ZS. For precisely this kind of reason, the dependency preservation objective says: *don't split dependencies acrosss projections*.

A FD is a single relvar constraint. When a FD is eliminated during decomposition, it changes to become a multi-relvar constraint. The dependency preservation principle is revelant when you need the restraint to remain a single-relvar restraint.

Normalization is intended to reduce redundancy *within* relvars. Orthogonality refers to avoiding redundancy *across* relvars, avoiding
duplicate tuples in two or more distinct relvars. Duplicate tuples can occur when constraints overlap. For example, if relvar SA below consists of
suppliers who are in Paris, and relvar SB is suppliers who either aren't in Paris or who have STATUS of 30, this overlap leads to duplicate tuples.

SNO | SNAME | STATUS | CITY |
---|---|---|---|

S2 | Jones | 10 | Paris |

S3 | Blake | 30 | Paris |

SNO | SNAME | STATUS | CITY |
---|---|---|---|

S1 | Jones | 20 | London |

S3 | Blake | 30 | Paris |

S4 | Clark | 20 | London |

S5 | Adams | 30 | Athens |

Tuple S3 must appear in both relvars. If it didn't it would violate their constraints. Obviously, if we never have relvars of the same type, we will never have duplicate tuples. But we must also take into account subsets of the attributes of a tuple. For example,

SNO | SNAME | STATUS |
---|---|---|

S1 | Smith | 20 |

S2 | Jones | 10 |

S3 | Blake | 30 |

S4 | Clark | 20 |

S5 | Adams | 30 |

SNO | SNAME | CITY |
---|---|---|

S1 | Smith | London |

S2 | Jones | Paris |

S3 | Blake | Paris |

S4 | Clark | London |

S5 | Adams | Athens |

If we take just the two attributes [SNO, SNAME] from each tuple of SX and SY, we see they are identical. This type of redundancy also leads to update anomalies. The principle of orthogonal design can be put this way:

*The principle of Orthogonal Design:* If *A* and *B* are distinct relvars in the same database, then there must not exist nonlossless
decompositions of *A* and *B* into *A1*, *A2*, ... *Am* and *B1*, *B2*, ... *Bm*, respectively, such that the
relvar constraints for some projection *Ai* in the set *A1*, *A2*, ... *Am* and some projection *Bj* in the set *B1*,
*B2*, ... *Bm*, permit the same tuple to appear in both.

That is, such decompositions "must not exist" because they introduce redundacy across the resulting relvars. Since one "nonloss decomposition" is the identity projection (the projection of a relvar on all its attributes), this definition also covers the case illustrated by the first example of SA and SB.

Like the principles of normalization, the principle of Orthogonal Design is basically just common sense. Normalization is intended to reduce redundancy within relvars, and the principle of orthogonality is intended to reduce redundancy across relvars.

The example of SX and SY illustrates a lossless decomposition of S. SX is the projection S { SNO, SNAME, STATUS}, and SY is the projection S {SNO, SNAME, CITY }. Both SX and SY are in 5NF. Yet the decomposition is bad because SX and SY contain duplicate tuples.

- Add key points from lesson 8 and 9, having to do with PHP/SQL implementation issues:
- protecting against injection attacks
- ???