Confusion over BCNF

1.1k views Asked by At

Schema: R(A,B,C,D,E,F,G,H,I,J) and functional dependencies FD = { A->DE, IJ->H, I->A, J->FG, G->BC }

Question: Is relation in BCNF?

Answer: It's not because A is not superkey.

I'm aware under which conditions relation is in BCNF, but what confuses me all the time is superkey. Could anyone explain why in the answer A is not superkey? And why not pick, for example, IJ or I as a superkey? k

2

There are 2 answers

0
Renzo On

The definition of BCNF requires that no non-trivial functional dependency of a relational R has a determinant (left hand side) which is a superkey. A superkey being an attribute or set of attributes that determines all the attributes of the relation. So, if at least a dependency has a determinant which is not a superkey, the relation is not in BCNF.

In this example we could start from any dependency. Let’s start from A → DE. We can check if A is a superkey by calculating its closure and see if it contains all the attributes:

A+ → A
A+ → ADE (because of A → DE)

No other attribute can be added, so A is not a superkey of R, and the relation is not in BCNF.

Similarly we could see that I, J, and G are not superkeys.

Actually this relation has a unique candidate key, IJ, fact that can be checked by calculating its closure IJ+. This means that each superkey must contain IJ.

0
philipxy On

One does not "pick" a superkey. Given a schema and functional dependencies, some sets of attributes are candidate keys. Every superset of a candidate key is a superkey.

The notion of candidate key is usually defined in terms of superkey, though. A superkey is a set of attributes that determines every attribute. A candidate key is a superkey that has no proper (smaller) subset that is a superkey.

Another way to define "superkey" is a set of attributes where every subrow of values for it is unique. Hence we often express or recognize that a set of attributes is a superkey by saying that it is "unique".

(You can arbitrarily "pick" one candidate key to be "primary key", but this has no relevance to relational theory. Telling your choice to a DBMS might affect what it does. Ironically, a set of columns declared PRIMARY KEY in SQL states that it is a superkey, not necessarily a candidate key, ie not necessarily a primary key. As far as constraints are concerned it means UNIQUE NOT NULL.)

When some functional dependencies hold in a schema, other ones do. Collectively the ones that hold are those that the originals imply per Armstrong's axioms. For {A} to be a superkey, some subset of it has to be a candidate key. But given your set of functional dependencies, neither {} nor {A} is a candidate key. Similarly for {I}. But {IJ}, if you determine its closure, determines all attributes. So it is a superkey. So so is every superset of it. Since no proper subset of it is, it is also a candidate key.