Table with unique identifier in Third normal form?

1.1k views Asked by At

Suppose, I have a Table with the columns:

  • person_id (primary key)
  • first_name
  • last_name
  • birthday

I also have a unique constraint on the combination {first_name, last_name} (I know that more people can have the same name, but I want to keep my example simple). I want to know whether this Table is in Third normal form.


My reasoning (before EDIT):

  • All fields can only contain atomic values, so the Table is in First normal form.
  • The candidate keys are 1) person_id, 2) [first_name, last_name]
  • The only non-prime attribute is birthday.
  • The attribute birthday is not functionally dependent on part of candidate key 1 (which is impossible anyway, since there is only 1 attribute in candidate key 1)
  • The attribute birthday is not functionally dependent on part of candidate key 2
  • Therefore, this Table is in Second normal form.
  • The attribute birthday (is/is not) non-transitively dependent on candidate key 1
  • The attribute birthday is non-transitively dependent on candidate key 1

The Question (before EDIT):

The question that I cannot answer is if birthday is non-transitively dependent on person_id. Functionally, there is no relationship at all between this id number and the birthday.

  1. Does this mean that there is a transitive dependency (birthday depends on [first_name, last_name], and each combination [first_name, last_name] maps to an id) and therefore not in 3NF?
  2. Does this mean that there is no dependency at all, and therefore not in 3NF?
  3. Am I misinterpreting the difficult language and is this Table in 3NF?

My reasoning (after EDIT):

  • If you know the person_id, you know his first name, last name and his birthday, so there are the FDs {person_id} -> {first_name}, {person_id} -> {last_name} and {person_id} -> {birthday}.
  • If you know a person's first and last name, you know his person_id and birthday, so there are the FDs {first_name, last_name} -> {person_id} and {first_name, last_name} -> {birthday}.
  • If you know a person's birthday, you don't know anything about his person_id or name, so there are no FDs from birthday to another (set of) attribute(s).

  • All fields can only contain atomic values, so the Table is in First normal form.

  • The candidate keys are 1) {person_id}, 2) {first_name, last_name}
  • The only non-prime attribute is {birthday}.
  • The attribute {birthday} is not FD on part of CK 1 (which is impossible anyway, since there is only 1 attribute in CK 1)
  • The attribute {birthday} is not FD on part of CK 2
  • Therefore, this Table is in Second normal form.

  • There is an FD {person_id} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 1

  • There is an FD {first_name, last_name} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 2
  • Therefore, this Table is in Third normal form.

There is a dependency {person_id} -> {first_name, last_name} -> {birthday}, but since there is also a direct dependency {person_id} -> {birthday}, this dependency is not transitively.

The Question (after EDIT):

I don't have a predefined set of FDs from a book, so I am not sure whether the FDs are correct. Can someone confirm this, or if they don't look right, show how I can find the FDs in this practical example?


Third reasoning (second EDIT):

FD's:

  • If you only know a person's person_id, you know his first name, last name and his birthday (there cannot be multiple people with the same person_id)
    • FD: {person_id} -> {first_name}
    • FD: {person_id} -> {last_name}
    • FD: {person_id} -> {birthday}
  • Supersets including {person_id} no longer need to be considered
  • If you only know a person's first_name, you don't know any other field of this person (there can be multiple people with the same first_name)
    • Not FD: {first_name} -> {person_id}
    • Not FD: {first_name} -> {last_name}
    • Not FD: {first_name} -> {birthday}
  • If you only know a person's last_name, you don't know any other field of this person (there can be multiple people with the same last_name)
    • Not FD: {last_name} -> {person_id}
    • Not FD: {last_name} -> {first_name}
    • Not FD: {last_name} -> {birthday}
  • If you only know a person's birthday, you don't know any other field of this person (there can be multiple people with the same birthday)
    • Not FD: {birthday} -> {person_id}
    • Not FD: {birthday} -> {first_name}
    • Not FD: {birthday} -> {last_name}
  • If you know a person's first_name and last_name, you know his person_id and his birthday (there cannot be multiple people with the same first_name and last_name)
    • FD: {first_name, last_name} -> {person_id}
    • FD: {first_name, last_name} -> {birthday}
  • Supersets including {first_name, last_name} no longer need to be considered
  • If you know a person's first_name and birthday, you don't know any other field of this person (there can be multiple people with the same first_name and birthday)
    • Not FD: {first_name, birthday} -> {person_id}
    • Not FD: {first_name, birthday} -> {last_name}
  • If you know a person's last_name and birthday, you don't know any other field of this person (there can be multiple people with the same last_name and birthday)
    • Not FD: {last_name, birthday} -> {person_id}
    • Not FD: {last_name, birthday} -> {first_name}

Normal forms:

  • All attributes can only contain single values, so the Table is in First normal form.

  • Looking at the FDs, there are two candidate keys: 1) {person_id}, 2) {first_name, last_name}

  • The only non-prime attribute is {birthday}.
  • The attribute {birthday} is not FD on part of CK 1 (which is impossible anyway, since there is only 1 attribute in CK 1)
  • The attribute {birthday} is not FD on part of CK 2 (i.e. there is no FD {first_name} -> {birthday} or FD {last_name} -> {birthday})
  • Therefore, this Table is in Second normal form.

  • S transitively determines T when there exists an X such that S -> X and X -> T and not(X -> S)

  • Let S = CK1 = {person_id} and T = {birthday}. The only X such that S -> X and X -> T is when X = {first_name, last_name}. However, then also X -> S holds. Therefore, S non-transitively determines T.
  • Let S = CK2 = {first_name, last_name} and T = {birthday}. The only X such that S -> X and X -> T is when X = {person_id}. However, then also X -> S holds. Therefore, S non-transitively determines T.
  • Therefore, this Table is in Third normal form.
1

There are 1 answers

3
philipxy On BEST ANSWER

Re your original question:

Your organization and reasoning are unsound. First give the all the FDs. Eg this determines the CKs. Eg you cannot reason soundly on just giving the (alleged) CKs (which imply certain FDs) and a couple of non-FDs. Eg "non-transitively dependent" cannot be determined without knowing all the FDs. Only then can you write sound bullets & answer your numbered questions.

But let's assume that {first_name,last_name} and {person_id} really are the only CKs and that there are no FDs other than those implied by the fact that each CK determines every attribute not in it.

Functionally, there is no relationship at all between this id number and the birthday.

I don't know what you mean by "Functionally, there is no relationship at all between". Maybe you are trying to say that {person_id} does not functionally determine {birthday}. But it does, because a CK determines all attributes not in it. Maybe you mean you don't see an application constraint between people ids and birthdays and/or a table constraint involving a table's person_id and a birthday values. But there is: A given person only has one birthday at a time, and in the table a person_id only ever has one birthday at a time. This is a consequence of the meaning of and rules around "people", "birthdays", person_id and birthday. The constraint on person_id and birthday is expressed by "{person_id} -> {birthday}" and you have to know whether it is the case as part of determining the intial list of all FDs (that precedes determining CKs).

S transitively determines T when there exists an X such that S -> X and X -> T and not(X -> S). S non-transitively determines T when it doesn't transitively determine it.

  1. Does this mean that there is a transitive dependency (birthday depends on [first_name, last_name], and each combination [first_name, last_name] maps to an id) and therefore not in 3NF?

I don't know what you are trying to say by "each combination maps to an id" let alone why it implies non-3NF. Maybe you are trying to say that taking {person_id} as S and {birthday} as T and {first_name, last_name} as X we have S -> X and X -> T so (wrongly) a non-prime attribute is transitively dependent on a CK so the relation is not in 3NF. But you did not satisfy not(X -> S).

For {person_id} as S and {birthday} as T the only possibility for X -> T has {first_name,last_name} as X but X -> S because X is a key so S -> T is not transitive.

Similarly for {first_name,last_name} as S and {birthday} as T the only possibility for X -> T has {person_id} as X but X -> S because X is a key so S -> T is not transitive.

  1. Does this mean that there is no dependency at all, and therefore not in 3NF?

Since the relation in in 2NF and every non-prime attribute is non-transitively dependent on every CK, the relation is in 3NF.

  1. Am I misinterpreting the difficult language and is this Table in 3NF?

You didn't claim it was or wasn't, did you?

(Please edit your question to use proper technical terms.)

Re your EDIT version

(You acknowledged in comments that your last bullet was supposed to have CK 2 and that it was unsound. And that my guesses at your unclear phrasings were more or less what you meant.)

  • All fields can only contain atomic values, so the Table is in First normal form.

Normalization only makes sense for relational "tables", ie relations. That means unique unordered attributes ("columns") and tuples ("rows"). With one value per attribute per tuple. All relations are in 1NF.:

A relational table is always in 1NF. Each column of a row has a single value of the column's type. A non-relational database is "normalized" to tables ie 1NF (first sense of "normalized") which gets rid of repeating groups. Then those tables/relations are "normalized" to higher normal forms (second sense of "normalized").

"Atomic" is not helpful: "Atomic" originally meant not a relation.:

In Codd's original 1970 paper he explained that "atomic" meant not a relation (ie not a table):

So far, we have discussed examples of relations which are defined on simple domains--domains whose elements are atomic (nondecomposable) values. Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements.

By the time of Codd's 1990 book The Relational Model for Database Management: Version 2:

From a database perspective, data can be classified into two types: atomic and compound.

In the relational model there is only one type of compound data: the relation.

And a relation is a single value so there's nothing wrong with relation-valued attributes. (Pace Codd's changing opinion on that.)

  • The candidate keys are 1) {person_id}, 2) {first_name, last_name}
  • The only non-prime attribute is {birthday}.

To normalize you must know for every subset of attributes what attributes are (non-trivially) functionally dependent on it. Although every superset of a determinant determines what it does, so that takes care of a lot of them. You skipped that step.

You cannot show that {first_name,last_name} is a CK without showing that {first_name} and {last_name} aren't CKs via what each determines. Assuming you do, you still won't have considered remaining possible determinants {first_name,birthday} and {last_name,birthday}.

You cannot show that those are the only CKs until you show that there are no other CKs. You must show for every subset of attributes whether it is a CK. Although no superset of a CK is a CK, so that takes care of a lot of them. There are algorithms.

  • There is an FD {person_id} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 1
  • There is an FD {first_name, last_name} -> {birthday}, so the attribute {birthday} is non-transitively dependent on CK 2

Your new last two bullets are unjustified. Look at my message's definition and use of "(non) transitively dependent"; just knowing S -> T does not tell you enough. When there's a non-transitive FD S -> X -> T it must also be that S -> T; so knowing S -> T alone does not tell you about whether S transitively or non-transitively determines T. "->" does not mean "directly"; non-transitively is the only meaningful notion of "directly".

Maybe by "so" you mean "so as shown below for the first of these two cases"?

There is a dependency {person_id} -> {first_name, last_name} -> {birthday}, but since there is also a direct dependency {person_id} -> {birthday}, this dependency is not transitively.

See above: "direct" is a misconception. And as I said in my original answer it's since {first_name, last_name} -> {person_id} for CK1 and {person_id} ->{first_name, last_name} for CK 2.

I don't have a predefined set of FDs from a book, so I am not sure whether the FDs are correct. Can someone confirm this, or if they don't look right, show how I can find the FDs in this practical example?

You have to consider every possible value the table can have due to every possible application situation that can come up and the criterion (predicate) by which you are to put rows into the table vs leave them out. You can probably think of counterexamples to putative FDs, where two rows can share the same value for a putative determinant. Eg for {first_name,birthday} and {last_name,birthday} you can expect two different people to have the same name and birthday. (You can check the last two putative FDs.)

(Now your language is clearer. Roughly speaking your errors (still) come from not using definitions and skipping steps.)

Re your second EDIT version:

It now seems like you have probably done everything soundly. (Although I can't know for sure because you don't specifically make clear that there are no more 2-element attribute sets & there are no more attribute sets; why that pair is the set of CKs; and the 2NF/3NF "therefore"s.)

Phrasings like "If you know a person's last_name and birthday, you don't know any other field of this person" are problematic. Me: If I only know two fields, of course I don't know others; so there's never a FD? You: For a person. Me: But if I know the person then I know their first_name; so there is an FD? You: If you know first_name and birthday for one person but not who; you don't know any other field. Me: Sometimes I do know other fields; so the implication is false; so there's an FD? It turns out that "know" is a super-confusing word good to avoid. Write, "Given ... there exists ...". As you did in "(there cannot be multiple ...)".