Understanding Database Normalization with easy Examples

Updated on Sep 30, 2020

Normalization is a method of organizing data in a database to reduce data redundancy and eliminate any undesirable anomalies such as Update and deletion anomaly. There could be various degree or form of normalization and in this article we will learn them step by step with easy examples.

What is database normalization

Formally stated,

Normalization is the systematic method to design a database schema such that duplicate and redundant data is avoided.

There are number of normalization level starting from level 1 to level 5. Each normal form describes how to get rid of a specific problem in database design.We will explore these levels by taking example of an imaginary dataset for a retail shop. Consider a retail shop wants to keep track of products and customers. So they decide to design a database. In their first attempt they design something like this -

Date	Product	Customer	City
01 Jan	Orange, Apple	Ram	Lucedale
01 Jan	Edible Oil	Harry	Spring Valley
02 Jan	Grape Juice, Butter	Emily	Wellton
02 Jan	French Loaf	Ram	Lucedale

But this scheme of storing data has a few problems. For example, there are multiple values in single cell, there are data duplication or redundancy etc. So let us see how normalization helps here.

1st Normal Form

First Normal form requires that value of each attribute contains only a single value from the domain of that attribute. In short, this means you can only put one value for each cell of your table.

In the above table, we have put both 'Orange' and 'Apple' and 'Grape Juice' and 'Butter' in the same cells. This offends the First normal form. We can normalize this problem away by putting all the products into separate rows. See below,

Date	Product	Customer	City
01 Jan	Orange	Ram	Lucedale
01 Jan	Apple	Ram	Lucedale
01 Jan	Edible Oil	Harry	Spring Valley
02 Jan	Grape Juice	Emily	Wellton
02 Jan	Butter	Emily	Wellton
02 Jan	French Loaf	Ram	Lucedale

2nd Normal Form

As you see in the above example, there are lot of data redundancy. The customer "Ram" appears 3 times in the table and "Emily" appears twice. Problem is - everytime this customer is appearing, its address is also appearing. This redundancy may create many problems. For example, if "Ram" change his address today, then you will need to update his address in all the rows where it appears (in this case 3 rows). 2nd Normal Form help you get rid of this issue. Let us see what 2nd Normal Form really mean.

A table is in 2nd Normal Form if and only if it is in First Normal Form and every non-prime attribute of the table is dependent on the whole of a candidate key

Let us now understand this statement. Look at the above table and try to determine the key of this table. Key is something that uniquely identifies each record of the table. Clearly column "Date" is not the key as "Date" is not unique. Similary neither column "Product Name" nor "Customer Name" is a key. However, the combination of "Date", "Product Name" and "Customer Name" together form the key of this table as that combination is unique and hence it uniquely identifies each record of the table. These columns or attributes are called "Prime Attribute" and they form the candidate key. The other column(s) like "Customer City" is the "non-prime attribute".

Having armed ourselves with the definition of key, non-prime attributes etc., let us now look back to the definition of 2nd Normal Form. It says that every non-prime attribute must be dependent on the whole of a key. In our case this means, the non-prime attribute "Customer City" must be dependent on the whole combination of the key. But "Customer City" is only related to the "Customer Name" but has no connection with "Product Name" or "Date" etc. So the non-prime attribute is not dependent on the whole of the candidate key.

In order to bring this table to 2nd Normal form, we will need to break this table like this

Date	Product	Customer
01 Jan	Orange	Ram
01 Jan	Apple	Ram
01 Jan	Edible Oil	Harry
02 Jan	Grape Juice	Emily
02 Jan	Butter	Emily
02 Jan	French Loaf	Ram

And then we have a separate customer table like this

Customer	City
Ram	Lucedale
Harry	Spring Valley
Emily	Wellton

3rd Normal Form

Third Normal form requires that every non-prime attribute of the table must be non-transitively dependent on the every key of the table.

To simplify the above statement consider this situation. Suppose we decide to store customers' zip code along with customer's city in the second table like below:

Customer	City	Zip
Ram	Lucedale	230 110
Harry	Spring Valley	219 266
Emily	Wellton	269 041

If you look at this table, the table is in 2NF form. "Customer City" and "Zip" both are non-prime attributes and both of them are dependent on the whole of the key which is "Customer Name" in this case. But this table is not in third normal form yet. Because third normal form requires every non-prime attribute of the table must be dependent non-transitively, i.e., Directly on the key. City is not directly dependent on Customer Name. If you knew Zip code, you could also derive the name of the city. So city is actually dependent on the key through the Zip code (transitive dependency).

Zip	City
230 110	Lucedale
219 266	Spring Valley
269 041	Wellton

Once we separate out the city and zip code detail in a new table like above, we can just refer the Zip code in our customer table. In this case, all our tables are in third normal form.