Introduction to Normalization – SQL Tutorial

Introduction to Normalization – SQL Tutorial

5 mins read1.2K Views Comment
Updated on Nov 22, 2022 19:30 IST

This article will cover what Normalization in SQL is, its forms, such as 1NF, 2NF, etc., and examples.

2022_04_Normalization-In-SQL.jpg

Databases store redundant data and retrieving data without duplicates can be a tedious task. Normalization in SQL can be used to retrieve data without redundancy. In this article you will get a detailed walkthrough on what normalization is and its various normal forms.

The following topics are covered in this article:

  1. What is Normalization?
  2. Normal Forms 

What is Normalization?

Normalization is the process of organizing the data to reduce duplicates in tables. It improves data integrity with the various normal forms used step by step. The 1st normal form was defined by Edgar F Codd in the 1970s.

Large tables are broken down into small tables and for every interaction to occur on the data, data present in the database has to be normalized. Incase the data is not present in normal forms, then the following anomalies occur:

  1. INSERTION –  Occurs when users cannot insert data into the table without the presence of another attribute.
  2. UPDATION  – Occurs during partial update of data.
  3. DELETION – Occurs when a specific set of attributes are lost because of deleting another set of attributes. 

Now that you know what normalization is, let us understand the various normal forms.

Recommended online courses

Best-suited Database and SQL courses for you

Learn Database and SQL with these high-rated online courses

– / –
2 months
4.24 K
6 weeks
– / –
15 hours
– / –
– / –
– / –
1 hours
– / –
30 hours
– / –
31 hours
– / –
80 hours

Normal Forms 

The following are the four normal forms used to enhance the distribution of data. 

  1. 1st Normal Form (1NF)
  2. 2nd Normal Form (2NF)
  3. 3rd Normal Form (3NF)
  4. Boyce-Codd Normal Form(BCNF)
2022_04_Normalization.jpg

You must remember that to normalize any piece of data you will start normalizing from the 1st normal form and go to the higher normal forms [based on the requirement].

To understand normal forms better, let us consider the following table:

Patients table

Patient ID Patient Name Sex Age MedicineID Medicines Taken Category
1 Sheela F 42 M01, M02 Meftal Forte, Meftal Spas Pain Killer, Pain Killer
2 Rehan M 15 M03, M04 Cetaphil Day Cream, Cetaphil Night Cream Face Cream, Face Cream
3 Anay M 25 M01 Meftal Forte Pain Killer

1st Normal Form (1NF)

In this normal form each table cell must only have a single value. So, basically this normal form ensures that atomicity is maintained and all the records in a table hold a unique value. 

Here, please note that this table is not further divided and if the table holds composite or multi-valued attributes then it violates 1NF form.

In our example the medicines taken and category have two values. So, currently it violates the 1st NF. So, the above table is normalized into 1NF as below:

Patient ID Patient Name Sex Age MedicineID Medicines Taken Category
1 Sheela F 42 M01 Meftal Forte Pain Killer 
1 Sheela F 42 M02 Meftal Spas Pain Killer
2 Rehan M 15 M03 Cetaphil Day Cream Face Cream
2 Rehan M 15 M04 Cetaphil Night Cream Face Cream
3 Anay M 25 M01 Meftal Forte Pain Killer

Now if you look at the above table, although every record is unique, there is a lot of data redundancy present. So, to avoid the same, let’s further normalize the above table.

2nd Normal Form (2NF)

For data to be normalized into 2NF, the data must be in 1NF and should have only a single-column primary key. So, basically the table should not have partial dependency. 

Here the above table can be normalized into two tables.

Table 1

Patient ID Patient Name Sex Age
1 Sheela F 42
2 Rehan M 15
3 Anay M 25

Table 2

Patient ID MedicineID Medications Taken
1 M01 Meftal Forte
1 M02 Meftal Spas
2 M03 Cetaphil Day Cream
2 M04 Cetaphil Night Cream
3 M01 Meftal Forte

In the above two tables, though each row holds a unique value and a single primary key, there is still a lot of data redundancy. So, let’s further normalize the data into 3NF form.

3rd Normal Form (3NF)

For data to be normalized into 3NF, the data must be in 1NF and 2NF. Apart from this, data should not have any transitive dependencies for non-prime attributes.  

So basically all attributes which do not belong to the candidate key set must not be dependent on other non-prime attributes.

In our above example, we see the Patient ID determines the MedicineID and the MedicineID determines the medicine. So basically the PatientID determines the medicine name through MedicineID, which implies the presence of transitive functional dependency.

So you can further normalize the above Table 2 in to the 3NF form as follows:

Table 1:

Patient ID MedicineID
1 M01
1 M02
2 M03
2 M04
3 M01

Table 2:

Medicine ID Medications Taken
M01 Meftal Forte
M02 Meftal Spas
M03 Cetaphil Day Cream
M04 Cetaphil Night Cream
M01 Meftal Forte

Here we can see that all the non-key attributes are fully dependent only on the primary key.

Next let us understand the Boyce Codd Normal Form.

Boyce Codd Normal Form (BCNF)

Boyce Codd Normal Form, also known as the 3.5 NF was developed by Raymond F. Boyce and Edgar F. Codd . For this normal form, you must make sure that the data is in 3NF. Here you can divide tables further to make sure a single candidate key is present. 

Refer to the below table to understand BCNF.

Patient ID Patient Name Specialization Doctor Name
1 Sheela Orthopedic Dr. Suhana
1 Sheela Cardiology Dr. Akash
2 Rehan ENT Dr. Harsh
2 Rehan General Medicine Dr. Suhana
3 Anay Ophthalmology Dr. Rohit
  • Every patient can go to multiple doctors
  • Multiple doctors can be specialized in various specialization
  • For every specialization, a doctor can treat patients

Here, the PatientID and Specialization can form the primary key indicating Specialization is a prime attribute.  Every doctor is also related to specialization which is not allowed in BCNF. 

So we can normalize the above table as follows:

Table 1:

Patient ID Doctor ID
1 DOC01
1 DOC02
2 DOC03
2 DOC01
3 DOC04

Table 2: 

Doctor ID Specialization Doctor Name
DOC01 Orthopedic Dr. Suhana
DOC02 Cardiology Dr. Akash
DOC03 ENT Dr. Harsh
DOC01 General Medicine Dr. Suhana
DOC04 Ophthalmology Dr. Rohit

With this, we end this article on Normalization in SQL. We hope you found it informative. You can refer to the article on MySQL Commands for a detailed walkthrough of all commands.

Top Trending Tech Articles:
Career Opportunities after BTech Online Python Compiler What is Coding Queue Data Structure Top Programming Language Trending DevOps Tools Highest Paid IT Jobs Most In Demand IT Skills Networking Interview Questions Features of Java Basic Linux Commands Amazon Interview Questions

Recently completed any professional course/certification from the market? Tell us what liked or disliked in the course for more curated content.

Click here to submit its review with Shiksha Online.

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio