Introduction to Normalization – SQL Tutorial
This article will cover what Normalization in SQL is, its forms, such as 1NF, 2NF, etc., and examples.
Databases store redundant data and retrieving data without duplicates can be a tedious task. Normalization in SQL can be used to retrieve data without redundancy. In this article you will get a detailed walkthrough on what normalization is and its various normal forms.
The following topics are covered in this article:
What is Normalization?
Normalization is the process of organizing the data to reduce duplicates in tables. It improves data integrity with the various normal forms used step by step. The 1st normal form was defined by Edgar F Codd in the 1970s.
Large tables are broken down into small tables and for every interaction to occur on the data, data present in the database has to be normalized. Incase the data is not present in normal forms, then the following anomalies occur:
- INSERTION – Occurs when users cannot insert data into the table without the presence of another attribute.
- UPDATION – Occurs during partial update of data.
- DELETION – Occurs when a specific set of attributes are lost because of deleting another set of attributes.
Now that you know what normalization is, let us understand the various normal forms.
Best-suited Database and SQL courses for you
Learn Database and SQL with these high-rated online courses
Normal Forms
The following are the four normal forms used to enhance the distribution of data.
- 1st Normal Form (1NF)
- 2nd Normal Form (2NF)
- 3rd Normal Form (3NF)
- Boyce-Codd Normal Form(BCNF)
You must remember that to normalize any piece of data you will start normalizing from the 1st normal form and go to the higher normal forms [based on the requirement].
To understand normal forms better, let us consider the following table:
Patients table
Patient ID | Patient Name | Sex | Age | MedicineID | Medicines Taken | Category |
1 | Sheela | F | 42 | M01, M02 | Meftal Forte, Meftal Spas | Pain Killer, Pain Killer |
2 | Rehan | M | 15 | M03, M04 | Cetaphil Day Cream, Cetaphil Night Cream | Face Cream, Face Cream |
3 | Anay | M | 25 | M01 | Meftal Forte | Pain Killer |
1st Normal Form (1NF)
In this normal form each table cell must only have a single value. So, basically this normal form ensures that atomicity is maintained and all the records in a table hold a unique value.
Here, please note that this table is not further divided and if the table holds composite or multi-valued attributes then it violates 1NF form.
In our example the medicines taken and category have two values. So, currently it violates the 1st NF. So, the above table is normalized into 1NF as below:
Patient ID | Patient Name | Sex | Age | MedicineID | Medicines Taken | Category |
1 | Sheela | F | 42 | M01 | Meftal Forte | Pain Killer |
1 | Sheela | F | 42 | M02 | Meftal Spas | Pain Killer |
2 | Rehan | M | 15 | M03 | Cetaphil Day Cream | Face Cream |
2 | Rehan | M | 15 | M04 | Cetaphil Night Cream | Face Cream |
3 | Anay | M | 25 | M01 | Meftal Forte | Pain Killer |
Now if you look at the above table, although every record is unique, there is a lot of data redundancy present. So, to avoid the same, let’s further normalize the above table.
2nd Normal Form (2NF)
For data to be normalized into 2NF, the data must be in 1NF and should have only a single-column primary key. So, basically the table should not have partial dependency.
Here the above table can be normalized into two tables.
Table 1
Patient ID | Patient Name | Sex | Age |
1 | Sheela | F | 42 |
2 | Rehan | M | 15 |
3 | Anay | M | 25 |
Table 2
Patient ID | MedicineID | Medications Taken |
1 | M01 | Meftal Forte |
1 | M02 | Meftal Spas |
2 | M03 | Cetaphil Day Cream |
2 | M04 | Cetaphil Night Cream |
3 | M01 | Meftal Forte |
In the above two tables, though each row holds a unique value and a single primary key, there is still a lot of data redundancy. So, let’s further normalize the data into 3NF form.
3rd Normal Form (3NF)
For data to be normalized into 3NF, the data must be in 1NF and 2NF. Apart from this, data should not have any transitive dependencies for non-prime attributes.
So basically all attributes which do not belong to the candidate key set must not be dependent on other non-prime attributes.
In our above example, we see the Patient ID determines the MedicineID and the MedicineID determines the medicine. So basically the PatientID determines the medicine name through MedicineID, which implies the presence of transitive functional dependency.
So you can further normalize the above Table 2 in to the 3NF form as follows:
Table 1:
Patient ID | MedicineID |
1 | M01 |
1 | M02 |
2 | M03 |
2 | M04 |
3 | M01 |
Table 2:
Medicine ID | Medications Taken |
M01 | Meftal Forte |
M02 | Meftal Spas |
M03 | Cetaphil Day Cream |
M04 | Cetaphil Night Cream |
M01 | Meftal Forte |
Here we can see that all the non-key attributes are fully dependent only on the primary key.
Next let us understand the Boyce Codd Normal Form.
Boyce Codd Normal Form (BCNF)
Boyce Codd Normal Form, also known as the 3.5 NF was developed by Raymond F. Boyce and Edgar F. Codd . For this normal form, you must make sure that the data is in 3NF. Here you can divide tables further to make sure a single candidate key is present.
Refer to the below table to understand BCNF.
Patient ID | Patient Name | Specialization | Doctor Name |
1 | Sheela | Orthopedic | Dr. Suhana |
1 | Sheela | Cardiology | Dr. Akash |
2 | Rehan | ENT | Dr. Harsh |
2 | Rehan | General Medicine | Dr. Suhana |
3 | Anay | Ophthalmology | Dr. Rohit |
- Every patient can go to multiple doctors
- Multiple doctors can be specialized in various specialization
- For every specialization, a doctor can treat patients
Here, the PatientID and Specialization can form the primary key indicating Specialization is a prime attribute. Every doctor is also related to specialization which is not allowed in BCNF.
So we can normalize the above table as follows:
Table 1:
Patient ID | Doctor ID |
1 | DOC01 |
1 | DOC02 |
2 | DOC03 |
2 | DOC01 |
3 | DOC04 |
Table 2:
Doctor ID | Specialization | Doctor Name |
DOC01 | Orthopedic | Dr. Suhana |
DOC02 | Cardiology | Dr. Akash |
DOC03 | ENT | Dr. Harsh |
DOC01 | General Medicine | Dr. Suhana |
DOC04 | Ophthalmology | Dr. Rohit |
With this, we end this article on Normalization in SQL. We hope you found it informative. You can refer to the article on MySQL Commands for a detailed walkthrough of all commands.
Top Trending Tech Articles:Career Opportunities after BTech Online Python Compiler What is Coding Queue Data Structure Top Programming Language Trending DevOps Tools Highest Paid IT Jobs Most In Demand IT Skills Networking Interview Questions Features of Java Basic Linux Commands Amazon Interview Questions
Recently completed any professional course/certification from the market? Tell us what liked or disliked in the course for more curated content.
Click here to submit its review with Shiksha Online.
This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio