In this article we give the information about normalization in a database design technique that organizes tables in a manner that reduce redundancy (useless) and dependency of data.
Normalization in Database:
- Normalization is the process of organizing the data in the database. It is performed to reduced data redundancy in a database. Data redundancy means having the same data at multiple places. It is necessary to remove the data redundancy because it causes anomalies in a database which makes it very hard for a database administrator to maintain it.
- To avoid insertion, update & deletion anomaly. (Inconsistency or error)
- Normalization divides larger tables into smaller table and link them using relationships. (Primary key and Foreign key)
Anomaly (Insertion, Update, Delete)
Anomaly: Something that is not according to standard or normal
Table Name: student
Roll_No | Name | Address | Dept |
1 | Rahul | Pune | BCA |
1 | Rahul | Pune | BA |
2 | Pranav | Satara | MBA |
3 | Geeta | Sangli | BCA |
3 | Geeta | Sangli | BA |
Insert:
Insert into student (Roll_No, Name, Addrress) values (4, ‘Amol’, ‘Satara’);
Result: student does not allow null values
Update:
Update student set dept=’BCA’ where Roll_No=1;
Result: Update two rows
Delete:
Delete from student where Roll_No=1;
Result: Delete two rows
Normalization Types:
Here are most commonly used normal forms:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3 NF)
- Boyce & Codd Normal Form (BCNF)
- Forth Normal Form (4NF)
- Fifth Normal Form (5NF)
First Normal Form (1NF)
As per the rule of first normal form,
- An attributes (Column) of a table cannot hold multiple values.
- It should hold only atomic (forming a single unit) values.
- Each record needs to be unique.
Atomic value: An atomic value is a value that cannot be divided.
Table Name: student
Roll_No | Name | Mobile_No | Dept |
1 | Rahul | 9986532147 | BCA |
2 | Amol | 9865326589, 9965832658 | BA |
3 | Pranav | 9789563211, 9856231458 | MBA |
4 | Geeta | 9963524789 | BCA |
5 | Jaya | 9856231458 | BA |
Above table, column mobile number there are two mobile numbers in rows two and three so there is not an atomic value.
After First Normalization
Table Name: student
Roll_No | Name | Mobile_No | Dept |
1 | Rahul | 9986532147 | BCA |
2 | Amol | 9865326589 | BA |
2 | Amol | 9965832658 | BA |
3 | Pranav | 9789563211 | MBA |
3 | Pranav | 9856231458 | MBA |
4 | Geeta | 9963524789 | BCA |
5 | Jaya | 9856231458 | BA |
Second normal form(2NF):-
- It should be in the first normal form.
- All non key attributes are fully functional dependent on the primary key. In simple words it should not have partial dependency
A table or relation is in 2nd normal form when it meets all the requirements of 1st normal form and all non-key attributes are completely dependent on the primary key.
Table Name: Employee
Emp_ID |
Qualification |
Age |
101 | BCA | 20 |
101 | MCA | 20 |
101 | MBA | 20 |
102 | BCom | 21 |
102 | MBA | 21 |
103 | MCA | 22 |
104 | MSW | 23 |
104 | MBA | 23 |
Composite Primary Key: Emp_ID + Qualification
Non Key Attribute: Age
Example:
Q. Find the age of employee where employee ID is 102
SQL> Select * from Employee where emp_id=102;
O/P:
102 BCom 21
102 MBA 21
here we get the two age values that is not in proper out put.
After Second Normalization
To make the table compiles with 2NF we can break it in two tables like this:
Here,
1st NF:
- It should hold only atomic values.
- Each record needs to be unique.
2nd NF:
- Non key attribute should be depend on single key (that is primary key)
Third Normal Form (3rd NF):
- It is in second normal form.
- There is no transitive functional dependency.
- A transitive functional dependency is when changing a non-key column, might clause any of the other non-key column to change.
Here,
Emp_ID-> Zip-> City that is City depend on zip and zip depend on Emp_ID
Emp_ID-> City City dependent on Emp_ID
City Indirectly depend on Emp_ID is called transitive functional dependency.
After Third Normalization;
BCNF (Boyce-Codd Normal Form)
BCNF is an advanced version of the Third Normal Form (3NF) in relational database design. It deals with certain types of anomalies not handled by 3NF. BCNF is stricter than 3NF and ensures a higher level of data integrity.
Definition of BCNF
A relation is in Boyce-Codd Normal Form (BCNF) if:
For every non-trivial functional dependency (X → Y), X is a super key.
- Non-trivial means: Y is not a subset of X.
- Super key: A set of one or more attributes that uniquely identify a tuple in a relation.
Why BCNF is Needed (Beyond 3NF)?
3NF allows functional dependencies where the determinant is not a candidate key, as long as the dependent is a prime attribute.
This can still lead to anomalies. BCNF fixes this by ensuring every determinant is a super key.
Example of BCNF Violation
Table: CourseInstructor
CourseID | Instructor | Room |
CS101 | Smith | A1 |
CS102 | Smith | A1 |
CS103 | Jones | B1 |
Functional Dependencies:
- CourseID → Instructor (Each course has one instructor)
- Instructor → Room (Each instructor has a fixed room, but instructor is not a super key)
Issue:
- Instructor → Room violates BCNF because Instructor is not a super key.
How to Convert to BCNF
Step 1: Decompose the table into two:
- Instructor_Room (Instructor, Room)
- Course_Instructor (CourseID, Instructor)
Now:
- Instructor → Room is in a separate table where Instructor is a key.
- CourseID → Instructor remains in the second table.
Advantages of BCNF
- Removes more redundancy than 3NF.
- Eliminates anomalies related to non-super key dependencies.
- Ensures stronger data integrity.
Disadvantages
- Decomposition may lead to more tables, increasing the complexity of joins.
- Sometimes performance may degrade due to more joins in queries.
Advantages of Normalization:
Normalization is a process in database design that organizes data to reduce redundancy and improve data integrity.
- Eliminates Data Redundancy
- Removes duplicate data.
- Each piece of information is stored only once.
- Example: A student’s name is not repeated in every subject record.
- Improves Data Integrity and Consistency
- Ensures that data is accurate and consistent across the database.
- Updates are done in one place, avoiding conflicting or outdated values.
- Easier Data Maintenance
- Simplifies insert, update, and delete operations.
- Reduces the chances of anomalies (insertion, update, deletion anomalies).
- Saves Storage Space
- By removing duplicate data, less storage is used.
- More efficient use of memory and disk space.
- Better Query Performance in Some Cases
- Smaller tables are faster to scan and join under certain conditions.
- Well-organized schema can help the database engine optimize queries.
- Logical Data Structure
- Makes the database more understandable and organized.
- Tables are designed with clear relationships and purposes.
- Flexibility in Database Design
- Easier to make future changes in schema (e.g., adding new fields or relations).
- Supports Data Integrity Constraints
- Normalized tables better support foreign keys, primary keys, and unique constraints, helping maintain referential integrity.
Some More:
POP- Introduction to Programming Using ‘C’
OOP – Object Oriented Programming
DBMS – Database Management System
RDBMS – Relational Database Management System
Join Now: Data Warehousing and Data Mining