Normalization in Database

In this article we give the information about normalization in a database design technique that organizes tables in a manner that reduce redundancy (useless) and dependency of data.

Normalization in Database:

Normalization is the process of organizing the data in the database. It is performed to reduced data redundancy in a database. Data redundancy means having the same data at multiple places. It is necessary to remove the data redundancy because it causes anomalies in a database which makes it very hard for a database administrator to maintain it.
To avoid insertion, update & deletion anomaly. (Inconsistency or error)
Normalization divides larger tables into smaller table and link them using relationships. (Primary key and Foreign key)

Anomaly (Insertion, Update, Delete)

Anomaly: Something that is not according to standard or normal

Table Name: student

Roll_No	Name	Address	Dept
1	Rahul	Pune	BCA
1	Rahul	Pune	BA
2	Pranav	Satara	MBA
3	Geeta	Sangli	BCA
3	Geeta	Sangli	BA

Insert:

Insert into student (Roll_No, Name, Addrress) values (4, ‘Amol’, ‘Satara’);

Result: student does not allow null values

Update:

Update student set dept=’BCA’ where Roll_No=1;

Result: Update two rows

Delete:

Delete from student where Roll_No=1;

Result: Delete two rows

Normalization Types:

Here are most commonly used normal forms:

First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3 NF)
Boyce & Codd Normal Form (BCNF)
Forth Normal Form (4NF)
Fifth Normal Form (5NF)

First Normal Form (1NF)

As per the rule of first normal form,

An attributes (Column) of a table cannot hold multiple values.
It should hold only atomic (forming a single unit) values.
Each record needs to be unique.

Atomic value: An atomic value is a value that cannot be divided.

Table Name: student

Roll_No	Name	Mobile_No	Dept
1	Rahul	9986532147	BCA
2	Amol	9865326589, 9965832658	BA
3	Pranav	9789563211, 9856231458	MBA
4	Geeta	9963524789	BCA
5	Jaya	9856231458	BA

Above table, column mobile number there are two mobile numbers in rows two and three so there is not an atomic value.

After First Normalization

Table Name: student

Roll_No	Name	Mobile_No	Dept
1	Rahul	9986532147	BCA
2	Amol	9865326589	BA
2	Amol	9965832658	BA
3	Pranav	9789563211	MBA
3	Pranav	9856231458	MBA
4	Geeta	9963524789	BCA
5	Jaya	9856231458	BA

Second normal form(2NF):-

It should be in the first normal form.
All non key attributes are fully functional dependent on the primary key. In simple words it should not have partial dependency

A table or relation is in 2nd normal form when it meets all the requirements of 1st normal form and all non-key attributes are completely dependent on the primary key.

Table Name: Employee

Emp_ID	Qualification	Age
101	BCA	20
101	MCA	20
101	MBA	20
102	BCom	21
102	MBA	21
103	MCA	22
104	MSW	23
104	MBA	23

Composite Primary Key: Emp_ID + Qualification

Non Key Attribute: Age

Example:

Q. Find the age of employee where employee ID is 102

SQL> Select * from Employee where emp_id=102;

O/P:

102 BCom 21

102 MBA 21

here we get the two age values that is not in proper out put.

After Second Normalization

To make the table compiles with 2NF we can break it in two tables like this:

Second NF

Here,

1st NF:

It should hold only atomic values.
Each record needs to be unique.

2nd NF:

Non key attribute should be depend on single key (that is primary key)

Third Normal Form (3rd NF):

It is in second normal form.
There is no transitive functional dependency.
A transitive functional dependency is when changing a non-key column, might clause any of the other non-key column to change.

ThirdNF

Here,

Emp_ID-> Zip-> City that is City depend on zip and zip depend on Emp_ID

Emp_ID-> City City dependent on Emp_ID

City Indirectly depend on Emp_ID is called transitive functional dependency.

After Third Normalization;

BCNF (Boyce-Codd Normal Form)

BCNF is an advanced version of the Third Normal Form (3NF) in relational database design. It deals with certain types of anomalies not handled by 3NF. BCNF is stricter than 3NF and ensures a higher level of data integrity.

Definition of BCNF

A relation is in Boyce-Codd Normal Form (BCNF) if:

For every non-trivial functional dependency (X → Y), X is a super key.

Non-trivial means: Y is not a subset of X.
Super key: A set of one or more attributes that uniquely identify a tuple in a relation.

Why BCNF is Needed (Beyond 3NF)?

3NF allows functional dependencies where the determinant is not a candidate key, as long as the dependent is a prime attribute.
This can still lead to anomalies. BCNF fixes this by ensuring every determinant is a super key.

Example of BCNF Violation

Table: CourseInstructor

CourseID	Instructor	Room
CS101	Smith	A1
CS102	Smith	A1
CS103	Jones	B1

Functional Dependencies:

CourseID → Instructor (Each course has one instructor)
Instructor → Room (Each instructor has a fixed room, but instructor is not a super key)

Issue:

Instructor → Room violates BCNF because Instructor is not a super key.

How to Convert to BCNF

Step 1: Decompose the table into two:

Instructor_Room (Instructor, Room)
Course_Instructor (CourseID, Instructor)

Now:

Instructor → Room is in a separate table where Instructor is a key.
CourseID → Instructor remains in the second table.

Advantages of BCNF

Removes more redundancy than 3NF.
Eliminates anomalies related to non-super key dependencies.
Ensures stronger data integrity.

Disadvantages

Decomposition may lead to more tables, increasing the complexity of joins.
Sometimes performance may degrade due to more joins in queries.

Advantages of Normalization:

Normalization is a process in database design that organizes data to reduce redundancy and improve data integrity.

Eliminates Data Redundancy

Removes duplicate data.
Each piece of information is stored only once.
Example: A student’s name is not repeated in every subject record.

Improves Data Integrity and Consistency

Ensures that data is accurate and consistent across the database.
Updates are done in one place, avoiding conflicting or outdated values.

Easier Data Maintenance

Simplifies insert, update, and delete operations.
Reduces the chances of anomalies (insertion, update, deletion anomalies).

Saves Storage Space

By removing duplicate data, less storage is used.
More efficient use of memory and disk space.

Better Query Performance in Some Cases

Smaller tables are faster to scan and join under certain conditions.
Well-organized schema can help the database engine optimize queries.

Logical Data Structure

Makes the database more understandable and organized.
Tables are designed with clear relationships and purposes.

Flexibility in Database Design

Easier to make future changes in schema (e.g., adding new fields or relations).

Supports Data Integrity Constraints

Normalized tables better support foreign keys, primary keys, and unique constraints, helping maintain referential integrity.

Some More:

POP- Introduction to Programming Using ‘C’

DS – Data structure Using C

OOP – Object Oriented Programming

Java Programming

DBMS – Database Management System

RDBMS – Relational Database Management System

Join Now: Data Warehousing and Data Mining