In this article we give the information about normalization in a database design technique that organizes tables in a manner that reduce redundancy (useless) and dependency of data.

Normalization in Database:

  • Normalization is the process of organizing the data in the database. It is performed to reduced data redundancy in a database. Data redundancy means having the same data at multiple places. It is necessary to remove the data redundancy because it causes anomalies in a database which makes it very hard for a database administrator to maintain it.
  • To avoid insertion, update & deletion anomaly. (Inconsistency or error)
  • Normalization divides larger tables into smaller table and link them  using relationships. (Primary key and Foreign key)

Anomaly (Insertion, Update, Delete)

Anomaly: Something that is not according to standard or normal

Table Name: student

Roll_No Name Address Dept
1 Rahul Pune BCA
1 Rahul Pune BA
2 Pranav Satara MBA
3 Geeta Sangli BCA
3 Geeta Sangli BA

Insert:

Insert into student (Roll_No, Name, Addrress) values (4, ‘Amol’, ‘Satara’);

Result: student does not allow null values

Update:

Update student set dept=’BCA’ where Roll_No=1;

Result: Update two rows

Delete:

Delete from student where Roll_No=1;

Result: Delete two rows

Normalization Types:

Here are most commonly used normal forms:

  1. First Normal Form (1NF)
  2. Second Normal Form (2NF)
  3. Third Normal Form (3 NF)
  4. Boyce & Codd Normal Form (BCNF)
  5. Forth Normal Form (4NF)
  6. Fifth Normal Form (5NF)

First Normal Form (1NF)

As per the rule of first normal form,

  • An attributes (Column) of a table cannot hold multiple values.
  • It should hold only atomic (forming a single unit) values.
  • Each record needs to be unique.

Atomic value: An atomic value is a value that cannot be divided.

Table Name: student

Roll_No Name Mobile_No Dept
1 Rahul 9986532147 BCA
2 Amol 9865326589, 9965832658 BA
3 Pranav 9789563211, 9856231458 MBA
4 Geeta 9963524789 BCA
5 Jaya 9856231458 BA

Above table, column mobile number there are two mobile numbers in rows two and three so there is not an atomic value.

After First Normalization

Table Name: student

Roll_No Name Mobile_No Dept
1 Rahul 9986532147 BCA
2 Amol 9865326589 BA
2 Amol 9965832658 BA
3 Pranav 9789563211 MBA
3 Pranav 9856231458 MBA
4 Geeta 9963524789 BCA
5 Jaya 9856231458 BA

Second normal form(2NF):-

  1. It should be in the first normal form.
  2. All non key attributes are fully functional dependent on the primary key. In simple words it should not have partial dependency

A table or relation is in 2nd normal form when it meets all the requirements of 1st normal form and all non-key attributes are completely dependent on the primary key.

Table Name: Employee

Emp_ID

Qualification

Age
101 BCA 20
101 MCA 20
101 MBA 20
102 BCom 21
102 MBA 21
103 MCA 22
104 MSW 23
104 MBA 23

Composite Primary Key: Emp_ID + Qualification

Non Key Attribute: Age

Example:

Q. Find the age of employee where employee ID is 102

SQL> Select * from Employee where emp_id=102;

O/P: 

102      BCom        21

102      MBA          21

here we get the two age values that is not in proper out put.

After Second Normalization

To make the table compiles with 2NF we can break it in two tables like this:

Second NF

Here,

1st NF: 

  • It should hold only atomic values.
  • Each record needs to be unique.

2nd NF:

  • Non key attribute should be depend on single key (that is primary key)

Third Normal Form (3rd NF):

  • It is in second normal form.
  • There is no transitive functional dependency.
  • A transitive functional dependency is when changing a non-key column, might clause any of the other non-key column to change.

ThirdNF

Here,

Emp_ID-> Zip-> City       that is City depend on zip and zip depend on Emp_ID

Emp_ID-> City                  City dependent on Emp_ID

City Indirectly depend on Emp_ID is called transitive functional dependency.

After Third Normalization;

BCNF (Boyce-Codd Normal Form)

BCNF is an advanced version of the Third Normal Form (3NF) in relational database design. It deals with certain types of anomalies not handled by 3NF. BCNF is stricter than 3NF and ensures a higher level of data integrity.

Definition of BCNF

A relation is in Boyce-Codd Normal Form (BCNF) if:

For every non-trivial functional dependency (X → Y), X is a super key.

  • Non-trivial means: Y is not a subset of X.
  • Super key: A set of one or more attributes that uniquely identify a tuple in a relation.

Why BCNF is Needed (Beyond 3NF)?

3NF allows functional dependencies where the determinant is not a candidate key, as long as the dependent is a prime attribute.
This can still lead to anomalies. BCNF fixes this by ensuring every determinant is a super key.

Example of BCNF Violation

Table: CourseInstructor

CourseID Instructor Room
CS101 Smith A1
CS102 Smith A1
CS103 Jones B1

Functional Dependencies:

  1. CourseID → Instructor (Each course has one instructor)
  2. Instructor → Room (Each instructor has a fixed room, but instructor is not a super key)

Issue:

  • Instructor → Room violates BCNF because Instructor is not a super key.

How to Convert to BCNF

Step 1: Decompose the table into two:

  • Instructor_Room (Instructor, Room)
  • Course_Instructor (CourseID, Instructor)

Now:

  • Instructor → Room is in a separate table where Instructor is a key.
  • CourseID → Instructor remains in the second table.

Advantages of BCNF

  • Removes more redundancy than 3NF.
  • Eliminates anomalies related to non-super key dependencies.
  • Ensures stronger data integrity.

Disadvantages

  • Decomposition may lead to more tables, increasing the complexity of joins.
  • Sometimes performance may degrade due to more joins in queries.

Advantages of Normalization:

Normalization is a process in database design that organizes data to reduce redundancy and improve data integrity.

  1. Eliminates Data Redundancy
  • Removes duplicate data.
  • Each piece of information is stored only once.
  • Example: A student’s name is not repeated in every subject record.
  1. Improves Data Integrity and Consistency
  • Ensures that data is accurate and consistent across the database.
  • Updates are done in one place, avoiding conflicting or outdated values.
  1. Easier Data Maintenance
  • Simplifies insert, update, and delete operations.
  • Reduces the chances of anomalies (insertion, update, deletion anomalies).
  1. Saves Storage Space
  • By removing duplicate data, less storage is used.
  • More efficient use of memory and disk space.
  1. Better Query Performance in Some Cases
  • Smaller tables are faster to scan and join under certain conditions.
  • Well-organized schema can help the database engine optimize queries.
  1. Logical Data Structure
  • Makes the database more understandable and organized.
  • Tables are designed with clear relationships and purposes.
  1. Flexibility in Database Design
  • Easier to make future changes in schema (e.g., adding new fields or relations).
  1. Supports Data Integrity Constraints
  • Normalized tables better support foreign keys, primary keys, and unique constraints, helping maintain referential integrity.

Some More: 

POP- Introduction to Programming Using ‘C’

DS – Data structure Using C

OOP – Object Oriented Programming 

Java Programming

DBMS – Database Management System

RDBMS – Relational Database Management System

Join Now: Data Warehousing and Data Mining 

Leave a Reply

Your email address will not be published. Required fields are marked *