Normalization is the process of organizing and structuring data in a database to minimize redundancy and improve efficiency. It involves breaking down a large table into smaller tables and establishing relationships between them.
Advantages of normalization:
- Eliminates Redundancy: By breaking down data into smaller tables, normalization helps remove duplicate or redundant information. This improves data consistency and reduces storage space.
- Improves Data Integrity: Normalization ensures that data is accurate and consistent across the database. By eliminating redundant data, it reduces the chances of inconsistencies or conflicting information.
- Enhances Data Consistency: With normalized tables, updates or modifications need to be made in fewer places. This helps maintain data consistency, as changes are applied consistently throughout the database.
- Simplifies Data Maintenance: Normalized tables are easier to maintain and update. Since data is divided into smaller, manageable tables, it becomes simpler to add, modify, or delete records without affecting other parts of the database.
- Supports Efficient Querying: Normalization optimizes database performance by reducing data redundancy and establishing relationships between tables. This allows for faster and more efficient retrieval of information through queries.
- Facilitates Scalability: Normalized databases are more scalable, as the structure is flexible and adaptable to changes. As new data needs arise, it is easier to expand the database without sacrificing efficiency or introducing inconsistencies.
Concept of 1NF,2NF and 3NF:
1NF, 2NF, and 3NF are abbreviations for the first three levels of database normalization. They represent first normal form, second normal form, and third normal form, respectively.
Additional normal forms beyond 3NF include 4NF (fourth normal form) and 5NF (fifth normal form). Although there is also 6NF (sixth normal form), the most commonly encountered normal form is 3NF.
Each level of normalization builds upon the previous ones. This means that the principles of 1NF are carried over to 2NF, and so on.
First Normal Form (1NF):
To satisfy the requirements of 1NF, a table must meet the following conditions:
- Each cell should contain only a single value (atomicity).
- A primary key must exist for unique identification.
- There should be no duplicate rows or columns.
- Each column should have only one value for each row in the table.
Second Normal Form (2NF):
While 1NF eliminates repeating groups, it does not address redundancy. Therefore, 2NF was introduced.
A table is considered to be in 2NF if it fulfills the following criteria:
- It already satisfies 1NF requirements.
- No partial dependencies exist. In other words, all non-key attributes are fully dependent on the primary key.
Third Normal Form (3NF):
When a table is in 2NF, it removes repeating groups and redundancy. However, it may still have transitive partial dependencies.
This means that a non-prime attribute (an attribute that is not part of the candidate key) depends on another non-prime attribute. The purpose of the third normal form (3NF) is to eliminate such dependencies.
Therefore, for a table to be in 3NF, it must:
- Be in 2NF.
- Have no transitive partial dependencies.
Examples:
Let us consider the following table:
Employee_Id | Name | Job_Code | Job | State_Code | Home_State |
E01 | Ram | J01 J02 | Chef Waiter | 26 | Damak |
E02 | Hari | J02 J03 | Waiter Bartender | 56 | Urlabari |
E03 | Kripa | J01 | Chef | 56 | Urlabari |
The above table is not in 1NF as it contains non-atomic value. So, making the table in 1 NF as follows:
Employee_Id | Name | Job_Code | Job | State_Code | Home_State |
E01 | Ram | J01 | Chef | 26 | Damak |
E01 | Ram | J02 | Waiter | 26 | Damak |
E02 | Hari | J02 | Waiter | 56 | Urlabari |
E02 | Hari | J03 | Bartender | 56 | Urlabari |
E03 | Kripa | J01 | Chef | 56 | Urlabari |
The table is now in 1NF but not in 2NF because,
With Employee_id, we can know Name, State_Code and Home_State. This means they are dependent on Employee_Id. Now separating the table into two tables, attribute dependent on Employee_Id & attribute independent on Employee_ID:
Employees table:
Employee_Id | Name | State_Code | Home_State |
E01 | Ram | 26 | Damak |
E02 | Hari | 56 | Urlabari |
E03 | Kripa | 56 | Urlabari |
Employee_Roles table:
Employee_Id | Job_Code |
E01 | J01 |
E01 | J02 |
E02 | J02 |
E02 | J03 |
E03 | J01 |
Jobs table:
Job_Code | Job |
J01 | Chef |
J02 | Waiter |
J03 | Bartender |
Now the tables are in 2NF but not in 3NF because,
Home_state depends on State_code.
So breaking Employees table into two tables,
Employees table:
Employee_Id | Name | State_Code |
E01 | Ram | 26 |
E02 | Hari | 56 |
E03 | Kripa | 56 |
State table:
State_Code | Home_State |
26 | Damak |
56 | Urlabari |
Employee_Roles table:
Employee_Id | Job_Code |
E01 | J01 |
E01 | J02 |
E02 | J02 |
E02 | J03 |
E03 | J01 |
Jobs table:
Job_Code | Job |
J01 | Chef |
J02 | Waiter |
J03 | Bartender |
Hence the tables are in 3NF.