Database Normalization
Organizing tables to reduce redundancy and avoid update anomalies
Overview
Normalization is the process of splitting data into well-structured tables so each fact is stored in exactly one place. It removes duplication and prevents anomalies where updating one copy of a value leaves others stale. The common beginner goal is to reach third normal form (3NF), where every non-key column depends only on the primary key.
Syntax / Usage
Instead of repeating related details in every row, move them into their own table and link with a foreign key.
-- Unnormalized: customer name repeats on every order
-- orders(id, customer_name, customer_city, total)
-- Normalized: customer details live in one table
CREATE TABLE customers (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
city TEXT
);
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL REFERENCES customers(id),
total NUMERIC(10, 2) NOT NULL
);
Examples
Because the customer name is stored once, a single update fixes it everywhere:
UPDATE customers SET city = 'Berlin' WHERE id = 42;
Splitting a repeating group into its own table (first normal form):
-- One row per phone number instead of phone1, phone2, phone3
CREATE TABLE customer_phones (
customer_id INTEGER REFERENCES customers(id),
phone TEXT NOT NULL
);
Common Mistakes
- Repeating the same descriptive text across many rows instead of referencing one table
- Storing multiple values in a single column, breaking first normal form
- Over-normalizing tiny lookup values so every query needs extra joins
- Confusing normalization (correctness) with performance tuning, which sometimes calls for controlled denormalization
See Also
databases-relational-basics databases-keys-and-relationships databases-sql-vs-nosql