stackademic

The leading education platform for anyone with an interest in software development.

Normalization

Organizing tables to reduce redundancy and avoid update anomalies

Overview

Normalization is the process of splitting data into well-structured tables so each fact is stored in exactly one place. It removes duplication and prevents anomalies where updating one copy of a value leaves others stale. The common beginner goal is to reach third normal form (3NF), where every non-key column depends only on the primary key.

Syntax / Usage

Instead of repeating related details in every row, move them into their own table and link with a foreign key.

-- Unnormalized: customer name repeats on every order
-- orders(id, customer_name, customer_city, total)

-- Normalized: customer details live in one table
CREATE TABLE customers (
  id   INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  city TEXT
);

CREATE TABLE orders (
  id          INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customers(id),
  total       NUMERIC(10, 2) NOT NULL
);

Examples

Because the customer name is stored once, a single update fixes it everywhere:

UPDATE customers SET city = 'Berlin' WHERE id = 42;

Splitting a repeating group into its own table (first normal form):

-- One row per phone number instead of phone1, phone2, phone3
CREATE TABLE customer_phones (
  customer_id INTEGER REFERENCES customers(id),
  phone       TEXT NOT NULL
);

Common Mistakes

  • Repeating the same descriptive text across many rows instead of referencing one table
  • Storing multiple values in a single column, breaking first normal form
  • Over-normalizing tiny lookup values so every query needs extra joins
  • Confusing normalization (correctness) with performance tuning, which sometimes calls for controlled denormalization

See Also

databases-relational-basics databases-keys-and-relationships databases-sql-vs-nosql