
Errors teach more than success ever will.
You can write elegant models, optimize hyperparameters like a wizard, and still lose an entire day to a silent bug hiding in plain sight.
That’s the part nobody tells you.
Debugging in AI isn’t just fixing code — it’s detective work across data, math, and assumptions. After ~4 years deep in Python (and more broken pipelines than I’d like to admit), I’ve noticed something: the same painful lessons keep repeating.
Let’s skip the obvious ones. These are the debugging lessons you only learn after things go really wrong.
1. If Your Model Is “Too Good,” It’s Probably Broken
Accuracy: 99.8%.
You: "I'm a genius."
Reality: you leaked the labels.
Data leakage is the most expensive “success” you’ll ever celebrate.
A classic example:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# ❌ Wrong: scaling before the split
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # test rows leak into the scaler's statistics
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)
```
You just let your test data influence training.
Fix it:
```python
# ✅ Correct: split first, then fit only on train
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # statistics come from training data only
X_test = scaler.transform(X_test)
```
Rule: If results look magical, assume contamination before celebration.
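You can even watch the contamination happen numerically. Here's a minimal sketch with synthetic data (everything in it is made up for illustration): the held-out rows come from a shifted distribution, and a statistic fitted on all rows quietly drifts toward them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
X[80:] += 5.0  # pretend the held-out rows come from a shifted distribution

train, test = X[:80], X[80:]

# ❌ Leaky: the mean is computed over ALL rows, test included
leaky_mean = X.mean()

# ✅ Clean: the mean is computed over training rows only
clean_mean = train.mean()

print(f"leaky={leaky_mean:.3f}, clean={clean_mean:.3f}")
```

The leaky statistic absorbs information about the test distribution before the model ever sees a test row, which is exactly how "too good" numbers get made.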
2. Print Statements Still Beat Fancy Debuggers
Yes, you have debuggers. Yes, you have logging frameworks.
And yet:
```python
import numpy as np

print(X.shape, y.shape)   # do the dimensions even line up?
print(np.isnan(X).sum())  # how many values are missing?
print(np.unique(y))       # are the labels what you expect?
```
These three lines have saved more projects than any IDE ever will.
Why? Because AI bugs are often data bugs, not code bugs.
It's a cliché for a reason: practitioners routinely report spending well over half their time in ML projects cleaning and validating data — not modeling.
So before stepping through code, just ask:
- What’s the shape?
- What’s missing?
- What’s weird?
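Those three questions fold naturally into one throwaway helper. A minimal sketch (`sanity_check` is just a name I made up):

```python
import numpy as np

def sanity_check(X, y):
    print("shape:", X.shape, y.shape)       # What's the shape?
    print("NaNs:", int(np.isnan(X).sum()))  # What's missing?
    print("labels:", np.unique(y))          # What's weird?

X = np.array([[1.0, 2.0], [np.nan, 4.0]])
y = np.array([0, 1, 1])  # mismatched length on purpose: the printout makes it obvious
sanity_check(X, y)
```

Two seconds of printing beats twenty minutes of stepping through a training loop that was never the problem.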
3. Your Model Isn’t Failing — Your Data Pipeline Is
You tweak layers. Change optimizers. Adjust learning rates.
Still garbage output.
The issue?
```python
def preprocess(text):
    return text.lower().strip()

# Somewhere else...
def preprocess(text):
    return text.replace(".", "")
```
Two different preprocessing functions. Same name. Different behavior.
Congratulations — you trained and tested on different realities.
Fix this with a single source of truth:
```python
class Preprocessor:
    def transform(self, text):
        return text.lower().replace(".", "").strip()
```
Lesson: In AI systems, inconsistency is a silent killer.
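The payoff of a single source of truth is that training and serving literally cannot drift apart. A minimal sketch (the class is repeated here so the snippet runs on its own):

```python
class Preprocessor:
    def transform(self, text):
        return text.lower().replace(".", "").strip()

pre = Preprocessor()  # one instance, imported everywhere it's needed

train_text = pre.transform("  The Model Works.  ")
serve_text = pre.transform("  The Model Works.  ")
print(train_text == serve_text)  # same function, same reality
```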
4. Randomness Will Gaslight You
You run the same training twice.
First run: 92% accuracy.
Second run: 78%.
Now you’re questioning your life choices.
Set your seeds:
```python
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed()
```
But here’s the twist: even this isn’t always enough (hello, CUDA nondeterminism).
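When you need to go further, PyTorch exposes explicit determinism switches. A sketch, not a guarantee — expect a speed penalty, and some ops will refuse to run (or still vary) under these settings:

```python
import os
import torch

# Must be set before CUDA initializes; required by some cuBLAS ops
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.use_deterministic_algorithms(True)   # raise instead of silently varying
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False     # disable autotuning that differs run-to-run
```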
Real lesson: Reproducibility isn’t a switch — it’s a discipline.
5. Shape Mismatches Are the Final Boss
Nothing humbles you faster than a `RuntimeError: shapes cannot be multiplied` thrown three layers deep in a network.
One missing dimension. One off-by-one error. Hours gone.
Debug shapes aggressively:
```python
def debug_tensor(name, tensor):
    print(f"{name}: shape={tensor.shape}, dtype={tensor.dtype}")

debug_tensor("input", x)
debug_tensor("output", y)
```
And when working with PyTorch, don't just print shapes: assert them, so the failure happens at the source instead of three layers later.
Pro move: Fail early. Fail loudly.
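A minimal sketch of assert-driven shape checking (NumPy arrays here so it runs anywhere; the same asserts work on torch tensors):

```python
import numpy as np

batch_size, seq_len, hidden = 4, 16, 32
x = np.zeros((batch_size, seq_len, hidden), dtype=np.float32)

# Fail at the source, not three layers later
assert x.shape == (batch_size, seq_len, hidden), f"unexpected shape: {x.shape}"
assert not np.isnan(x).any(), "NaNs entered the pipeline"
```

An assert that fires on line 8 is a gift; the same mismatch surfacing inside a matmul deep in the forward pass is an afternoon.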
6. Silent Failures Are Worse Than Crashes
A crash is honest. Silence is deceptive.
Example: training runs end to end. No exceptions. No learning. The loss sits still, epoch after epoch.
You just built a very expensive random number generator.
Add sanity checks:
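For example, a tiny watchdog on the loss history. A sketch — `losses` here stands for whatever list you append to each epoch:

```python
def loss_is_moving(losses, window=5, tol=1e-4):
    """Return False if the last `window` losses are flat within `tol`."""
    if len(losses) < window:
        return True  # not enough history to judge yet
    recent = losses[-window:]
    return (max(recent) - min(recent)) > tol

print(loss_is_moving([2.30, 1.92, 1.55, 1.21, 0.98]))  # a healthy run
print(loss_is_moving([2.30, 2.30, 2.30, 2.30, 2.30]))  # an expensive RNG
```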
If your loss doesn’t move, something’s fundamentally wrong.
7. Overfitting on Purpose Is a Superpower
One of the fastest debugging tricks:
Can your model memorize 10 samples?
```python
small_X = X_train[:10]
small_y = y_train[:10]

for _ in range(500):
    optimizer.zero_grad()  # clear stale gradients each step
    preds = model(small_X)
    loss = criterion(preds, small_y)
    loss.backward()
    optimizer.step()
```
If it can’t overfit:
- Your architecture is broken
- Your loss is wrong
- Your gradients aren’t flowing
If it can:
- Your pipeline is probably fine
- The issue is generalization
This single trick can save days.
8. Logs > Memory (Always)
You think you’ll remember what changed.
You won’t.
Track everything:
```python
import json
from datetime import datetime

def log_experiment(params, metrics):
    record = {
        "time": str(datetime.now()),
        "params": params,
        "metrics": metrics,
    }
    with open("experiments.json", "a") as f:
        f.write(json.dumps(record) + "\n")

log_experiment(
    {"lr": 0.001, "batch_size": 32},
    {"accuracy": 0.91},
)
```
Because debugging isn’t just fixing errors — it’s comparing timelines.
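And once the log exists, comparing timelines is one function away. A sketch under the JSON-lines format above (`best_run` is a name I made up):

```python
import json

def best_run(path="experiments.json"):
    """Return the logged record with the highest accuracy."""
    with open(path) as f:
        records = [json.loads(line) for line in f]
    return max(records, key=lambda r: r["metrics"]["accuracy"])
```

Plain-text logs mean your future self can grep, diff, and sort them — no dashboard required.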
Thanks for walking through this journey with me. I appreciate you taking the time to read.