8 Python Debugging Lessons Every AI Engineer Learns

Talha Tahir

Errors teach more than success ever will.

You can write elegant models, optimize hyperparameters like a wizard, and still lose an entire day to a silent bug hiding in plain sight.

That’s the part nobody tells you.

Debugging in AI isn’t just fixing code — it’s detective work across data, math, and assumptions. After ~4 years deep in Python (and more broken pipelines than I’d like to admit), I’ve noticed something: the same painful lessons keep repeating.

Let’s skip the obvious ones. These are the debugging lessons you only learn after things go really wrong.

1. If Your Model Is “Too Good,” It’s Probably Broken

Accuracy: 99.8%
You: I’m a genius.
Reality: You leaked the labels.

Data leakage is the most expensive “success” you’ll ever celebrate.

A classic example:

from sklearn.model_selection import train_test_split
import pandas as pd

df = pd.read_csv("data.csv")

X = df.drop("target", axis=1)
y = df["target"]

# ❌ Wrong: scaling before split
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)

You just let your test data influence training.

Fix it:

# ✅ Correct: split first, then fit only on train
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Rule: If results look magical, assume contamination before celebration.
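To see why the order matters, here’s a toy illustration in plain Python (synthetic numbers, no sklearn): when the scaler is fit on all the data, a single extreme test value drags the statistics your model trains against.

```python
# Toy illustration of leakage: the test value 100.0 is an outlier.
train = [1.0, 2.0, 3.0, 4.0]
test = [100.0]

def mean(xs):
    return sum(xs) / len(xs)

# ❌ Fit the scaler on everything: the test outlier drags the mean to 22.0
leaky_mean = mean(train + test)

# ✅ Fit only on train: the mean the model actually sees is 2.5
clean_mean = mean(train)

print(leaky_mean)  # 22.0
print(clean_mean)  # 2.5
```

Same idea, one number: the test set has quietly told the scaler what’s coming.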

2. Print Statements Still Beat Fancy Debuggers

Yes, you have debuggers. Yes, you have logging frameworks.

And yet:

print(X.shape, y.shape)
print(np.isnan(X).sum())
print(np.unique(y))

These three lines have saved more projects than any IDE ever will.

Why? Because AI bugs are often data bugs, not code bugs.

Rule of thumb: in most ML projects, the bulk of the time goes to cleaning and validating data, not modeling.

So before stepping through code, just ask:

  • What’s the shape?
  • What’s missing?
  • What’s weird?
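Those three questions can live in one small helper you call at the top of every pipeline. A minimal sketch (the function name and return format are my own, not a standard API):

```python
import numpy as np

def sanity_check(X, y):
    """Answer the three questions before touching the model."""
    report = {
        "X_shape": X.shape,
        "y_shape": y.shape,
        "nan_count": int(np.isnan(X).sum()),
        "labels": np.unique(y).tolist(),
    }
    for key, value in report.items():
        print(f"{key}: {value}")
    return report

# Tiny example with one deliberately missing value
X = np.array([[1.0, 2.0], [np.nan, 4.0]])
y = np.array([0, 1])
sanity_check(X, y)
```

Thirty seconds of output, and you know whether it’s even worth opening the debugger.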

3. Your Model Isn’t Failing — Your Data Pipeline Is

You tweak layers. Change optimizers. Adjust learning rates.

Still garbage output.

The issue?

def preprocess(text):
    return text.lower().strip()

# Somewhere else...
def preprocess(text):
    return text.replace(".", "")

Two different preprocessing functions. Same name. Different behavior.

Congratulations — you trained and tested on different realities.
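To make the failure concrete, here’s a toy version (function names hypothetical): the same input takes two different paths into the model, and neither side notices.

```python
def preprocess_train(text):
    # Used in the training script
    return text.lower().strip()

def preprocess_serve(text):
    # Quietly diverged in the serving code
    return text.replace(".", "")

sample = " Hello. World. "
print(preprocess_train(sample))  # "hello. world."
print(preprocess_serve(sample))  # " Hello World "
print(preprocess_train(sample) == preprocess_serve(sample))  # False
```

The model is blameless; its inputs just stopped agreeing with each other.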

Fix this with a single source of truth:

class Preprocessor:
    def __init__(self):
        pass

    def transform(self, text):
        return text.lower().replace(".", "").strip()

Lesson: In AI systems, inconsistency is a silent killer.

4. Randomness Will Gaslight You

You run the same training twice.

First run: 92% accuracy.
Second run: 78%.

Now you’re questioning your life choices.

Set your seeds:

import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed()

But here’s the twist: even this isn’t always enough (hello, CUDA nondeterminism).

Real lesson: Reproducibility isn’t a switch — it’s a discipline.
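Part of that discipline is verifying determinism rather than assuming it: run the same pipeline twice with the same seed and assert the outputs match. A minimal sketch, with the standard library’s `random` standing in for a training run:

```python
import random

def noisy_run(seed):
    """Stand-in for a 'training run': draws a fixed-length random sequence."""
    random.seed(seed)
    return [random.randint(0, 100) for _ in range(5)]

# Same seed, same "results", run to run.
assert noisy_run(42) == noisy_run(42)

# Different seeds will (almost certainly) disagree.
print(noisy_run(42))
print(noisy_run(7))
```

In a real project the same check wraps your actual training entry point; if the assertion ever fails, you’ve found an unseeded source of randomness.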

5. Shape Mismatches Are the Final Boss

Nothing humbles you faster than a cryptic shape-mismatch error three layers deep in a forward pass.

One missing token. One off-by-one error. Hours gone.

Debug shapes aggressively, especially when working with PyTorch:

def debug_tensor(name, tensor):
    print(f"{name}: shape={tensor.shape}, dtype={tensor.dtype}")

debug_tensor("input", x)
debug_tensor("output", y)

Pro move: Fail early. Fail loudly.
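Failing early can be as simple as asserting expected shapes at module boundaries. A sketch with NumPy arrays standing in for framework tensors (the helper name is my own):

```python
import numpy as np

def assert_shape(name, arr, expected):
    """Fail loudly the moment a shape drifts, with a readable message."""
    if arr.shape != expected:
        raise ValueError(f"{name}: expected shape {expected}, got {arr.shape}")

batch = np.zeros((32, 128))              # (batch_size, seq_len)
assert_shape("batch", batch, (32, 128))  # passes silently

try:
    assert_shape("batch", batch, (32, 127))
except ValueError as e:
    print(e)  # batch: expected shape (32, 127), got (32, 128)
```

A one-line check at each boundary turns an hours-long hunt into an immediate, named failure.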

6. Silent Failures Are Worse Than Crashes

A crash is honest. Silence is deceptive.

Example: training runs. No errors. No learning.

You just built a very expensive random number generator.

Add sanity checks. If your loss doesn’t move after the first few epochs, something’s fundamentally wrong.
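One cheap guardrail: record the loss for the first few steps and assert it actually moves. A sketch in plain Python (the loss values below are made up, and the function name is my own):

```python
def check_loss_is_moving(losses, window=5, tolerance=1e-6):
    """Return False if the last `window` losses are effectively flat."""
    if len(losses) < window:
        return True  # not enough history yet to judge
    recent = losses[-window:]
    return max(recent) - min(recent) > tolerance

healthy = [2.30, 1.95, 1.60, 1.31, 1.10]
stuck = [2.30, 2.30, 2.30, 2.30, 2.30]

print(check_loss_is_moving(healthy))  # True
print(check_loss_is_moving(stuck))    # False: investigate before burning GPU hours
```

Call it every few epochs and crash (or at least warn) on a flat line; a loud failure at step 50 beats silence at step 50,000.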

7. Overfitting on Purpose Is a Superpower

One of the fastest debugging tricks:

Can your model memorize 10 samples?

small_X = X_train[:10]
small_y = y_train[:10]

for _ in range(500):
    optimizer.zero_grad()  # clear stale gradients before each step
    preds = model(small_X)
    loss = criterion(preds, small_y)
    loss.backward()
    optimizer.step()

If it can’t overfit:

  • Your architecture is broken
  • Your loss is wrong
  • Your gradients aren’t flowing

If it can:

  • Your pipeline is probably fine
  • The issue is generalization

This single trick can save days.

8. Logs > Memory (Always)

You think you’ll remember what changed.

You won’t.

Track everything:

import json
from datetime import datetime

def log_experiment(params, metrics):
    record = {
        "time": str(datetime.now()),
        "params": params,
        "metrics": metrics
    }
    with open("experiments.json", "a") as f:
        f.write(json.dumps(record) + "\n")

log_experiment(
    {"lr": 0.001, "batch_size": 32},
    {"accuracy": 0.91}
)

Because debugging isn’t just fixing errors — it’s comparing timelines.

Thanks for walking through this journey with me. I appreciate you taking the time to read.