Stop Shipping AI Slop: 9-Step Code & Security Audit Before You Push

Working code and clean code aren’t the same thing.

Right now a lot of people are vibe coding their way through projects, and honestly, the speed is real. AI can scaffold something functional in minutes. That part is genuinely useful, especially when you’re building while you learn. Nobody’s taking that away.

But here’s what’s also real: a lot of that code is slop. Functions that do six things. Variables named x and temp2. Imports that haven't been used since three refactors ago. Security vulnerabilities sitting quietly in utility functions nobody reviewed. Logic so tangled that even the person who wrote it couldn't explain it a week later.

People can see it. Developers can see it. Reviewers can see it. Anyone who opens that codebase and knows what clean code looks like can see it immediately. AI-generated spaghetti has a texture to it, and not a pleasant one. It’s like an ugly sweater that makes you sweaty and itchy.

Building while you learn is fine. Using AI to accelerate is fine. But if you’re shipping code you don’t understand, that you haven’t audited, that you’ve never actually read line by line — that codebase has a shelf life. It will break in ways you can’t debug. It will be seen for exactly what it is: a mess that works until it doesn’t.

The difference between code that holds up and code that doesn’t isn’t talent. It’s discipline. It’s a standard you apply every single time, without exceptions, before anything touches your main branch.

This is that standard.

Why a Pipeline at All?

Here’s the thing nobody tells you when you start writing code for security-adjacent projects: it’s not enough for your code to work. It has to be defensible.

Working code runs. Clean code runs, stays readable six months later, doesn’t hide vulnerabilities in complexity, and doesn’t surprise you when something breaks. Those are not the same bar.

The pipeline I’m about to walk you through doesn’t take long once it’s habitual. The tools run fast. What it does take is the discipline to not skip Step 4 (the security scan) just because you “only added a utility function.” That’s how vulnerabilities get shipped.

Step 1: Back Up First

cp src/file.py src/file.py.bak

This is five seconds of work that has saved me hours. Before you touch any file for an audit pass, copy it. If something goes sideways mid-edit — a tool makes an unexpected change, you lose your place in a refactor — you restore instantly. One command.

It sounds obvious. Do it anyway.
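If you script your audit passes, the same idea can be wrapped so the restore happens automatically. Below is a minimal sketch, not part of any tool in this pipeline: backed_up is a hypothetical helper that copies a file to a .bak alongside it (mirroring the cp command above) and copies the backup back over the original if anything inside the block raises.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def backed_up(path: Path) -> Iterator[Path]:
    """Copy `path` to `path.bak`, and restore from it if the wrapped block fails.

    Hypothetical helper for illustration. The .bak file is kept on success,
    just like the manual cp step.
    """
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # copy2 also copies metadata such as modification time
    try:
        yield backup
    except BaseException:
        shutil.copy2(backup, path)  # put the original contents back
        raise  # re-raise so the failure is still visible
```

Usage looks like with backed_up(Path("src/file.py")): followed by whatever edits or auto-fixing tools you run; if one of them throws, the file is back to its pre-audit state.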

Step 2: ruff — Fast Lint First

ruff check src/file.py

ruff is a Python linter written in Rust. It’s fast in a way that feels almost disrespectful to other tools: its maintainers report it running 10 to 100 times faster than existing linters like flake8.

It catches style violations, unused imports, shadowed variables, and a surprising number of logic-level issues. Exactly which ones depends on the rule sets you enable; the defaults cover the pyflakes checks plus a subset of pycodestyle. I run it first because it clears out the noise before the heavier tools run. If a file has 12 unused imports and 8 style warnings, every other tool’s output is buried under that noise.

ruff first. Clear the surface. Then go deeper.
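To make that concrete, here’s a small hypothetical helper in the “temp2” style the intro complains about, with the rules ruff’s default configuration reports noted in comments, followed by a cleaned-up version. The rule codes (F401, F841, E712) are ruff’s; the function itself is invented for illustration. ruff check --fix can remove some of these automatically, such as the unused import, but read what it changes before you keep it.

```python
# Before: what ruff's default rules report on a typical AI-drafted helper.
#
#   import os                          # F401: `os` imported but unused
#
#   def get_active(users):
#       temp2 = len(users)             # F841: local variable assigned but never used
#       result = []
#       for u in users:
#           if u["active"] == True:    # E712: equality comparison to True
#               result.append(u)
#       return result


# After: same behavior for boolean "active" flags, nothing left for ruff to flag.
def get_active(users: list[dict]) -> list[dict]:
    """Return only the users whose "active" flag is set."""
    return [user for user in users if user["active"]]
```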

Step 3: black — Format and Move On

black --check src/file.py   # will anything change?
black --diff src/file.py    # exactly what will change?
black src/file.py           # apply it

I used to have opinions about formatting. I don’t anymore. black took them from me and I’m grateful.

black is opinionated and works with zero configuration. It reformats your code to one consistent style and gives you almost no options about it (line length is the main one). That’s the point. You stop wasting energy on formatting debates and spend it on the things that actually matter.

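One thing to watch: black and pylint occasionally disagree, most often about line length. black wraps at 88 characters by default, so if pylint’s max-line-length is set lower (79 is a common choice), lines black considers finished still get flagged. Setting both to the same limit removes most of the friction. And always run pylint after black, not before; otherwise you’ll fix warnings black was about to change anyway and never see the ones it introduces.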

Step 4: bandit — The Security Scan

bandit src/file.py -f txt

This is the one I take most seriously. bandit scans Python code for known vulnerability patterns — things like hardcoded passwords, unsafe deserialization, weak cryptography, shell injection risks. It reports findings by severity (High, Medium, Low) and confidence.

My rules:

High severity: Fixed before anything else happens. Non-negotiable.
Medium severity: Reviewed and understood. Not dismissed until I know exactly what it’s flagging and why.
Low severity: Candidates for a # nosec annotation, but only if the flagged code is genuinely intentional and I can explain why in a comment.

The keyword there is understood. The most dangerous thing you can do with a security tool is silence its warnings without understanding them. That’s not security — that’s theater.
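Here’s a sketch of what acting on two common bandit findings can look like. The commented-out lines show patterns bandit flags (B301 for pickle.loads on untrusted data, and a weak-hash finding for MD5 password hashing); the live code is one reasonable replacement, not the only one. The function names and the 600,000-iteration work factor are assumptions for illustration, so tune the iteration count for your own hardware and threat model.

```python
import hashlib
import hmac
import json
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def load_settings(raw: bytes) -> dict:
    """Parse settings received from outside the process."""
    # Flagged: pickle.loads(raw) -> bandit B301. Unpickling attacker-controlled
    # bytes can run arbitrary code; json.loads only builds plain data types.
    return json.loads(raw)


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using a salted, deliberately slow key derivation."""
    # Flagged: hashlib.md5(password.encode()) -> fast, unsalted, weak hash.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Compare in constant time so timing doesn't leak how many bytes matched."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```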

Step 5: mypy — Catch Type Bugs Before Runtime

mypy src/file.py --ignore-missing-imports

Static type checking catches a whole category of bugs that tests often miss — type mismatches that only surface at runtime under specific conditions. mypy finds them at analysis time.

--ignore-missing-imports is there because a lot of third-party libraries don't ship type stubs. Without it, mypy throws errors on perfectly valid imports from libraries that simply haven't annotated their code. You don't want that noise.

Think of type annotations as documentation that can be checked. The Python interpreter ignores them at runtime, so mypy is what actually enforces them. Write them. Let mypy check them.
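Here’s the kind of bug that only shows up under specific conditions. APP_ENV and the "production" fallback are hypothetical. The commented-out line works on every machine where the variable is set and crashes with AttributeError wherever it isn’t; mypy reports it before it ships (error code union-attr), because .get() can return None.

```python
from typing import Mapping


def current_env(environ: Mapping[str, str]) -> str:
    """Return the deployment environment name, lowercased."""
    # Flagged by mypy [union-attr]: .get() may return None, and None has no .lower().
    #   return environ.get("APP_ENV").lower()
    value = environ.get("APP_ENV")
    if value is None:
        return "production"  # hypothetical default when the variable is unset
    return value.lower()  # mypy has narrowed `value` to str here
```

In real code you would call it as current_env(os.environ); taking the mapping as a parameter keeps the function easy to test.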

Step 6: pylint — The Deep Analysis

pylint src/file.py --output-format=text

pylint gives you a score out of 10.00 and tells you exactly why you’re not higher. It catches everything from missing docstrings and naming convention violations to architectural complexity and code that’s “technically working but structurally a mess.”

I aim for 9.5 or above. A genuine 9.3 is better than a gamed 10.00 with a stack of silenced warnings. The score exists to surface problems, not to be inflated.

And again — run pylint after black. Black’s line-wrapping behavior can introduce new pylint warnings. If you run pylint before black, apply black, and then skip re-running pylint, you’re not actually checking the final version of the file.
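One structural warning you’ll meet often is pylint’s too-many-arguments (R0913), which fires by default once a function takes more than five parameters. A sketch of one common fix, bundling related parameters into a dataclass; AlertTarget and format_alert are hypothetical names, and this is one refactor among several.

```python
from dataclasses import dataclass

# Before (pylint R0913 too-many-arguments: 7 parameters, default limit is 5):
#   def format_alert(host, port, user, channel, message, retries, timeout): ...


@dataclass(frozen=True)
class AlertTarget:
    """Where an alert goes and how hard to try delivering it."""

    host: str
    port: int
    user: str
    channel: str
    retries: int = 3
    timeout: float = 5.0


def format_alert(target: AlertTarget, message: str) -> str:
    """Render a one-line alert for the given target."""
    return f"[{target.channel}] {target.user}@{target.host}:{target.port} {message}"
```

The related values now travel together, so adding a new delivery setting changes one class instead of every call site.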

Step 7: pytest — All Tests Must Pass

python3 -m pytest tests/ -v

This one is simple: if tests fail, you don’t commit. Full stop.

Not “I’ll fix the tests later.” Not “it’s probably fine.” Tests exist to tell you when something broke. If you commit over failing tests, you’ve made the tests meaningless. You’ve broken your own feedback mechanism.

Green test suite before every commit. No exceptions.
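If you’re new to pytest, a test is just a function whose name starts with test_ and that contains plain assert statements, in a file named test_*.py under tests/. A minimal sketch; slugify is a hypothetical function that in a real project would live in src/ and be imported into the test file.

```python
import re


def slugify(title: str) -> str:
    """Lowercase a title and join its runs of letters and digits with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


# In a real layout these would sit in tests/test_slugify.py.
def test_slugify_basic() -> None:
    assert slugify("Stop Shipping AI Slop") == "stop-shipping-ai-slop"


def test_slugify_drops_punctuation() -> None:
    assert slugify("9-Step Code & Security Audit!") == "9-step-code-security-audit"


def test_slugify_empty() -> None:
    assert slugify("") == ""
```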

Step 8: vulture — Find the Dead Code

vulture src/file.py

Dead code is technical debt with a pulse. Unused functions, abandoned variables, imports that haven’t been needed since a refactor three months ago — they all accumulate, clutter the codebase, and occasionally cause real confusion when someone (you, six months from now) tries to understand what a file is doing.

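vulture finds them. It attaches a confidence score to each finding instead of a verdict: unused functions, classes, and variables are reported at 60%, because Python is dynamic enough that a name can be reached in ways static analysis can’t see, while unused imports get 90% and unreachable code 100%. Treat findings as leads, not verdicts. vulture only sees the files you pass it, so running it on a single file reports functions that other modules call as “unused”; point it at the whole package (vulture src/) to account for cross-file calls. Public API functions that are only called from outside the project will still show up, and that’s expected: vulture’s --make-whitelist option can record them so they stop cluttering the report. Everything else deserves a second look.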
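The core idea behind a tool like vulture is comparing the names code defines against the names it uses. Here’s a deliberately tiny, single-file illustration of that idea, limited to imports. It is not vulture’s implementation, and it ignores things real tools handle, such as __all__, re-exports, string references, and calls from other files.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never read anywhere else in `source`."""
    tree = ast.parse(source)
    imported: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import os.path` binds the top-level name `os`
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports can't be tracked this simply
                    imported.add(alias.asname or alias.name)
    # A Name node in Load context is a read; attribute access such as
    # os.path.join still starts from a Name node for `os`.
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return sorted(imported - used)
```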

Step 9: radon — Know Your Complexity

radon cc src/file.py -s

Cyclomatic complexity measures the number of independent paths through a function. radon turns that number into a letter grade: A (1–5), B (6–10), C (11–20), D (21–30), E (31–40), and F (41 and up).

The grades I care most about:

C — Worth examining. Schedule a refactor.
D — Needs refactoring. Prioritize it.
F — Refactor before it spreads.

A function with 25 code paths is a function you cannot confidently test, cannot confidently debug, and cannot confidently hand off to someone else. High complexity is a compounding liability. Catch it early.
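One common way to pull a grade down is to replace a long if/elif chain with a lookup table: each if and elif adds another path, while a dictionary lookup adds none. A minimal sketch with a hypothetical reason_phrase function; the branchy version is kept in comments so the two can be compared.

```python
# Before: one independent path per branch, so the score grows with every new code.
#   def reason_phrase(code):
#       if code == 200:
#           return "OK"
#       elif code == 301:
#           return "Moved Permanently"
#       elif code == 404:
#           return "Not Found"
#       elif code == 500:
#           return "Internal Server Error"
#       else:
#           return "Unknown"

# After: the branching lives in data, and the function has a single path.
REASON_PHRASES: dict[int, str] = {
    200: "OK",
    301: "Moved Permanently",
    404: "Not Found",
    500: "Internal Server Error",
}


def reason_phrase(code: int) -> str:
    """Return the reason phrase for an HTTP status code, or "Unknown"."""
    return REASON_PHRASES.get(code, "Unknown")
```

Supporting a new status code now means adding a dictionary entry, not another branch to test.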

The Commit Gate

Here’s the checklist I run through before anything gets committed:

[ ] ruff — zero warnings
[ ] black — formatting applied
[ ] bandit — zero High severity findings
[ ] mypy — no type errors
[ ] pylint — 9.5 or above
[ ] pytest — all tests passing
[ ] vulture — dead code reviewed
[ ] radon — no F grades

Every item. Every time.
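If you’d rather not run the checklist by hand, a small script can run the automatable gates in order and stop at the first failure. This is a sketch, assuming the tools are installed and on PATH and that src/file.py and tests/ match your layout. vulture and radon stay manual here because their findings need a human read, not a pass/fail exit code. Two flags are worth knowing: bandit’s -lll reports only High severity issues, and pylint’s --fail-under exits non-zero when the score falls below the threshold.

```python
import subprocess
import sys
from typing import Sequence, Tuple

TARGET = "src/file.py"  # assumed path; point this at the file you changed

# (label, command) pairs, in the order the pipeline runs them.
GATES: Sequence[Tuple[str, Sequence[str]]] = (
    ("ruff", ["ruff", "check", TARGET]),
    ("black", ["black", "--check", TARGET]),
    ("bandit", ["bandit", "-lll", TARGET]),  # fail only on High severity
    ("mypy", ["mypy", TARGET, "--ignore-missing-imports"]),
    ("pylint", ["pylint", TARGET, "--fail-under=9.5"]),
    ("pytest", [sys.executable, "-m", "pytest", "tests/", "-q"]),
)


def run_gates(gates: Sequence[Tuple[str, Sequence[str]]]) -> bool:
    """Run each command in order; return False at the first failure."""
    for label, command in gates:
        print(f"--> {label}")
        try:
            returncode = subprocess.run(list(command), check=False).returncode
        except FileNotFoundError:
            print(f"FAILED: {label} is not installed or not on PATH.")
            return False
        if returncode != 0:
            print(f"FAILED: {label}. Fix it before committing.")
            return False
    print("Automated gates passed. Now review vulture and radon output by hand.")
    return True


def main() -> int:
    """Exit status for scripts and git hooks: 0 only when every gate passes."""
    return 0 if run_gates(GATES) else 1
```

To use it as a script, save it to a file of your choosing and end it with raise SystemExit(main()). Calling that file from .git/hooks/pre-commit makes the gate automatic, since git aborts the commit when the hook exits non-zero.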

The Part Nobody Warns You About: Vibe Hackers

AI lowered the bar for building apps. It lowered the bar for writing attack scripts too.

People who couldn’t write a working exploit in 2020 can generate one in thirty seconds now. Automated scanners, credential stuffers, API abuse scripts, SQL injection payloads — these are running against public-facing apps constantly, at scale, looking for exactly the gaps that unaudited AI-generated code leaves open. We call them vibe hackers. They don’t need to understand the code either. They just run it.

The answer isn’t to stop using AI. It’s to use AI to fight the misuse of AI. That means scanning your own code (Step 4 of this pipeline), auditing your dependencies, and actually understanding what your app exposes. If you’re building anything with real users, you also need to know whether privacy laws like GDPR (EU) and CCPA (California) apply to you: GDPR broadly covers personal data of people in the EU, and CCPA covers businesses that cross its revenue or data-volume thresholds. “I didn’t know” is not a legal defense and the fines are not small.

Security is not a feature you add at the end. It’s the foundation. It should be in every step of your build.
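SQL injection is the clearest example of a gap that unaudited code leaves open and automated attack scripts go looking for. A minimal sketch using the standard library’s sqlite3 with an in-memory database; the users table and both find functions are hypothetical. Building the query with string formatting lets input rewrite the SQL (string-built SQL is what bandit’s B608 check looks for), while a ? placeholder keeps the input as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "viewer")]
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Deliberately vulnerable for demonstration: input becomes part of the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user(conn: sqlite3.Connection, name: str) -> list:
    # Safe: the ? placeholder sends `name` as a bound parameter, never as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Passing the classic payload ' OR '1'='1 to the unsafe version returns every row in the table; the parameterized version treats it as an odd-looking name and returns nothing.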

This Pipeline Isn’t Just for Python

These tools are Python-based but the discipline applies to every language. Whatever you’re writing in, there are equivalent tools — and you should be running them before every push.

Here’s a quick breakdown of the top languages, what to watch for, and where attackers focus:

JavaScript / Node.js — The language of the web and most web vulnerabilities. XSS, prototype pollution, supply chain attacks through npm. Tools: eslint, semgrep, npm audit.

C — The language exploits are written in. Buffer overflows, memory corruption, shellcode. No runtime safety net. Tools: cppcheck, valgrind, clang-tidy.

C++ — Same attack surface as C with more complexity. Rootkits and ransomware live here. Tools: cppcheck, AddressSanitizer.

Java — Enterprise backend, Android, and the home of Log4Shell. Deserialization vulnerabilities are Java’s most dangerous attack surface. Tools: SpotBugs, SonarQube, OWASP Dependency-Check.

Go — Fast growing in cloud tooling and in malware. C2 frameworks are being built in Go specifically because AV struggles with it. Tools: golangci-lint, gosec.

Rust — Memory safe by design, increasingly used in both security tooling and malware for the same reason. AV struggles with Rust binaries too. Tools: clippy, cargo audit.

C# — Windows ecosystem and offensive tooling. SharpHound, Covenant C2, and most red team utilities are C#. Tools: Roslyn analyzers, SonarQube.

Ruby — Rails and Metasploit. If you’re writing Metasploit modules, you’re writing Ruby. Tools: RuboCop, Brakeman, bundler-audit.

PHP — Legacy web but still everywhere. Webshells are almost always PHP. Tools: PHPStan, Psalm.

Assembly (x86/x64) — Not a build language for most people, but as someone in cybersecurity this one had to be on the list. Shellcode, reverse engineering, malware analysis — when you’re reading a CVE exploit or analyzing a binary, you’re reading Assembly. Tools: Ghidra, IDA Free, Binary Ninja.

The full reference with detailed breakdowns lives in the GitHub repo → github.com/commit-issues/code-audit

Why This Matters

I built this pipeline because I wanted my code to be something I could defend. Not just “it works” — but “I know what’s in it, I know it’s not hiding vulnerabilities, I know it’s maintainable, and I can prove it.”

That standard didn’t come from a course or a textbook. It came from running enough audits to understand what each tool actually surfaces, and what the gaps are between them.

If you’re building security tools, or anything that handles sensitive data, or just anything you care about — you need a standard like this. Yours might look different from mine. That’s fine. The point is that you have one, you follow it consistently, and you understand why each step is there.

Tools don’t write clean code. Discipline does.