
Beyond CLAUDE.md

Most teams discover CLAUDE.md, paste a few rules into it, and call it done. Better than nothing. Still not enough.

If you want agent output you can trust, you need a harness. Not a prompt. A harness.

Think of it like this: CLAUDE.md sets defaults. The harness defines behavior.

What a harness actually is

Harness engineering is the work around the model, not inside it. You shape what the agent can see, what it can run, and how it gets corrected when it drifts.

In practice, that usually means three layers:

  1. Project instructions in CLAUDE.md so every run starts with your standards.
  2. Skills for repeatable workflows the model should execute the same way every time.
  3. Custom CLIs that expose your systems as safe, narrow interfaces instead of free-form shell chaos.

That stack is where reliability comes from. Not clever prompting.

Layer 1: CLAUDE.md sets the ground rules

Your CLAUDE.md should answer questions before the model asks them: code style, architecture boundaries, how to run tests, what not to touch, when to ask for review.
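To make that concrete, here is a minimal sketch of what such a file could contain. The commands, paths, and boundaries are hypothetical placeholders; substitute your project's real ones.

```markdown
## Commands
- Run tests: `make test`        <!-- placeholder: your real test command -->
- Run linter: `make lint`       <!-- placeholder: your real lint command -->

## Boundaries
- Never edit files under `migrations/` without asking first.
- New modules go in `src/`, not `scripts/`.

## Review
- Open a draft PR and request review before touching auth code.
```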

Good instruction files reduce ambiguity. They do not remove judgment. The model still needs structure around execution, which is where the next two layers matter.

Layer 2: skills stop you from re-explaining workflows

Any process you repeat more than twice should become a skill. Bug triage. UI verification. Release notes. Dependency audits. Whatever your team runs every week.

Without skills, the agent improvises. Sometimes it improvises well. Sometimes it invents a new process at 2 a.m. that nobody can debug later.

Skills turn "please do this carefully" into a deterministic runbook.
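As a sketch, a bug-triage skill could look like the following. This assumes the SKILL.md layout Claude Code uses for skills (a folder with a SKILL.md containing YAML frontmatter); the triage steps themselves are invented placeholders, not a recommended process.

```markdown
---
name: bug-triage
description: Reproduce, classify, and file incoming bug reports the same way every time.
---

1. Reproduce the bug from the report and record the exact steps.
2. Classify severity: blocker, major, or minor.
3. File an issue containing repro steps, severity, and the suspected area.
4. Link any related issues before finishing triage.
```

The point is not the specific steps. It is that the steps exist in one place, so the agent stops improvising them.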

Layer 3: custom CLIs create safe boundaries

Most enterprise systems were not designed for LLM-first workflows. That is why custom CLIs matter. You give the agent a stable command surface with explicit inputs, explicit outputs, and known failure modes.

Instead of asking an agent to poke APIs directly, you hand it commands like:

t-linear issue SIE-27
t-linear comment SIE-27 "progress update"
t-linear update SIE-27 --state "In Review"

Now it can move fast without guessing schema details, endpoint behavior, or auth flows. You moved complexity out of the prompt and into tooling where it belongs.
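The shape of such a wrapper can be sketched in a few lines. The real t-linear tool is the author's; everything below, including the workflow states, is an invented illustration of the pattern: a narrow command surface with validated inputs.

```python
# Hypothetical sketch of a narrow CLI wrapper around an issue tracker.
# All names, subcommands, and states here are illustrative placeholders.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="t-linear")
    sub = parser.add_subparsers(dest="command", required=True)

    issue = sub.add_parser("issue", help="show one issue")
    issue.add_argument("issue_id")

    comment = sub.add_parser("comment", help="add a comment to an issue")
    comment.add_argument("issue_id")
    comment.add_argument("body")

    update = sub.add_parser("update", help="change workflow state")
    update.add_argument("issue_id")
    update.add_argument(
        "--state",
        required=True,
        choices=["Todo", "In Progress", "In Review", "Done"],  # assumed states
    )
    return parser


def run(argv: list[str]) -> dict:
    # A real implementation would call the tracker's API here. The point
    # is the boundary: inputs are explicit and validated before anything runs.
    args = build_parser().parse_args(argv)
    return vars(args)
```

Invalid states or missing arguments fail at the parser, with a readable error, before any API call happens. That failure mode is part of the interface.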

Back pressure is part of the harness

A harness without feedback is still a guessing machine. Add checks that force reality back into the loop: tests, linters, screenshots, and build verification.

This is the same pattern behind strong CI pipelines. Agentic workflows just make it more obvious: if your system does not push back, low-quality output accumulates fast.
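A minimal version of that feedback loop fits in a short script. The check commands below are placeholders; wire in your real test, lint, and build steps.

```python
# Sketch of a back-pressure loop: run each check, report which ones failed.
# The commands in CHECKS are placeholders for your real tooling.
import subprocess
import sys

CHECKS = [
    ("tests", [sys.executable, "-m", "pytest", "-q"]),        # placeholder
    ("lint", [sys.executable, "-m", "ruff", "check", "."]),   # placeholder
]


def run_checks(checks: list[tuple[str, list[str]]]) -> list[str]:
    """Return the names of checks whose command exited non-zero."""
    failures = []
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(name)
    return failures
```

Run it after every agent change and feed the failures back into the loop. The agent does not get to declare the work done; the checks do.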

Where to start this week

  1. Rewrite CLAUDE.md to include concrete commands and non-negotiable boundaries.
  2. Extract one repeated workflow into a skill.
  3. Wrap one internal API into a focused CLI command the agent can call safely.

Do those three things and the quality of your agent's output improves immediately.

Want the full system?

This post maps to Module 4 of my course concept: Harness Engineering. In the workshop, we build this setup end-to-end on a live project, not just slides.

If you want to go deeper, check the full course concept or see the workshop formats.
