The 5 Tiers of Agentic Harness Sophistication

Maudel Team

Not all agent systems are created equal. A chatbot that suggests code fixes and an autonomous system that manages an entire software delivery initiative are both called “AI agents” — but they operate at fundamentally different levels of sophistication.

The problem is that most classification systems focus on capability — what an agent can do. But capability alone doesn’t tell you how the system actually behaves in production. A system with advanced planning, multi-agent coordination, and tool use might still require human approval at every step. That’s not the same as a system that owns execution end-to-end.

We need a framework based on operational shape — how the system actually runs, who holds authority, and how long it can sustain work without human intervention.

Operational Shape Over Capability

A tier model should not just describe capability. It should describe operational shape — the actual behavior pattern of the system in production.

This framework uses two primary axes:

  • X-axis: Human Authority → Agentic Authority — Who decides what happens next?
  • Y-axis: Short-Running → Persistent/Run-Until-Done — How long can the system own work without a human in the loop?

Eight dimensions define each tier: agentic authority, human authority, execution, planning, evaluation, time horizon, orchestration, and agent structure.

A critical override rule applies: A harness is classified by the highest level of authority it can exercise in normal operation, not by the most advanced component it contains. This prevents vendor inflation. A system may have multiple agents, planners, and verifiers — but if it requires human approval at every consequential step, it’s still Tier 2 or 3.
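One way to make the override rule concrete is the minimal Python sketch below. The `Harness` record, its field names, and the `is_inflated` helper are hypothetical names introduced for illustration; they are not part of any real product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Harness:
    """Hypothetical description of a harness, for illustration only."""
    name: str
    # Tier that each internal component (planner, verifier, ...) could support in isolation.
    component_tiers: tuple[int, ...]
    # Highest tier of authority the harness actually exercises in normal operation.
    operating_authority_tier: int


def classify(harness: Harness) -> int:
    """Apply the override rule: authority in normal operation decides the tier."""
    return harness.operating_authority_tier


def is_inflated(harness: Harness, claimed_tier: int) -> bool:
    """True when a claimed tier exceeds what the override rule allows."""
    return claimed_tier > classify(harness)


# A multi-agent system that still needs approval at every consequential step.
gated = Harness("gated-multi-agent", component_tiers=(3, 4, 4), operating_authority_tier=2)
```

Here `classify(gated)` returns 2 even though the most advanced component could support Tier 4, so a claim based on `max(gated.component_tiers)` would be flagged as inflated.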

The Five Tiers

Tier 1: Conversational Harness

The human is clearly in control. The agent helps think, explain, draft, and suggest. Human authority is very high; agentic authority is low.

Examples: ChatGPT conversations, Claude chat, basic enterprise Q&A bots.

Litmus test: The system stops and waits for the user after major thoughts.

Tier 2: Guided Execution Harness

The agent can take actions — editing files, running commands, calling APIs — but only within a human-governed loop. The agent proposes, the human approves. You still feel like you’re driving.

Examples: Claude Code with user approvals, coding agents that ask before modifying files.

Litmus test: The system can act, but you still feel like you’re driving it.
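The propose-and-approve loop described for this tier can be sketched in a few lines of Python. This is an illustrative sketch, not any particular product's implementation: `guided_execution`, `approve`, and `execute` are hypothetical names, and the lambdas stand in for a real approval prompt and a real tool call.

```python
from typing import Callable, Iterable


def guided_execution(
    proposals: Iterable[str],
    approve: Callable[[str], bool],
    execute: Callable[[str], str],
) -> list[str]:
    """Run each proposed action only after the human approves it."""
    results = []
    for action in proposals:
        if approve(action):  # the human stays in the loop for every action
            results.append(execute(action))
        else:
            results.append(f"skipped: {action}")
    return results


# Stand-ins for a real approval prompt and a real tool call.
outcome = guided_execution(
    proposals=["edit README.md", "delete build/"],
    approve=lambda action: not action.startswith("delete"),
    execute=lambda action: f"done: {action}",
)
```

The agent never calls `execute` directly; every action passes through `approve` first, which is what keeps the human in the driver's seat.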

Tier 3: Orchestrated Run Harness

The first true orchestration tier. The harness owns execution for a bounded run. It can spawn workers, route subtasks, choose tools dynamically, and execute multi-step workflows — but all within one bounded initiative.

Planning is system-generated, structured, and decomposed into subtasks. Built-in self-checking is first-class. The human starts the run; many internal agentic steps occur without interruption.

Examples: A coding orchestrator that spawns planner, implementer, and verifier agents in a single run; document-processing workflows.

Litmus test: The system can take a goal, decompose it, execute internal steps, and return a result without babysitting.
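A bounded run of this kind might look like the Python sketch below. It assumes three pluggable roles (`plan`, `implement`, `verify`) and a retry cap `max_attempts`; all of these names are hypothetical and chosen only to mirror the planner, implementer, and verifier roles mentioned above.

```python
from typing import Callable


def orchestrated_run(
    goal: str,
    plan: Callable[[str], list[str]],
    implement: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_attempts: int = 2,
) -> dict[str, str]:
    """Decompose a goal, execute each subtask, and self-check before returning."""
    results: dict[str, str] = {}
    for subtask in plan(goal):  # system-generated plan
        for _ in range(max_attempts):  # capped retries keep the run bounded
            output = implement(subtask)
            if verify(subtask, output):  # built-in self-checking
                results[subtask] = output
                break
        else:
            raise RuntimeError(f"subtask failed verification: {subtask}")
    return results


# Stub roles; a real harness would back these with agents or tools.
report = orchestrated_run(
    goal="add login",
    plan=lambda goal: [f"{goal}: design", f"{goal}: code", f"{goal}: test"],
    implement=lambda subtask: f"artifact for {subtask}",
    verify=lambda subtask, output: subtask in output,
)
```

The human supplies only `goal`; every step between the call and the returned `report` happens without interruption, and the run ends either with a result or with an explicit failure.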

Tier 4: Persistent Orchestration Harness

The system manages ongoing work over time — operationally like a team coordinator or factory foreman. It manages multiple active and pending tasks with the ability to pause, resume, retry, escalate, and reroute. It coordinates multiple agents over longer timelines.

Planning is persistent and can evolve during the initiative. Evaluation is mandatory and sophisticated — not just “is this good?” but “did the task succeed?”, “did it violate policy?”, “should we retry or escalate?”, “is it safe to continue?”
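The evaluation questions above can be read as inputs to a single decision about what happens next. The Python sketch below is one possible encoding, under stated assumptions: the `Evaluation` fields, the `Decision` values, the `max_attempts` cap, and especially the order of the checks are illustrative choices, not a prescribed policy.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    RETRY = "retry"
    ESCALATE = "escalate"
    STOP = "stop"


@dataclass(frozen=True)
class Evaluation:
    task_succeeded: bool    # "did the task succeed?"
    violated_policy: bool   # "did it violate policy?"
    safe_to_continue: bool  # "is it safe to continue?"
    attempts_so_far: int


def decide(evaluation: Evaluation, max_attempts: int = 3) -> Decision:
    """Map evaluation results to a next step (check order is an assumption)."""
    if not evaluation.safe_to_continue:
        return Decision.STOP
    if evaluation.violated_policy:
        return Decision.ESCALATE
    if evaluation.task_succeeded:
        return Decision.CONTINUE
    if evaluation.attempts_so_far < max_attempts:
        return Decision.RETRY  # "should we retry or escalate?"
    return Decision.ESCALATE
```

In this sketch, safety and policy checks come before the success check, so a task that succeeded while violating policy is still escalated rather than waved through.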

Persistence is critical: audit trails, permission logs, recovery mechanisms, resumability, and task identity are essential. The system survives interruptions and keeps progressing.
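As a rough illustration of task identity, audit trails, and resumability, the Python sketch below checkpoints tasks to a JSON file and restores them. The `Task` record, the status strings, and the `save_tasks` and `load_tasks` helpers are hypothetical; a production system would more likely use a database and a richer schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Task:
    goal: str
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # stable task identity
    status: str = "pending"
    audit_trail: list[dict] = field(default_factory=list)

    def transition(self, new_status: str, reason: str) -> None:
        """Change status and record the change in the audit trail."""
        self.audit_trail.append({"from": self.status, "to": new_status, "reason": reason})
        self.status = new_status


def save_tasks(tasks: list[Task], path: Path) -> None:
    """Checkpoint all tasks so work can resume after an interruption."""
    path.write_text(json.dumps([asdict(task) for task in tasks]))


def load_tasks(path: Path) -> list[Task]:
    """Restore tasks, including their identity and audit trail."""
    return [Task(**record) for record in json.loads(path.read_text())]
```

Because each task keeps its `task_id` and full `audit_trail` across a save and load, a restarted process can pick up a paused task and still explain how it got into its current state.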

Examples: Long-running software implementation orchestration, enterprise operations copilots managing queues, agentic SDLC orchestration.

Litmus test: The system can juggle several tasks over time, survive interruptions, and keep progressing without a human every few minutes.

Tier 5: Autonomous Factory Harness (Dark Factory)

The system owns the initiative end-to-end. It handles planning, orchestration, execution, and evaluation with no required human oversight in the operating loop. It can spawn, coordinate, terminate, and replace workers, and it operates continuously until the objective is complete or a policy stop is triggered.

Important caveat: This tier is more aspirational than commonly deployed. Many claim Tier 5, but most systems are actually Tier 3 or 4 wearing Tier 5 marketing.

Litmus test: The system can be given a mission and left alone to plan, execute, verify, adapt, and finish.

The 5-Question Field Classification

To quickly classify any agent harness, ask these five questions:

  1. Can it act without asking every step? (No → Tier 1)
  2. Can it keep working after the initial run starts? (No → Tier 2)
  3. Can it manage multiple tasks over time? (No → Tier 3)
  4. Can it evaluate its own work and continue based on that? (No → Tier 3)
  5. Can it complete an initiative without human review in the loop? (No → Tier 4, Yes → Tier 5)
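The five questions work as a ladder: the first "No" determines the tier. A minimal Python sketch of that ladder follows, with the tier outcomes taken directly from the list above (questions 3 and 4 both resolve to Tier 3, as stated). The answer keys and the `classify_harness` function are hypothetical names chosen for illustration.

```python
# (answer key, tier assigned when the answer is "No"), in the order the questions are asked.
QUESTIONS: list[tuple[str, int]] = [
    ("acts_without_asking_every_step", 1),
    ("keeps_working_after_run_starts", 2),
    ("manages_multiple_tasks_over_time", 3),
    ("evaluates_own_work_and_continues", 3),
    ("completes_without_human_review", 4),
]


def classify_harness(answers: dict[str, bool]) -> int:
    """Return the tier implied by the first question answered 'No', else Tier 5."""
    for key, tier_if_no in QUESTIONS:
        if not answers[key]:
            return tier_if_no
    return 5


# Example: a governed system that does everything except skip human review.
governed = {key: True for key, _ in QUESTIONS}
governed["completes_without_human_review"] = False
```

With these answers, `classify_harness(governed)` stops at question 5 and returns Tier 4; answering "Yes" to every question is the only path to Tier 5.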

Where Maudel Stands: Tier 4

Maudel is best classified as a Tier 4 — Persistent Orchestration Harness. More specifically, it is a governed, enterprise-grade Tier 4 system composed of multiple Tier 3 orchestrated runs.

Why Tier 4 and not Tier 3: Maudel is not framed as “give me one task and I’ll do one run.” It is a system for running software initiatives — from idea through PRD, stories, implementation, testing, and deployment. This exceeds a bounded run in both scope and duration. The system maintains persistent workflow state, manages task queues and dependencies, and coordinates multiple agents over extended timelines.

Why Tier 4 and not Tier 5: Maudel’s design deliberately emphasizes control, safety, oversight, traceability, and deterministic orchestration. It is agentically powerful but operationally governed — a deliberate design choice to keep humans and deterministic policy in meaningful control.

We’ve Built Across Four of the Five Tiers

At Maudel, we don’t just theorize about agentic harness sophistication — we’ve built systems that operate across Tiers 1 through 4.

  • Tier 1 — Conversational interfaces for requirement gathering and expert consultation
  • Tier 2 — Guided execution with human-in-the-loop governance at every decision point
  • Tier 3 — Bounded orchestrated runs for individual pipeline stages — planning, implementation, testing, and validation
  • Tier 4 — Persistent orchestration managing end-to-end software delivery initiatives across stages, roles, artifacts, and verification loops

This breadth of experience is what makes us Agentic Enterprise Orchestration experts. We understand the tradeoffs at every tier — when to let agents run, when to gate with governance, and how to build systems that are both powerful and trustworthy.

The Classification Dimensions

For those who want the full picture, here are the eight dimensions that define each tier:

| Dimension | Tier 1 | Tier 2 | Tier 3 | Tier 4 | Tier 5 |
|---|---|---|---|---|---|
| Agentic Authority | Low | Moderate | Moderate-High | High (governed) | Very High |
| Human Authority | Very High | High | Medium | Medium-Low | Very Low |
| Execution | Light tool use | Gated actions | Multi-step, bounded | Persistent, multi-task | End-to-end autonomous |
| Planning | Agent suggests | Agent proposes | System-generated | Persistent, evolving | First-class, adaptive |
| Evaluation | Human judgment | Mostly human | Built-in self-checking | Mandatory, sophisticated | First-class, continuous |
| Time Horizon | Single turn | One session | Minutes to hours | Hours to days | Run-until-done |
| Orchestration | Minimal | Implicit | Explicit | Strong hybrid | Advanced hybrid |
| Agent Structure | Single | Single + tools | Single/multi, short-run | Multi-agent, long-running | Multi-agent ecosystem |

What This Means for Your Organization

If you’re evaluating or building agentic systems, this framework gives you a shared vocabulary for what you’re actually building — not what the marketing says.

The most common mistake we see: organizations building Tier 3 systems and marketing them as Tier 5. The second most common: organizations stuck at Tier 2 because they haven’t invested in the orchestration layer that enables Tier 3+.

The orchestration layer is the differentiator. It’s what separates “we use AI agents” from “we have a governed, traceable, enterprise-grade AI engineering process.”

That’s what we build at Maudel.