
Our first pipeline was a black box. You kicked off an execution, waited anywhere from five minutes to an hour, and either got a result or an error message. There was no way to see what was happening inside. So engineers wouldn’t use it.
The transformation from black box to glass box happened over ten days in March 2026 — and it changed everything about how our team interacted with the system.
Day 1: Streaming Events
The first step was exposing what the AI agent was actually doing in real time. We built a streaming event system that captures:
- Thinking events — when the agent is reasoning
- Tool usage — when it calls external tools or MCP endpoints
- File operations — reads, writes, and modifications
- Subagent handoffs — when work is delegated to specialized agents
- Text output — the actual content being generated
Each event streams to the UI as it happens. No polling. No waiting for completion. You see the agent work in real time.
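The shape of such a system can be sketched in a few lines. This is an illustrative model, not our production code; the class and event names (`EventStream`, `AgentEvent`) are hypothetical. The key property is that delivery is push-based: subscribers are invoked the moment an event fires, so the UI never polls.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical event types mirroring the five categories above.
EVENT_TYPES = {"thinking", "tool_use", "file_op", "subagent", "text"}

@dataclass
class AgentEvent:
    type: str      # one of EVENT_TYPES
    content: str   # preview of what the agent is doing
    ts: float = field(default_factory=time.time)

class EventStream:
    """Push-based stream: subscribers are called as each event occurs."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[AgentEvent], None]] = []

    def subscribe(self, fn: Callable[[AgentEvent], None]) -> None:
        self._subscribers.append(fn)

    def emit(self, type: str, content: str) -> AgentEvent:
        assert type in EVENT_TYPES, f"unknown event type: {type}"
        event = AgentEvent(type, content)
        for fn in self._subscribers:
            fn(event)  # no polling, no waiting for completion
        return event
```

In practice the subscriber would be a WebSocket or Server-Sent Events connection pushing each event to the browser, but the contract is the same: `stream.emit("tool_use", "calling search API")` and every connected client sees it immediately.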
Day 6: The Command Center
Streaming events were useful but overwhelming. Raw event streams are like watching server logs — technically informative but practically unusable for most of the team.
We built a dashboard with four tabs:
Summary — high-level status, current stage, completion percentage, and milestone checklists. For managers and leads who want the big picture.
Stream — the real-time bubble timeline showing every agent action. For engineers who want to watch the work happen. Each bubble shows the event type, duration, and content preview.
Decisions — pending Human-in-the-Loop decisions with context, options, and timeouts. For stakeholders who need to approve or steer.
Log — full execution history with timestamps, token counts, and cost data. For debugging and post-execution analysis.
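All four tabs can be fed from the same event stream by routing each event into the view that needs it. The sketch below is a simplified model with hypothetical field names; the point is the fan-out: the Log tab records everything, the Stream tab keeps a bubble preview of every action, and only stage and decision events touch the Summary and Decisions tabs.

```python
class Dashboard:
    """Routes a single agent event stream into the four tab views."""

    def __init__(self) -> None:
        self.summary = {"stage": None, "percent": 0}  # Summary tab
        self.stream: list[dict] = []     # Stream tab: bubble timeline
        self.decisions: list[dict] = []  # Decisions tab: pending HITL items
        self.log: list[dict] = []        # Log tab: full history

    def on_event(self, event: dict) -> None:
        self.log.append(event)  # the Log tab sees every event
        self.stream.append({    # the Stream tab shows a content preview
            "type": event["type"],
            "preview": event.get("content", "")[:80],
        })
        if event["type"] == "stage":
            self.summary["stage"] = event["content"]
            self.summary["percent"] = event.get("percent", self.summary["percent"])
        if event["type"] == "decision":
            self.decisions.append({
                "context": event["content"],
                "options": event.get("options", []),
                "timeout_s": event.get("timeout_s", 300),  # assumed default
            })
```

The design choice worth noting: there is one source of truth (the event stream) and four projections of it, so the tabs can never disagree about what the agent did.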
Day 7: Token Tracking and Cost Monitoring
Once we had streaming, adding telemetry was straightforward. Every execution now tracks:
- Thinking tokens vs. output tokens
- Per-stage token consumption
- Cumulative cost per execution
- Per-step milestone completion
This data isn’t just for billing. It tells you which stages are expensive, which prompts are inefficient, and where the agent is spending disproportionate time reasoning.
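The aggregation itself is simple once events carry token counts. A minimal sketch, assuming hypothetical per-1K-token prices (real prices depend on the model and provider) and a made-up `TokenTracker` name:

```python
from collections import defaultdict

# Illustrative prices per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K = {"thinking": 0.003, "output": 0.015}

class TokenTracker:
    """Accumulates thinking vs. output tokens per stage and derives cost."""

    def __init__(self) -> None:
        self.by_stage = defaultdict(lambda: {"thinking": 0, "output": 0})

    def record(self, stage: str, thinking: int, output: int) -> None:
        self.by_stage[stage]["thinking"] += thinking
        self.by_stage[stage]["output"] += output

    def stage_cost(self, stage: str) -> float:
        t = self.by_stage[stage]
        return (t["thinking"] * PRICE_PER_1K["thinking"]
                + t["output"] * PRICE_PER_1K["output"]) / 1000

    def total_cost(self) -> float:
        return sum(self.stage_cost(s) for s in self.by_stage)

    def most_expensive_stage(self) -> str:
        return max(self.by_stage, key=self.stage_cost)
```

With this in place, questions like "which stage is burning the budget?" become a one-liner (`tracker.most_expensive_stage()`), and a high thinking-to-output ratio in one stage is a direct signal that its prompt needs work.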
The Trust Equation
The adoption curve tells the story. Before streaming: two engineers willing to run the pipeline. After the command center: the full team.
The reason is simple. Engineers won’t trust what they can’t see. When the pipeline was a black box, the only signal was the final output — and if it was wrong, there was no way to understand why. With real-time visibility, engineers can:
- Catch problems early — a wrong tool call in stage 2 is visible immediately, not after 40 minutes of wasted execution
- Build intuition — watching the agent work teaches engineers how to write better prompts and structure better inputs
- Debug effectively — when something goes wrong, the execution log pinpoints exactly where and why
The Lesson
AI trust is built through transparency. If your AI pipeline doesn’t show its work, your team won’t use it. Observability isn’t a nice-to-have for AI-driven development — it’s the foundation of adoption.
The dashboard became a command center, not a status page. And that distinction matters.
