
Our first pipeline was a black box. You kicked off an execution, waited anywhere from five minutes to an hour, and either got a result or an error message. There was no way to see what was happening inside. So engineers wouldn’t use it.
The transformation from black box to glass box happened over ten days in March 2026 — and it changed everything about how our team interacted with the system.
Day 1: Streaming Events
The first step was exposing what the AI agent was actually doing in real time. We built a streaming event system that captures:
- Thinking events — when the agent is reasoning
- Tool usage — when it calls external tools or MCP endpoints
- File operations — reads, writes, and modifications
- Subagent handoffs — when work is delegated to specialized agents
- Text output — the actual content being generated
Each event streams to the UI as it happens. No polling. No waiting for completion. You see the agent work in real time.
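The shape of such a system can be sketched in a few lines. This is an illustrative model, not our production code; the class and event names (`EventStream`, `AgentEvent`) are hypothetical. The key property is that delivery is push-based: subscribers are invoked the moment an event fires, so the UI never polls.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical event types mirroring the five categories above.
EVENT_TYPES = {"thinking", "tool_use", "file_op", "subagent", "text"}

@dataclass
class AgentEvent:
    type: str      # one of EVENT_TYPES
    content: str   # preview of what the agent is doing
    ts: float = field(default_factory=time.time)

class EventStream:
    """Push-based stream: subscribers are called as each event occurs."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[AgentEvent], None]] = []

    def subscribe(self, fn: Callable[[AgentEvent], None]) -> None:
        self._subscribers.append(fn)

    def emit(self, type: str, content: str) -> AgentEvent:
        assert type in EVENT_TYPES, f"unknown event type: {type}"
        event = AgentEvent(type, content)
        for fn in self._subscribers:
            fn(event)  # no polling, no waiting for completion
        return event
```

In practice the subscriber would be a WebSocket or Server-Sent Events connection pushing each event to the browser, but the contract is the same: `stream.emit("tool_use", "calling search API")` and every connected client sees it immediately.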
Day 6: The Command Center
Streaming events were useful but overwhelming. Raw event streams are like watching server logs — technically informative but practically unusable for most of the team.
We built a dashboard with four tabs:
Summary — high-level status, current stage, completion percentage, and milestone checklists. For managers and leads who want the big picture.
Stream — the real-time bubble timeline showing every agent action. For engineers who want to watch the work happen. Each bubble shows the event type, duration, and content preview.
Decisions — pending Human-in-the-Loop decisions with context, options, and timeouts. For stakeholders who need to approve or steer.
Log — full execution history with timestamps, token counts, and cost data. For debugging and post-execution analysis.
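All four tabs can be fed from the same event stream by routing each event into the view that needs it. The sketch below is a simplified model with hypothetical field names; the point is the fan-out: the Log tab records everything, the Stream tab keeps a bubble preview of every action, and only stage and decision events touch the Summary and Decisions tabs.

```python
class Dashboard:
    """Routes a single agent event stream into the four tab views."""

    def __init__(self) -> None:
        self.summary = {"stage": None, "percent": 0}  # Summary tab
        self.stream: list[dict] = []     # Stream tab: bubble timeline
        self.decisions: list[dict] = []  # Decisions tab: pending HITL items
        self.log: list[dict] = []        # Log tab: full history

    def on_event(self, event: dict) -> None:
        self.log.append(event)  # the Log tab sees every event
        self.stream.append({    # the Stream tab shows a content preview
            "type": event["type"],
            "preview": event.get("content", "")[:80],
        })
        if event["type"] == "stage":
            self.summary["stage"] = event["content"]
            self.summary["percent"] = event.get("percent", self.summary["percent"])
        if event["type"] == "decision":
            self.decisions.append({
                "context": event["content"],
                "options": event.get("options", []),
                "timeout_s": event.get("timeout_s", 300),  # assumed default
            })
```

The design choice worth noting: there is one source of truth (the event stream) and four projections of it, so the tabs can never disagree about what the agent did.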
Day 7: Token Tracking and Cost Monitoring
Once we had streaming, adding telemetry was straightforward. Every execution now tracks:
- Thinking tokens vs. output tokens
- Per-stage token consumption
- Cumulative cost per execution
- Per-step milestone completion
This data isn’t just for billing. It tells you which stages are expensive, which prompts are inefficient, and where the agent is spending disproportionate time reasoning.
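The aggregation itself is simple once events carry token counts. A minimal sketch, assuming hypothetical per-1K-token prices (real prices depend on the model and provider) and a made-up `TokenTracker` name:

```python
from collections import defaultdict

# Illustrative prices per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K = {"thinking": 0.003, "output": 0.015}

class TokenTracker:
    """Accumulates thinking vs. output tokens per stage and derives cost."""

    def __init__(self) -> None:
        self.by_stage = defaultdict(lambda: {"thinking": 0, "output": 0})

    def record(self, stage: str, thinking: int, output: int) -> None:
        self.by_stage[stage]["thinking"] += thinking
        self.by_stage[stage]["output"] += output

    def stage_cost(self, stage: str) -> float:
        t = self.by_stage[stage]
        return (t["thinking"] * PRICE_PER_1K["thinking"]
                + t["output"] * PRICE_PER_1K["output"]) / 1000

    def total_cost(self) -> float:
        return sum(self.stage_cost(s) for s in self.by_stage)

    def most_expensive_stage(self) -> str:
        return max(self.by_stage, key=self.stage_cost)
```

With this in place, questions like "which stage is burning the budget?" become a one-liner (`tracker.most_expensive_stage()`), and a high thinking-to-output ratio in one stage is a direct signal that its prompt needs work.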
The Trust Equation
The adoption curve tells the story. Before streaming: two engineers willing to run the pipeline. After the command center: the full team.
The reason is simple. Engineers won’t trust what they can’t see. When the pipeline was a black box, the only signal was the final output — and if it was wrong, there was no way to understand why. With real-time visibility, engineers can:
- Catch problems early — a wrong tool call in stage 2 is visible immediately, not after 40 minutes of wasted execution
- Build intuition — watching the agent work teaches engineers how to write better prompts and structure better inputs
- Debug effectively — when something goes wrong, the execution log pinpoints exactly where and why
The Lesson
AI trust is built through transparency. If your AI pipeline doesn’t show its work, your team won’t use it. Observability isn’t a nice-to-have for AI-driven development — it’s the foundation of adoption.
The dashboard became a command center, not a status page. And that distinction matters.
