CK/SYSTEMS

mission-control-ui: Engineering Retrospective

A UI dashboard for monitoring and controlling agent operations — making autonomous system state visible without getting in the way of automation.


mission-control-ui: Dashboards for Autonomous Systems

The harder problem in building autonomous agent systems is not getting the agents to run — it’s knowing what they’re doing while they run. Agents that operate without human involvement at each step can get into bad states, loop on a failed subtask, or proceed through a workflow that’s technically executing but producing wrong outputs. Without a purpose-built interface for monitoring that state, you’re left reading logs and guessing. mission-control-ui was built to close that gap: a dashboard for operators to monitor running agent jobs, inspect their state, and intervene when necessary.

What Changed

The first design challenge was deciding what “state” actually meant for an agent system. A running HTTP service has a handful of meaningful status indicators: up/down, request latency, error rate. An agent running a multi-step workflow has much richer state — which step it’s on, what its last LLM call returned, whether it’s waiting on an external dependency, how many retries it’s burned on the current step. The dashboard needed to represent all of this without overwhelming the operator.

The solution was a two-level view: a job list that showed running and recent workflows at a summary level, and a detail panel that expanded into the full step trace for a selected job. The summary level showed enough to identify which jobs needed attention — status (running, blocked, failed, complete), elapsed time, current step name, and a simple health indicator derived from retry count. The detail panel showed the full checkpoint history, step inputs and outputs, and the gate results for any enforcement points in the workflow. This let operators triage at a glance and drill in only when something actually needed investigation.
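The split described above can be sketched as two data shapes, a lightweight summary row for the job list and a fuller record loaded only when a job is selected. The type and field names here are illustrative, not taken from the actual codebase, and the flat retry-count thresholds mirror the simple early version of the health indicator:

```typescript
// Hypothetical shapes for the two-level view: a summary row for the
// job list, and a detail record loaded when an operator drills in.
type JobStatus = "running" | "blocked" | "failed" | "complete";
type Health = "ok" | "warn" | "unhealthy";

interface JobSummary {
  id: string;
  status: JobStatus;
  currentStep: string;
  elapsedMs: number;
  retryCount: number;
}

interface JobDetail extends JobSummary {
  // Full checkpoint history: step name plus recorded inputs/outputs.
  checkpoints: { step: string; input: unknown; output: unknown }[];
  // Results of any enforcement gates the workflow passed through.
  gateResults: { gate: string; passed: boolean }[];
}

// Simple health derivation from retry count alone (the naive first
// version; see "What Held Up" for why this needed iteration).
function health(job: JobSummary): Health {
  if (job.retryCount === 0) return "ok";
  if (job.retryCount <= 2) return "warn";
  return "unhealthy";
}
```

The point of the split is that the list view only ever needs `JobSummary`, so the dashboard stays cheap to poll even with many jobs queued.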

Action controls were the most contested part of the design. The question was how much the dashboard should let operators do versus simply observe. Full control — ability to modify workflow state, inject values, skip steps — would make the dashboard powerful but also make it easy to corrupt a running workflow. The decision was to keep the action surface narrow: operators could pause a job, retry the current step, or cancel. Anything more invasive required going directly to the underlying system. That constraint kept the UI honest about what it was: a monitoring tool with limited intervention capabilities, not a workflow editor.
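One way to make that constraint structural rather than a matter of discipline is to encode the allowed actions as a closed type, so invasive operations simply have no representation in the dashboard's API. This is a sketch under assumed names, not the project's actual interface:

```typescript
// The full action surface the dashboard exposes. Anything more
// invasive (skipping steps, injecting values, editing state) is
// deliberately unrepresentable here and must go through the
// underlying system directly.
type OperatorAction = "pause" | "retry_current_step" | "cancel";

const ALLOWED_ACTIONS: ReadonlySet<string> = new Set([
  "pause",
  "retry_current_step",
  "cancel",
]);

// Runtime guard for requests arriving as plain strings, e.g. from
// a button click handler or an HTTP payload.
function isAllowedAction(action: string): action is OperatorAction {
  return ALLOWED_ACTIONS.has(action);
}
```

Keeping the whitelist in one place means widening the action surface later is an explicit, reviewable change rather than an accumulation of one-off endpoints.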

Why It Mattered

The dashboard made it practical to run longer, more complex agent workflows with confidence. Without it, increasing workflow complexity meant increasing the amount of time spent watching logs to verify things were proceeding correctly. With it, you could queue up several jobs, check the dashboard periodically, and trust that anything that needed attention would surface visibly rather than failing silently.

The human oversight question turned out to be the most important design constraint. Fully autonomous operation is only sustainable if the operator can verify that the system is behaving correctly without reviewing every decision. The dashboard made that verification cheap enough that it was actually done rather than skipped.

What Held Up / What Didn’t

The two-level view held up well — the summary/detail split proved to be the right abstraction. The health indicator calculation needed several iterations to avoid false positives; early versions flagged jobs as unhealthy too aggressively based on retry count alone, without accounting for the fact that some steps legitimately require retries under normal conditions. The action surface constraint was initially frustrating but ended up being correct: the times where I wanted more control were almost always situations where the right answer was to cancel the job and fix the underlying workflow definition, not patch the running instance from the dashboard.
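The fix for the over-aggressive health indicator can be expressed as comparing retries against a per-step allowance instead of a flat global threshold. The policy shape and thresholds below are assumptions for illustration, not the shipped values:

```typescript
// Revised health check: some steps legitimately retry under normal
// conditions, so each step declares how many retries are expected
// before the job should be flagged.
interface StepPolicy {
  expectedRetries: number;
}

function stepHealth(
  retryCount: number,
  policy: StepPolicy
): "ok" | "warn" | "unhealthy" {
  if (retryCount <= policy.expectedRetries) return "ok";
  // A small margin past the allowance warns without alarming.
  if (retryCount <= policy.expectedRetries + 2) return "warn";
  return "unhealthy";
}
```

With this shape, a flaky-by-design step (say, polling an external dependency) can declare a higher allowance, while a step that should succeed first try flags on its first retry.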

Newsletter

Short notes on building AI agents in production.

One email when something worth sharing ships. No fluff, no daily cadence, no recycled growth-thread noise.

Primary use: consulting updates, governed AI workflow lessons, and major project writeups.
