CK/SYSTEMS

How I built Portarium: a governed AI control plane for trustworthy automation

An engineering retrospective on Portarium that makes the trade-offs explicit: why governance was moved into the control plane, what that changed, and what still hurts.

Tags: journal · engineering · portarium · ai-control-plane · software-engineering-retrospective

I learned quickly that governance belongs in the control path, not in a doc that only gets copied into tickets.

Context

When Portarium started, it had the look and feel of many early AI systems: impressive in demos, but operationally under-governed.

Agents could execute useful work quickly, but the surrounding system could not answer the three questions regulators, operators, and users eventually asked:

  • What happened?
  • Why did it happen?
  • Who approved the high-impact action?

The technical challenge was not adding yet another approval dialog. The challenge was that trust had to be treated as a core architectural constraint and not as a feature bolted on at release time.

I needed a system that let agents stay productive while operators kept control at the points that mattered.

Forces

I faced a set of tensions that never fully disappeared; they were the reason the design looks conservative:

  • Safety first vs. experimental velocity
    Every additional guardrail added friction. Every shortcut removed safety. I could not choose a single winner.

  • Autonomy vs. auditability
    Autonomous execution is the point of the product, but opaque autonomy is operationally unusable.

  • Single-run success vs. fleet-level consistency
    It was tempting to fix edge cases by hand in each tool path, yet that made failure modes diverge across CLI, API, and demo surfaces.

  • Observability depth vs. cognitive load
    More state and events improve diagnosability, but they also require operators to learn a richer model before they can act.

  • Policy flexibility vs. deterministic behavior
    A policy engine must evolve, yet behavior under policy must remain predictable across time.

  • Fast recoverability vs. durable traceability
    Retrying quickly is easy; replaying confidently later is hard unless state transitions are explicit.

Architecture Decisions

I made a small number of decisions that shifted Portarium from an experimental shell to a governable platform:

  1. Elevate execution into explicit states
    run, agent, workflow, credential, and machine were treated as durable domain objects, each with explicit transitions.
    This made failure, pause, and completion states first-class, not implied side effects.

  2. Model “approval required” as a normal state, not an exception
    Approval was represented in the same state graph as normal work progress: queued, blocked, approved, rejected, and resumed.
    The result was that “human in the loop” became observable, replayable, and testable.

  3. Place policy checks on the hot path
    Policy evaluation moved before side effects instead of after.
    This ensured role, risk, and egress constraints were enforced wherever execution originated.

  4. Share one control contract across interfaces
    I forced CLI, cockpit, demos, and internal integrations to call the same execution primitives.
    This eliminated “governance drift” across surfaces and turned inconsistent behavior from an inevitability into a bug.

  5. Preserve deterministic event shape across tooling boundaries
    A stable event and run metadata contract made downstream diagnostics, UIs, and audit tooling composable and replaceable.

The architecture intentionally looks dull from the outside: boring by design, because boring is often what keeps production systems honest.

Consequences

Positive outcomes

  • Clear intent at every important step
    Operators can now trace a run as a finite chain of states instead of a black box.

  • Lower governance surprise
    Approval points are explicit before effectful actions, so risk decisions are visible at the right time.

  • Higher confidence in extension
    New tools and adapters can be introduced behind existing control boundaries with fewer semantic surprises.

  • Better incident response
    State and event history now supports meaningful “where did it go wrong?” analysis without re-running live systems.

Downsides and risks

  • Increased engineering drag
    Every feature now needs to pass through a more opinionated contract, which slows rapid prototyping.

  • Larger conceptual surface for new team members
    The mental model is richer than in the early prototype era, so onboarding takes longer.

  • Coupling pressure at governance boundaries
    Policy, execution, and observability share critical edges. Refactors there are expensive until abstraction boundaries harden further.

  • Operational debt in traces
    Richer event logs are only useful when they stay curated; without discipline they become noise.

Selected commit timeline

These commits are the inflection points that most influenced the architecture:

  • d6fe5e0 — introduces the approval-wait flow and connects it to tools, CLI, and demos.
  • b11b05f — ADR-0117 formalizes approval-wait semantics.
  • a0567bd — companion approval CLI experiment used to validate operator interaction.
  • 98a72bb — long-polling REST approval-wait loop experiment.
  • 8d626bc — filesystem-watch approval propagation experiment.
  • b843e36 — WebSocket-based approval-wait loop experiment.
  • a53c387 — EventEmitter approval loop variant.
  • fc97038 — async reliability fix with Promise.race around approval signaling.
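
The last commit in the list points at a pattern worth spelling out: racing an approval signal against a deadline so a waiting run can never hang forever. The following is a hedged reconstruction, not the actual fc97038 code; `waitForApproval` and its types are assumptions.

```typescript
// Illustrative Promise.race guard around approval signaling.
// Whichever promise settles first wins; the timer is cleared either
// way so the process can exit cleanly.
function waitForApproval(
  signal: Promise<"approved" | "rejected">,
  timeoutMs: number,
): Promise<"approved" | "rejected" | "timed-out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });
  return Promise.race([signal, timeout]).finally(() => clearTimeout(timer));
}

// Usage: an approval that arrives before the deadline resolves normally.
const approval = new Promise<"approved" | "rejected">((resolve) =>
  setTimeout(() => resolve("approved"), 10),
);
waitForApproval(approval, 1000).then((outcome) => console.log(outcome)); // "approved"
```

The same wrapper works regardless of how the signal is produced, which is what let the long-polling, filesystem-watch, WebSocket, and EventEmitter experiments stay interchangeable behind one contract.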

Refactorings for the future

This structure holds, but it is not finished:

  • Split policy from the orchestration runtime through a strict plugin boundary so policy experimentation can move faster without destabilizing the core contract.
  • Introduce a typed workflow DSL for approval rules so most policy work stops requiring custom plumbing.
  • Replace ad-hoc adapter assumptions with capability descriptors per tool integration.
  • Add deterministic simulation mode to replay state transitions end-to-end before rollout.
  • Split long-lived trace retention from operational tracing to control storage growth and reduce incident noise.
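
As a sketch of the capability-descriptor idea from the list above (all names are hypothetical): instead of each adapter hard-coding its own assumptions, the runtime would decide from a declarative descriptor whether a call must pass through an approval gate.

```typescript
// Hypothetical capability descriptor for a tool integration.
interface CapabilityDescriptor {
  tool: string;
  sideEffects: boolean;                      // does the tool mutate external state?
  egress: "none" | "internal" | "external";  // network reach of the tool
  maxRisk: "low" | "medium" | "high";        // ceiling before approval is forced
}

// The runtime, not the adapter, decides whether an approval gate applies.
function needsApproval(
  cap: CapabilityDescriptor,
  risk: "low" | "medium" | "high",
): boolean {
  const order = { low: 0, medium: 1, high: 2 };
  return cap.sideEffects && order[risk] > order[cap.maxRisk];
}

const slackNotifier: CapabilityDescriptor = {
  tool: "slack-notify",
  sideEffects: true,
  egress: "external",
  maxRisk: "low",
};

console.log(needsApproval(slackNotifier, "high")); // true
console.log(needsApproval(slackNotifier, "low"));  // false
```

The design benefit is that policy experimentation then only touches descriptors and the gate function, never the adapters themselves.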

Why this matters

The central lesson is straightforward: autonomy without an explicit governance model scales poorly, and a governance model without engineering discipline becomes theater.

Portarium now occupies the narrow middle ground. Agents still move, but they move inside a plane where intent, risk, and effect are modelled, logged, and reversible.
That shape is what made the project viable beyond demos.

Fagan inspection: design review by commit evidence

I run each retrospective article through a Fagan-style inspection checklist so claims are supported by history, not vibes.

Inspection scope

  • Inputs: linked commits in this article, architecture claims in prose, and explicit design trade-offs.
  • Objective: separate intentional architecture from incidental implementation details.
  • Exit condition: no major claim remains unlinked to a commit trail or clear constraint.

What I inspected

  1. Problem framing — was the failure mode explicit and specific?
  2. Decision rationale — was the reason for each structural choice clear?
  3. Contract boundaries — are state transitions, validation, and permissions explicit?
  4. Verification posture — are risks paired with tests, gates, or operational safeguards?
  5. Residual risk — what is still uncertain and where is next evidence needed?

Findings

  • Pass condition: each design direction is defensible as a trade-off, not preference.
  • Pass condition: at least one linked commit backs every architectural claim.
  • Pass condition: failure modes are named with mitigation decisions.
  • Risk condition: any unsupported claim becomes a follow-up inspection item.

How I design things (Fagan-oriented)

  • Start with a concrete failure, not a feature idea.
  • Define invariants before interface details.
  • Make state and lifecycle transitions explicit.
  • Keep observability at decision points, not only at failures.
  • Treat governance as a design constraint, not a post hoc process.

Next design action

  • Turn this inspection into a backlog trail: each remaining risk maps to one upcoming commit with acceptance evidence.

Newsletter

Short notes on building AI agents in production.

One email when something worth sharing ships. No fluff, no daily cadence, no recycled growth-thread noise.

Primary use: consulting updates, governed AI workflow lessons, and major project writeups.
