Why I built AgenticSDLC: making software planning and execution auditable by design
An engineering story about AgenticSDLC, tracing how I turned a bootstrap repo into a governed SDLC platform with traceability, ambiguity handling, and practical CLI ergonomics.
I build tooling for the moments where planning gets loud and execution quietly drifts. My goal: make meaning survive handoffs from idea to run output.
AgenticSDLC started from a familiar software reality: great ideas die in handoffs, and handoffs happen everywhere—between humans, models, and tools.
The project became an experiment in building an SDLC that preserves intent from the first ambiguity statement to final execution output.
Problem
Software teams can usually describe the failure modes:
- Requirements shift every few days and the source of truth fragments across chat, tickets, and terminal logs.
- Execution can be correct in code and still be unrecoverably wrong in rationale, because the “why” is not represented as a durable artifact.
- Compliance, onboarding, and post-mortems are forced to reconstruct intent from incomplete traces rather than rely on a structured execution record.
For this work, the core problem was not simply “more process.” It was: how do we make planning and execution observable enough to be auditable while still feeling fast enough for day-to-day work?
Forces
- Preserve traceability end-to-end.
- Keep developer ergonomics high in the terminal, where actual operators spend the most time.
- Handle uncertainty explicitly so the workflow fails loudly on ambiguity instead of silently papering over it.
- Minimize accidental complexity: each feature had to be implementable, reversible, and reviewable.
- Keep external dependencies bounded, because every integration is an availability and upgrade risk.
These forces are in tension. The fastest path (minimal structure + strong defaults) tends to hide reasoning. The strictest path (strict schemas + hard gates) can become slow enough that people bypass it.
Solution
I adopted a staged architecture:
- Establish a planning contract that forces inputs into machine-checkable structures.
- Convert ambiguous or incomplete input into explicit follow-up questions and confidence annotations.
- Tie every task to lifecycle metadata so its current state is queryable, resumable, and reviewable.
- Correlate artifacts across phases: interviews, plans, summaries, transcripts, task state, and execution results.
- Expose CLI ergonomics that make the above enforceable in ordinary workflows instead of only in demo scripts.
The important pattern is sequencing: bootstrap first, then traceability surfaces, then governance gates, then optional automation hardening. This order keeps early feedback tight and avoids building policy on a brittle foundation.
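As a rough illustration of what "machine-checkable planning inputs" can mean in practice, here is a minimal sketch. The names (`PlanInput`, the required fields) are hypothetical, not AgenticSDLC's actual API:

```python
from dataclasses import dataclass

# Hypothetical contract: every plan must name a goal, an owner,
# and acceptance criteria before it enters the workflow.
REQUIRED_FIELDS = ("goal", "owner", "acceptance")

@dataclass
class PlanInput:
    """Raw planning input before it is accepted into the workflow."""
    fields: dict

def validate_plan(plan: PlanInput) -> list[str]:
    """Return a list of contract violations; an empty list means accepted."""
    errors = []
    for name in REQUIRED_FIELDS:
        value = plan.fields.get(name)
        if not value or not str(value).strip():
            errors.append(f"missing or empty field: {name}")
    return errors

# A plan with no acceptance criteria fails loudly at the boundary
# instead of drifting downstream as an unstated assumption.
draft = PlanInput(fields={"goal": "ship CLI", "owner": "me", "acceptance": ""})
print(validate_plan(draft))  # ['missing or empty field: acceptance']
```

The point is not the specific fields but the shape of the gate: violations are returned as data, so they can be surfaced in the CLI, logged, or blocked on, rather than living only in a reviewer's head.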
Implementation details that mattered most:
- Schema-first execution: enforce artifacts by shape before they become policy.
- Explicit ambiguity states: avoid pretending uncertainty is binary; represent confidence and blockers directly.
- Boundary-preserving CLI: keep control-plane actions explicit (commands, arguments, explicit outputs), so automation remains inspectable.
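The "explicit ambiguity states" idea can be sketched as a small data model. Assume hypothetical names throughout; the project's real representation may differ:

```python
from dataclasses import dataclass, field
from enum import Enum

class Clarity(Enum):
    CLEAR = "clear"
    AMBIGUOUS = "ambiguous"   # needs a follow-up question before execution
    BLOCKED = "blocked"       # cannot proceed even with clarification

@dataclass
class Annotation:
    """Confidence and blockers are first-class, not a hidden yes/no."""
    state: Clarity
    confidence: float                          # 0.0 to 1.0
    follow_ups: list[str] = field(default_factory=list)

    def ready_to_execute(self, threshold: float = 0.8) -> bool:
        # Execution is gated on both the explicit state and the confidence,
        # so uncertainty fails loudly instead of being papered over.
        return self.state is Clarity.CLEAR and self.confidence >= threshold

note = Annotation(Clarity.AMBIGUOUS, 0.55, ["Which environment is the target?"])
print(note.ready_to_execute())  # False
```

Representing the follow-up questions alongside the state means the workflow can emit them directly, which is what turns "handle uncertainty explicitly" from a slogan into a gate.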
Consequences
Positive
- Artifacts became durable evidence: plans are no longer ephemeral notes.
- The system is easier to hand over because context survives between execution phases.
- Debugging improved because failures can be traced to originating assumptions instead of only runtime symptoms.
- Governance became testable; quality constraints are now part of the workflow, not a later review ritual.
Negative
- The baseline workflow now has more structured steps than a casual todo list.
- Friction appears on first use until operators internalize the new flow.
- Integration points increase the operational surface that needs patching, monitoring, and versioning.
Tradeoffs
- Speed vs confidence: stronger checks reduce accidental defects but add latency before task completion.
- Flexibility vs auditability: permissive inputs speed experimentation; strict typing protects downstream correctness.
- Visibility vs complexity: explicit cross-phase linkage improves audits, but requires disciplined naming, IDs, and retention semantics.
- Centralized governance vs team autonomy: standardized gates help scale reliability, but can feel constraining to small experiments unless opt-out/override paths are deliberate.
The chosen tradeoff is to bias toward auditability once the system enters multi-step execution, while preserving lower-friction paths during bootstrap and exploration.
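The "disciplined naming, IDs, and retention semantics" requirement behind cross-phase linkage can be made concrete with a trace-ID sketch. This is an illustrative pattern, not the project's actual artifact schema:

```python
import uuid

def new_trace_id() -> str:
    # One ID minted at planning time and threaded through every artifact.
    return uuid.uuid4().hex

def link(artifact: dict, trace_id: str, phase: str) -> dict:
    # Every artifact carries the same trace_id plus its phase label,
    # so an audit can reassemble the full history with a single key.
    return {**artifact, "trace_id": trace_id, "phase": phase}

tid = new_trace_id()
plan = link({"kind": "plan", "goal": "ship CLI"}, tid, "planning")
result = link({"kind": "result", "status": "ok"}, tid, "execution")
assert plan["trace_id"] == result["trace_id"]
```

The discipline cost is real: the ID must be minted exactly once and passed everywhere, which is why it belongs in the workflow layer rather than in each tool's goodwill.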
Risks
- Overfitting governance to current patterns. If checkers become too opinionated, they can freeze useful deviation. The mitigation is incremental gate escalation and explicit bypass logging.
- Dependency drift. Integrations evolve independently; stale adapters become a hidden source of failure. This is a product-level risk, handled with explicit dependency review cadences.
- Semantic debt in generated artifacts. If upstream tasks are vague, downstream traceability remains formal but weak. The mitigation is stronger ambiguity policies and periodic prompt-structure review.
- Operational fatigue. If every command is too heavy, adoption collapses. The mitigation is to keep the happy path keyboard-friendly and document explicit defaults.
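The "explicit bypass logging" mitigation mentioned above is worth making concrete. A minimal sketch, with hypothetical names, of a bypass that is permitted but never silent:

```python
import json
import os
import tempfile
import time

def bypass_gate(gate: str, reason: str, actor: str, log_path: str) -> None:
    """Record a deliberate gate bypass as an append-only audit line."""
    record = {"ts": time.time(), "gate": gate, "reason": reason, "actor": actor}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# The bypass succeeds, but it leaves durable evidence for the next inspection.
path = os.path.join(tempfile.mkdtemp(), "bypass.log")
bypass_gate("schema-check", "prototype spike", "alice", path)
with open(path, encoding="utf-8") as f:
    print(json.loads(f.readline())["gate"])  # schema-check
```

Because the log is structured JSON lines, "incremental gate escalation" becomes a query: count bypasses per gate per week, and tighten the gates people actually route around.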
Representative commits
- fe46795 — planning and execution workflow scaffolding.
- 95bbb0c — local refs plus team-mode planning, matching the multi-actor ambiguity model.
- dcacd3a — iterative planning workflow used before hardening state transitions.
- 9f06382 — implementation-side command integration for planning-to-execution continuity.
- 23df544 — Personas and context models that influenced ambiguity handling.
- 0d1bd5f — initial commit baseline for the historical sequence.
Fagan inspection: design review by commit evidence
I run each retrospective article through a Fagan-style inspection checklist so claims are supported by history, not vibes.
Inspection scope
- Inputs: linked commits in this article, architecture claims in prose, and explicit design trade-offs.
- Objective: separate intentional architecture from incidental implementation details.
- Exit condition: no major claim remains unlinked to a commit trail or clear constraint.
What I inspected
- Problem framing — was the failure mode explicit and specific?
- Decision rationale — was the reason for each structural choice clear?
- Contract boundaries — are state transitions, validation, and permissions explicit?
- Verification posture — are risks paired with tests, gates, or operational safeguards?
- Residual risk — what is still uncertain and where is next evidence needed?
Findings
- Pass condition: each design direction is defensible as a trade-off, not preference.
- Pass condition: at least one linked commit backs every architectural claim.
- Pass condition: failure modes are named with mitigation decisions.
- Risk condition: any unsupported claim becomes a follow-up inspection item.
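The "at least one linked commit backs every architectural claim" pass condition can be checked mechanically. A minimal sketch, assuming claims are bullet lines and commit evidence appears as a hex SHA:

```python
import re

# A commit reference: 7 to 40 lowercase hex characters as a standalone word.
SHA = re.compile(r"\b[0-9a-f]{7,40}\b")

def unsupported_claims(lines: list[str]) -> list[str]:
    """Return claim bullets that cite no commit SHA; each becomes a follow-up item."""
    return [line for line in lines if line.startswith("- ") and not SHA.search(line)]

claims = [
    "- fe46795 — planning and execution workflow scaffolding.",
    "- governance gates added before automation hardening.",
]
print(unsupported_claims(claims))  # only the unlinked claim is flagged
```

Running a check like this over the article's own bullet lists is what turns the risk condition into an actionable inspection queue rather than a reviewer's impression.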
How I design things (Fagan-oriented)
- Start with a concrete failure, not a feature idea.
- Define invariants before interface details.
- Make state and lifecycle transitions explicit.
- Keep observability at decision points, not only at failures.
- Treat governance as a design constraint, not a post hoc process.
Next design action
- Turn this inspection into a backlog trail: each remaining risk maps to one upcoming commit with acceptance evidence.