CK/SYSTEMS
Project: Alpaca Chat Agent Engine · Source: Retrospective

How I built Alpaca Chat: from demo chat client to controlled multi-model delivery system

A practical engineering retrospective on Alpaca Chat, focusing on model routing, conversation reliability, and operational tradeoffs needed to keep AI chat features stable in production.

journal engineering alpaca-chat llm software-architecture

I build AI products with failure as a first-class design case: better an explicit fallback than a mysterious happy path.

Alpaca Chat started as a demo: one chat surface, multiple model providers, fast UX progress.
The deeper lesson was that a product’s reliability in production is not produced by more UI features; it is produced by explicit systems decisions.
This retrospective follows the classic pattern: identify the problem, list the forces, apply a constrained solution, then document structure and tradeoffs.

Problem

A working chat interface can make early adoption easy while still failing silently in real usage.
The early implementation made the wrong thing easy to reason about: the current screen, not the delivery pipeline.

In practice, we observed three production failure classes:

  1. Model changes altered behavior without clear intent.
    Context and policy drifted as chat creation, model selection, and routing logic were spread across UI, server handlers, and provider adapters.
  2. Conversation continuity was fragile.
    Session metadata and message history had no single source of authority and weak id hygiene, so a retry was often just a best-effort replay.
  3. Errors were visible but not explainable, because troubleshooting required context we did not keep.
    When latency or provider failures happened, operators lacked a stable path for tracing why a specific request took a route and how state changed.

The thesis of this effort became: if multi-model routing is treated as a feature toggle, behavior is unstable; if it is treated as architecture, behavior is governable.

Forces

  1. Delivery speed and correctness were both mandatory.
    We needed to move quickly without regressing user trust when retries or failures happened.
  2. Provider heterogeneity was non-negotiable.
    Different models gave us useful flexibility, but introduced variation in schemas, error modes, and performance.
  3. State had to be durable, not implicit.
    A chat is a stream over time; an in-memory or inferred state model failed under load and replays.
  4. Operational burden had to remain bounded.
    Each additional path should add clarity for on-call triage, not just one more log entry.

Solution

I moved from a UI-led flow to a small explicit platform flow:

  • Make conversation lifecycle a domain boundary: create, route, progress, fail, resume.
  • Persist canonical conversation data as the source of truth, not just a cache for rendering.
  • Centralize model routing decisions and constrain them to supported, reviewable options.
  • Surface status and id handling as explicit transitions so retries are safe and reproducible.
  • Add narrowly scoped reliability and routing checks around chat-critical paths before expanding feature breadth.

The sequencing matters. Reliability came first; enhancements were added where the new structure made them safe.
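The lifecycle boundary above (create, route, progress, fail, resume) can be made explicit as a small transition table. A sketch under assumed state names — these mirror the boundary verbs, not any production enum:

```python
# Sketch of the conversation lifecycle as an explicit transition table.
# State names are illustrative; the point is that undefined transitions
# are rejected instead of silently accepted.
ALLOWED = {
    "created":     {"routed", "failed"},
    "routed":      {"in_progress", "failed"},
    "in_progress": {"completed", "failed"},
    "failed":      {"routed"},      # resume = re-route from a failed state
    "completed":   set(),
}

def transition(current: str, target: str) -> str:
    """Move between lifecycle states only along reviewed edges."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

With the table in place, "retries are safe" becomes checkable: a resume is the single edge failed → routed, not an ad hoc replay.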

Resulting structure

The final structure is still simple in shape, but explicit in behavior:

  • Presentation layer accepts user input and displays state.
  • Service layer orchestrates conversation operations and policy-based routing.
  • Persistence layer stores canonical messages and state transitions.
  • Model routing layer constrains which agents/models are eligible and why.
  • Observability layer records boundaries (creation, routing, completion, failure) with durable identifiers.

This replaced ambiguous control flow with explicit transitions that can be reasoned about without stepping through every component.
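The routing and observability layers can be sketched together as one decision function: eligibility is constrained to a reviewed allow-list, and every decision carries the reason it was made. Model names, thresholds, and policy fields here are invented for illustration:

```python
# Sketch of a centralized routing decision. The allow-list constrains which
# models are eligible; the recorded reason is what operators later use to
# trace why a specific request took a route. All names are hypothetical.
from dataclasses import dataclass

SUPPORTED_MODELS = {"fast-small", "balanced", "large-context"}

@dataclass(frozen=True)
class RoutingDecision:
    model: str
    reason: str   # logged at the routing boundary alongside the conversation id

def route(prompt_tokens: int, needs_long_context: bool) -> RoutingDecision:
    if needs_long_context or prompt_tokens > 8_000:
        decision = RoutingDecision("large-context", "prompt exceeds standard window")
    elif prompt_tokens < 500:
        decision = RoutingDecision("fast-small", "short prompt, latency-optimized")
    else:
        decision = RoutingDecision("balanced", "default policy")
    assert decision.model in SUPPORTED_MODELS  # constrain to reviewable options
    return decision
```

Because the decision and its reason are one value, routing stops being a feature toggle scattered across handlers and becomes an auditable transition.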

Tradeoffs

I accepted extra engineering surface area to reduce ambiguity:

  • Increased complexity: more state transitions and invariants to keep accurate.
  • Higher initial cost: persistence and routing discipline slow raw feature tempo early on.
  • Lower operational noise: incidents became faster to diagnose and recover from because behavior was attributable.
  • Stronger product control: model selection became intentional, auditable, and less surprising.

The net result is not less agility; it is agility with guardrails.
The system now fails in known ways and therefore recovers in predictable ways.

Production lessons

  • Choose invariants first, abstractions second.
    A clean abstraction over conversation state is only useful if message lifecycle and id semantics are already settled.
  • Constrain flexibility until your observability catches up.
    Fewer supported paths initially can reduce incident complexity while the team matures.
  • Model failure as expected behavior.
    If recovery is part of the design, users perceive fewer regressions and support gets fewer “random” bugs.
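The last lesson — and the explicit-fallback stance from the opening — can be sketched as a provider chain that surfaces its full failure trail instead of degrading silently. The provider names and call shape are hypothetical:

```python
# Sketch of failure as expected behavior: an explicit fallback chain rather
# than a mysterious happy path. Provider names and the call signature are
# invented for the example.
def call_with_fallback(prompt, providers, attempt):
    """Try each provider in a reviewed order; keep the trail of failures."""
    errors = []
    for name in providers:
        try:
            return name, attempt(name, prompt)
        except Exception as exc:          # provider error modes vary; record each
            errors.append((name, repr(exc)))
    # No silent degradation: the caller sees every provider that failed and why.
    raise RuntimeError(f"all providers failed: {errors}")
```

Recovery designed this way is what makes support tickets attributable: the answer to "why did this request use the backup model" is in the error trail, not in a guess.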

Representative commits

Fagan inspection: design review by commit evidence

I run each retrospective article through a Fagan-style inspection checklist so claims are supported by history, not vibes.

Inspection scope

  • Inputs: linked commits in this article, architecture claims in prose, and explicit design trade-offs.
  • Objective: separate intentional architecture from incidental implementation details.
  • Exit condition: no major claim remains unlinked to a commit trail or clear constraint.

What I inspected

  1. Problem framing — was the failure mode explicit and specific?
  2. Decision rationale — was the reason for each structural choice clear?
  3. Contract boundaries — are state transitions, validation, and permissions explicit?
  4. Verification posture — are risks paired with tests, gates, or operational safeguards?
  5. Residual risk — what is still uncertain and where is next evidence needed?

Findings

  • Pass condition: each design direction is defensible as a trade-off, not preference.
  • Pass condition: at least one linked commit backs every architectural claim.
  • Pass condition: failure modes are named with mitigation decisions.
  • Risk condition: any unsupported claim becomes a follow-up inspection item.

How I design things (Fagan-oriented)

  • Start with a concrete failure, not a feature idea.
  • Define invariants before interface details.
  • Make state and lifecycle transitions explicit.
  • Keep observability at decision points, not only at failures.
  • Treat governance as a design constraint, not a post hoc process.

Next design action

  • Turn this inspection into a backlog trail: each remaining risk maps to one upcoming commit with acceptance evidence.
Newsletter

Short notes on building AI agents in production.

One email when something worth sharing ships. No fluff, no daily cadence, no recycled growth-thread noise.

Primary use: consulting updates, governed AI workflow lessons, and major project writeups.
