llm-quant: Engineering Retrospective
A quantitative trading system using LLMs for market analysis — and the hard lessons about applying language models to numerical financial data.
llm-quant: Where Language Models Meet Financial Markets
The premise of llm-quant was straightforward: use an LLM as the reasoning layer in a quantitative trading system. Rather than encoding trading logic in handcrafted rules or training a narrow ML model on price history, the idea was to let a language model read market context — news, earnings summaries, macro indicators, price action narratives — and produce a trade decision. The project ran through a paper trading phase before any real capital was involved, and the gap between paper-trading performance and what you'd actually trust with live money turned out to be the most instructive part of the whole build.
What Changed
The architecture split into three layers: a data ingestion pipeline that pulled market data and converted it into natural language summaries the model could reason about, a prompt layer that framed the current market state as a decision problem, and an execution layer that translated the model’s output into actual order parameters. The hardest part was the middle layer. LLMs are confident narrators — they produce well-structured, plausible-sounding analyses regardless of whether those analyses are actually correct. Constructing prompts that forced the model to express uncertainty rather than paper over it required more iteration than the rest of the system combined.
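The prompt-layer idea can be sketched in a few lines. This is a hypothetical illustration, not the project's actual code: the field names, the abstain option, and the confidence floor are all assumptions. The point is the validation step, which treats anything malformed or low-conviction as a refusal to trade rather than a signal.

```python
import json

ALLOWED_ACTIONS = {"long", "short", "abstain"}

def build_decision_prompt(market_summary: str) -> str:
    """Frame the market state as a decision problem with a forced abstain option."""
    return (
        "You are one signal in a trading system, not the decision-maker.\n"
        f"Market state: {market_summary}\n"
        'Respond with JSON only: {"action": "long"|"short"|"abstain", '
        '"confidence": 0.0-1.0, "key_risk": "..."}.\n'
        "If the evidence is noisy or conflicting, you must abstain."
    )

def parse_decision(raw: str, min_confidence: float = 0.6) -> dict:
    """Validate the model's reply; treat anything malformed or timid as abstain."""
    try:
        d = json.loads(raw)
    except json.JSONDecodeError:
        return {"action": "abstain", "confidence": 0.0, "key_risk": "unparseable output"}
    if d.get("action") not in ALLOWED_ACTIONS or not (0.0 <= d.get("confidence", -1.0) <= 1.0):
        return {"action": "abstain", "confidence": 0.0, "key_risk": "invalid fields"}
    if d["action"] != "abstain" and d["confidence"] < min_confidence:
        d["action"] = "abstain"  # low-conviction calls are downgraded, not traded
    return d
```

Pushing the "express uncertainty" requirement into a structured schema with a hard floor means the prompt doesn't have to win an argument with a confident narrator; the parser simply discards unconvincing output.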
Numerical data presented a specific challenge. Price series, volume, volatility measures — these are not the kind of information language models are built to reason about precisely. The solution was to avoid feeding raw numbers directly and instead summarize them: “price is up 4% over the past five days, above the 20-day average, with volume 30% above normal.” That framing gave the model workable signal without expecting it to do arithmetic. It worked better than expected, but it also meant the model’s reasoning was only as good as the summarization layer, which introduced its own failure modes.
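A minimal sketch of what that summarization layer might look like, assuming plain price and volume lists as input. The window sizes and the exact wording are illustrative, not the project's actual values:

```python
def summarize_market_state(prices, volumes, ma_window=20):
    """Render recent price action as a short English summary an LLM can reason about."""
    pct_5d = (prices[-1] / prices[-6] - 1) * 100          # 5-day return, in percent
    ma = sum(prices[-ma_window:]) / ma_window             # simple moving average
    vol_baseline = sum(volumes[:-1]) / len(volumes[:-1])  # average prior volume
    vol_delta = (volumes[-1] / vol_baseline - 1) * 100    # latest volume vs. baseline

    direction = "up" if pct_5d >= 0 else "down"
    position = "above" if prices[-1] > ma else "below"
    return (
        f"price is {direction} {abs(pct_5d):.0f}% over the past five days, "
        f"{position} the {ma_window}-day average, "
        f"with volume {vol_delta:+.0f}% versus normal"
    )
```

Note the failure modes mentioned above live entirely in functions like this: pick the wrong window or baseline and the model reasons correctly about a wrong description of the market.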
Why It Mattered
The project answered a question worth asking: can an LLM serve as a useful component in a trading system, or does the combination of overconfidence and poor numerical intuition make it a liability? The honest answer is that it depends on what you ask it to do. For synthesizing qualitative information — news sentiment, earnings tone, macro narrative — it adds genuine value. For anything that requires precise numerical reasoning or consistent probabilistic calibration, it needs to be surrounded by hard constraints and treated as one signal among several rather than the decision-maker.
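"One signal among several, surrounded by hard constraints" can be made concrete with a small sketch. The weights and the position cap are invented for the example; the design point is that the LLM gets a minority weight and the clamp applies regardless of how confident any input is.

```python
MAX_POSITION = 0.02  # assumed cap: never risk more than 2% of book on one idea

def blend_signals(llm_score: float, momentum: float, mean_reversion: float) -> float:
    """Blend signals in [-1, 1] into a position size; the LLM cannot dominate."""
    combined = 0.3 * llm_score + 0.4 * momentum + 0.3 * mean_reversion
    # Hard constraint: clamp the final position no matter what the models say.
    return max(-MAX_POSITION, min(MAX_POSITION, combined * MAX_POSITION))
```

Even a maximally confident LLM call that disagrees with the mechanical signals moves the book by a fraction of the cap, which is the structural version of treating model confidence as suspect.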
Paper trading made this concrete. The model's hit rate on directional calls was better than chance but not dramatically so, and it had a consistent failure mode: it constructed convincing narratives for trades that were actually driven by noise, and those narratives were persuasive enough that it was tempting to go along with them rather than override. That's a worse failure mode than a model that's obviously wrong, because a clearly wrong model is easy to reject.
What Held Up / What Didn’t
The data pipeline and execution layer were solid and required minimal rework. The prompt layer needed constant attention because market regimes change and prompts that worked well in one environment could perform badly in another. The core insight that survived the project: LLMs are useful in trading systems as synthesis engines for qualitative data, not as primary decision-makers on quantitative signals. The tension between model confidence and financial risk never fully resolved — that’s probably the right outcome, because the moment you stop treating a model’s confidence as suspect is the moment it costs you real money.