
How to Add Safety Guardrails to Claude API Tool Calls

Validation middleware prevents Claude from executing harmful tool calls. Here is how to build it with Zod schema enforcement and approval gates.

article ai-safety claude-api typescript portarium

Tool calls are where Claude stops generating text and starts taking actions in the world. File writes, API requests, database mutations, shell commands — the moment you wire up tools, you are no longer running a chatbot. You are running an agent with real side effects. Most developers add guardrails as an afterthought. That is the wrong order.

This post covers how to intercept Claude tool calls before execution, enforce parameter schemas with Zod, and hold calls for human approval when the risk profile demands it. Every pattern here is implemented in Portarium.

Why Tool Calls Are the High-Risk Surface

Text completions are reversible. A bad summary can be regenerated. A bad tool call that deletes a file, sends an email, or charges a card cannot be unsent.

The attack surface is not just prompt injection or jailbreaks — it is the ordinary case where the model correctly understood intent but the intent itself was underspecified. “Delete old records” means something very different at 10 rows vs 10,000. The model does not know which one you meant. It executes and you find out.

Three failure modes appear consistently across agentic systems:

Wrong tool. The model calls send_email when the user asked it to “draft” something. Email sending and draft saving are both valid tools, but the model chose the one with side effects.

Right tool, wrong parameters. The model calls create_database_record with a null foreign key, or passes a date string where an ISO timestamp is required. The tool crashes, or worse, silently corrupts data.

Right tool, right parameters, wrong timing. The model executes a bulk operation during a maintenance window. Or it deletes a record that another process was mid-write on. Timing is not something Claude can observe without explicit tooling — and most codebases do not give it that context.

All three of these are preventable at the middleware layer.

Intercepting Tool Calls: The Middleware Pattern

Claude’s tool call response looks like this before execution:

// Raw tool call from the API response
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

Before you pass input to your actual tool handler, it should pass through a validation pipeline. The pattern is a simple middleware chain — each step either passes the call forward, rejects it with a reason, or holds it for human review.

type ToolCallResult =
  | { type: "success"; output: unknown }
  | { type: "error"; error: string };

type ToolCallMiddleware = (
  call: ToolUseBlock,
  next: (call: ToolUseBlock) => Promise<ToolCallResult>
) => Promise<ToolCallResult>;

function buildMiddlewareChain(
  middlewares: ToolCallMiddleware[],
  handler: (call: ToolUseBlock) => Promise<ToolCallResult>
): (call: ToolUseBlock) => Promise<ToolCallResult> {
  // next takes the call as an argument so a middleware can forward a
  // modified version (e.g. validated + defaulted input) down the chain.
  return middlewares.reduceRight<(call: ToolUseBlock) => Promise<ToolCallResult>>(
    (next, middleware) => (call) => middleware(call, next),
    handler
  );
}

You compose it like Express middleware — schema validation first, approval gate second, audit logging wrapping both. Each layer is independently testable and replaceable.
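To make the ordering concrete, here is a minimal, self-contained sketch with the types inlined and two trivial middlewares that record the order they run in. The names (`outer`, `inner`, `buildChain`) are illustrative, not part of the pipeline above:

```typescript
// Self-contained sketch of chain ordering; types inlined so it runs standalone.
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}
type ToolCallResult =
  | { type: "success"; output: unknown }
  | { type: "error"; error: string };
type Middleware = (
  call: ToolUseBlock,
  next: (call: ToolUseBlock) => Promise<ToolCallResult>
) => Promise<ToolCallResult>;

function buildChain(
  middlewares: Middleware[],
  handler: (call: ToolUseBlock) => Promise<ToolCallResult>
): (call: ToolUseBlock) => Promise<ToolCallResult> {
  return middlewares.reduceRight<(call: ToolUseBlock) => Promise<ToolCallResult>>(
    (next, mw) => (call) => mw(call, next),
    handler
  );
}

// Two trivial middlewares that record when they run.
const order: string[] = [];
const outer: Middleware = async (call, next) => {
  order.push("outer");
  return next(call);
};
const inner: Middleware = async (call, next) => {
  order.push("inner");
  return next(call);
};

const chain = buildChain([outer, inner], async (call) => {
  order.push("handler");
  return { type: "success", output: call.name };
});
```

Invoking `chain(...)` pushes `"outer"`, then `"inner"`, then `"handler"`: the first middleware in the array is the outermost wrapper, exactly like Express.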

Zod Schema Validation on Tool Parameters

Schema validation is the cheapest guardrail you can add and it eliminates the entire “right tool, wrong parameters” failure class.

Define your tool schemas with Zod, and validate before execution:

import { z } from "zod";

const toolSchemas: Record<string, z.ZodSchema> = {
  create_database_record: z.object({
    table: z.enum(["users", "orders", "products"]),
    data: z.record(z.unknown()),
    dryRun: z.boolean().default(false),
  }),
  send_email: z.object({
    to: z.string().email(),
    subject: z.string().min(1).max(200),
    body: z.string().max(10_000),
    replyTo: z.string().email().optional(),
  }),
  delete_records: z.object({
    table: z.string(),
    where: z.record(z.unknown()),
    limit: z.number().int().positive().max(100), // hard cap on bulk deletes
    confirm: z.literal(true), // model must explicitly pass confirm: true
  }),
};

const schemaValidationMiddleware: ToolCallMiddleware = async (call, next) => {
  const schema = toolSchemas[call.name];

  if (!schema) {
    return {
      type: "error",
      error: `Unknown tool: ${call.name}. Registered tools: ${Object.keys(toolSchemas).join(", ")}`,
    };
  }

  const result = schema.safeParse(call.input);

  if (!result.success) {
    return {
      type: "error",
      error: `Schema validation failed for ${call.name}: ${result.error.message}`,
    };
  }

  // Replace raw input with the validated + defaulted version
  return next({ ...call, input: result.data });
};

The confirm: z.literal(true) pattern on destructive operations is worth noting. The model must explicitly pass confirm: true in its tool call. That means the system prompt must tell Claude when confirmation is required. If the model calls delete_records without it, the schema rejects it before execution. It is a cheap forcing function that makes the model’s intent explicit in the call itself.
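Stripped of Zod, the forcing function reduces to a single check. The sketch below shows the equivalent logic in plain TypeScript; `requiresExplicitConfirm` is a hypothetical helper for illustration, not part of the pipeline above:

```typescript
// Plain-TypeScript equivalent of the confirm: z.literal(true) forcing function.
// Returns null when the call may proceed, or a rejection reason otherwise.
function requiresExplicitConfirm(input: Record<string, unknown>): string | null {
  return input.confirm === true
    ? null
    : "delete_records requires confirm: true in the tool call input";
}

// A call that omits confirm is rejected before any handler runs.
const rejected = requiresExplicitConfirm({ table: "users", where: { id: 1 }, limit: 10 });

// A call that explicitly passes confirm: true proceeds.
const allowed = requiresExplicitConfirm({ table: "users", where: { id: 1 }, limit: 10, confirm: true });
```

The Zod version is preferable in practice because the rejection message includes the full schema diff, but the underlying contract is the same: destructive intent must be stated in the call itself.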

Approval Gates: Hold, Surface, Wait

Not every tool call can be auto-approved. For operations above a risk threshold — sending external communications, making financial transactions, modifying production data — you want a human in the loop before execution.

The approval gate pauses execution and waits for a signal:

const HIGH_RISK_TOOLS = new Set(["send_email", "delete_records", "charge_payment"]);

const approvalGateMiddleware: ToolCallMiddleware = async (call, next) => {
  if (!HIGH_RISK_TOOLS.has(call.name)) {
    return next(call);
  }

  const approvalRequest = {
    id: crypto.randomUUID(),
    toolName: call.name,
    input: call.input,
    requestedAt: new Date().toISOString(),
    status: "pending" as const,
  };

  // Persist to your approval queue (database, Redis, etc.)
  await approvalQueue.insert(approvalRequest);

  // Surface to human via webhook, Slack, email, etc.
  await notifyApprover(approvalRequest);

  // Poll for decision (or use a webhook callback pattern)
  const decision = await waitForApproval(approvalRequest.id, {
    timeoutMs: 30 * 60 * 1000, // 30 minute window
  });

  if (decision.status === "rejected") {
    return {
      type: "error",
      error: `Tool call rejected by ${decision.reviewerEmail} at ${decision.decidedAt}. Reason: ${decision.reason ?? "no reason given"}`,
    };
  }

  return next(call);
};

The 30-minute timeout is intentional. Agent loops should not block indefinitely on human approval — if no one responds, the call expires and the agent gets a clear error it can report back. You can tune this per tool: a Slack message might timeout in 10 minutes; a billing operation might wait 4 hours.
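That per-tool tuning can be a plain lookup table. A sketch, with illustrative timeout values and a hypothetical `approvalTimeoutFor` helper:

```typescript
// Illustrative per-tool approval windows, in milliseconds.
const APPROVAL_TIMEOUTS_MS: Record<string, number> = {
  send_email: 10 * 60 * 1000,         // 10 minutes: low stakes, fast turnaround
  delete_records: 30 * 60 * 1000,     // 30 minutes: the default window
  charge_payment: 4 * 60 * 60 * 1000, // 4 hours: billing reviews take longer
};

const DEFAULT_TIMEOUT_MS = 30 * 60 * 1000;

function approvalTimeoutFor(toolName: string): number {
  return APPROVAL_TIMEOUTS_MS[toolName] ?? DEFAULT_TIMEOUT_MS;
}
```

The gate middleware would then call `waitForApproval(approvalRequest.id, { timeoutMs: approvalTimeoutFor(call.name) })` instead of hardcoding one window for every tool.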

Full Audit Trail

Every tool call attempt — approved, rejected, or failed schema validation — should be logged before the decision is made. Do not log only successful executions. The rejected calls are where you find your actual failure patterns.

const auditMiddleware: ToolCallMiddleware = async (call, next) => {
  const entry = {
    id: crypto.randomUUID(),
    toolName: call.name,
    input: call.input,
    attemptedAt: new Date().toISOString(),
    outcome: null as "executed" | "rejected" | "error" | null,
    durationMs: null as number | null,
    errorDetail: null as string | null,
  };

  const start = Date.now();

  try {
    const result = await next(call);
    entry.outcome = result.type === "error" ? "error" : "executed";
    entry.errorDetail = result.type === "error" ? result.error : null;
    return result;
  } catch (err) {
    entry.outcome = "error";
    entry.errorDetail = err instanceof Error ? err.message : String(err);
    throw err;
  } finally {
    entry.durationMs = Date.now() - start;
    auditLog.append(entry).catch(() => {}); // fire-and-forget: do not block the hot path on log I/O
  }
};

After a few weeks in production, the audit log tells you which tools the model calls most often, which ones fail schema validation and why, which approvals get rejected, and where the latency is. That data drives every meaningful improvement to your tool definitions.
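As one sketch of the kind of rollup that data supports, here is a hypothetical aggregation over entries shaped like the audit record above, computing the error-plus-rejection rate per tool:

```typescript
// Minimal shape of an audit entry for aggregation purposes.
interface AuditEntry {
  toolName: string;
  outcome: "executed" | "rejected" | "error" | null;
  durationMs: number | null;
}

// Fraction of calls per tool that were rejected or errored.
function errorRateByTool(entries: AuditEntry[]): Record<string, number> {
  const totals: Record<string, { total: number; errors: number }> = {};
  for (const e of entries) {
    const t = (totals[e.toolName] ??= { total: 0, errors: 0 });
    t.total += 1;
    if (e.outcome === "error" || e.outcome === "rejected") t.errors += 1;
  }
  return Object.fromEntries(
    Object.entries(totals).map(([name, t]) => [name, t.errors / t.total])
  );
}
```

A tool with a high rate here usually means its description or schema is misleading the model; the fix is almost always better tool definitions, not more guardrails.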

This Is What Portarium Does

The patterns above are not theoretical. They are the implementation inside Portarium — a validation middleware layer built specifically for Claude tool call pipelines.

Portarium ships with Zod-based schema enforcement, configurable approval gates, and structured audit logging that feeds into whatever observability stack you are already running. It sits between the Claude API response and your tool handlers, and you compose it in whatever combination your risk tolerance requires.

If you want this running inside your own workflow — one bounded use case, governed, production-grade — that is exactly what the consulting engagement covers.

Composing the Full Pipeline

To put it together:

const safeToolExecutor = buildMiddlewareChain(
  [
    auditMiddleware,        // outermost: logs everything including errors
    schemaValidationMiddleware, // rejects bad params before approval gate sees them
    approvalGateMiddleware, // holds high-risk calls for human sign-off
  ],
  executeToolHandler       // innermost: your actual tool implementations
);

// In your agent loop:
for (const block of response.content) {
  if (block.type === "tool_use") {
    const result = await safeToolExecutor(block);
    toolResults.push({ tool_use_id: block.id, content: JSON.stringify(result) });
  }
}

The order matters. Audit wraps everything so failed schema validation is still logged. Schema validation runs before the approval gate so you do not waste a reviewer’s time on a malformed call. Approval gate runs before execution so you never accidentally execute before sign-off.

Three lines of composition. Every tool call your agent makes is now validated, governed, and fully auditable.

Newsletter

Short notes on building AI agents in production.

One email when something worth sharing ships. No fluff, no daily cadence, no recycled growth-thread noise.

Primary use: consulting updates, governed AI workflow lessons, and major project writeups.
