Building AI Agents with Structured Outputs

How to build reliable AI agents that return structured, validated data using Zod schemas and tool calling — moving beyond fragile prompt engineering to robust system design.

22 January 2026 · 4 min read

The biggest challenge with LLM-powered applications isn’t getting the model to say the right thing — it’s getting it to return data in a format your application can actually use. Free-form text is great for chatbots, but when you’re building agents that take actions, you need structure.

I’ve spent the last year building AI agents for production systems, and the approach that’s proven most reliable is combining tool calling with schema validation. Here’s how I do it.

The Problem with Prompt-Based Parsing

The naive approach to getting structured data from an LLM looks like this:

const prompt = `
  Extract the following from this email and return as JSON:
  - sender_name (string)
  - subject (string)
  - priority (high, medium, low)
  - action_items (array of strings)

  Email: ${emailContent}
`;

const response = await llm.complete(prompt);
const data = JSON.parse(response); // 🤞 hope for the best

This works… sometimes. But it fails in frustrating ways:

The model might wrap the JSON in markdown code fences
It might add a conversational preamble before the JSON
Field names might be slightly different (sender vs sender_name)
Enum values might not match (High vs high vs HIGH)
The JSON might just be malformed

You end up writing increasingly brittle regex-based extraction logic, and every edge case you fix reveals two more.
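To make those failure modes concrete, here is a small sketch that runs JSON.parse over a few hand-written strings. The sample outputs are invented for illustration rather than captured from a real model, and `tryParse` is a hypothetical helper.

```typescript
// Hand-written stand-ins for model output (hypothetical, not real responses).
const FENCE = "`".repeat(3); // built at runtime so this example contains no literal fence
const payload = '{"sender_name":"Ada","priority":"high"}';

const samples: Record<string, string> = {
  clean: payload,
  fenced: `${FENCE}json\n${payload}\n${FENCE}`,
  preamble: `Sure! Here is the JSON:\n${payload}`,
  truncated: payload.slice(0, 20),
  // Parses fine, but the key and the enum casing are not what the app expects.
  mismatched: '{"sender":"Ada","priority":"High"}',
};

// Returns true when the raw string is parseable JSON as-is.
function tryParse(raw: string): boolean {
  try {
    JSON.parse(raw);
    return true;
  } catch {
    return false;
  }
}

const outcomes: Record<string, boolean> = {};
for (const [name, raw] of Object.entries(samples)) {
  outcomes[name] = tryParse(raw);
}
console.log(outcomes);
```

Only `clean` and `mismatched` parse, and `mismatched` is arguably the worst case: nothing throws, so the wrong field name and enum casing surface later, somewhere downstream.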

Tool Calling: Let the Model Fill in a Form

Modern LLM APIs support tool calling (also called function calling), where you define a schema and the model returns data shaped to fit it. This shifts the problem from “parse unstructured text” to “validate structured data.”

Here’s the same email extraction modeled as a Zod schema, which becomes the input definition for a tool:

import { z } from "zod";
import Anthropic from "@anthropic-ai/sdk";

const EmailAnalysis = z.object({
  senderName: z.string().describe("Full name of the email sender"),
  subject: z.string().describe("Email subject line"),
  priority: z
    .enum(["high", "medium", "low"])
    .describe("Urgency level based on content and tone"),
  actionItems: z
    .array(
      z.object({
        task: z.string().describe("What needs to be done"),
        owner: z.string().optional().describe("Who should do it"),
        deadline: z.string().optional().describe("When it's due, if mentioned"),
      }),
    )
    .describe("Concrete action items from the email"),
});

type EmailAnalysis = z.infer<typeof EmailAnalysis>;

The Zod schema serves triple duty: it defines the TypeScript type, converts to the JSON Schema the LLM API expects (Zod 4 ships z.toJSONSchema; with Zod 3 the zod-to-json-schema package does the same job), and validates the response at runtime. One source of truth, three uses.
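To show where the schema ends up, here is a sketch of the tool definition it would produce. The tool name `record_email_analysis` and its description are hypothetical, and the JSON Schema is written out by hand only to keep the example free of dependencies; in practice it would be generated from the Zod schema. The `name`, `description`, `input_schema`, and `tool_choice` fields follow the Anthropic Messages API.

```typescript
// Hand-written JSON Schema mirroring the EmailAnalysis Zod schema above.
// Spelled out only to show the shape; generate it from Zod in real code.
const emailAnalysisTool = {
  name: "record_email_analysis", // hypothetical tool name
  description: "Record the structured analysis of an email",
  input_schema: {
    type: "object",
    properties: {
      senderName: { type: "string", description: "Full name of the email sender" },
      subject: { type: "string", description: "Email subject line" },
      priority: {
        type: "string",
        enum: ["high", "medium", "low"],
        description: "Urgency level based on content and tone",
      },
      actionItems: {
        type: "array",
        description: "Concrete action items from the email",
        items: {
          type: "object",
          properties: {
            task: { type: "string", description: "What needs to be done" },
            owner: { type: "string", description: "Who should do it" },
            deadline: { type: "string", description: "When it's due, if mentioned" },
          },
          required: ["task"], // owner and deadline are optional in the Zod schema
        },
      },
    },
    required: ["senderName", "subject", "priority", "actionItems"],
  },
};

// Request parameters that force the model to answer through this one tool.
const requestParams = {
  tools: [emailAnalysisTool],
  tool_choice: { type: "tool", name: emailAnalysisTool.name },
};
```

Forcing the tool with `tool_choice` suits pure extraction, where there is nothing else the model should do. An agent that picks among several tools would leave the choice to the model.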

Building the Agent Loop

A real agent doesn’t just extract data — it reasons about what to do and takes actions in a loop. The pattern I use looks like this:

import { z } from "zod";

// Define the tools the agent can use
const tools = {
  searchKnowledgeBase: {
    description: "Search the internal knowledge base for relevant documents",
    schema: z.object({
      query: z.string(),
      maxResults: z.number().default(5),
    }),
    execute: async ({ query, maxResults }) => {
      return await knowledgeBase.search(query, maxResults);
    },
  },

  createTicket: {
    description: "Create a support ticket in the ticketing system",
    schema: z.object({
      title: z.string(),
      description: z.string(),
      priority: z.enum(["p0", "p1", "p2", "p3"]),
      assignee: z.string().optional(),
    }),
    execute: async (params) => {
      return await ticketSystem.create(params);
    },
  },

  sendReply: {
    description: "Send a reply email to the customer",
    schema: z.object({
      to: z.string().email(),
      subject: z.string(),
      body: z.string(),
    }),
    execute: async (params) => {
      return await emailService.send(params);
    },
  },
};
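As written, the `execute` parameters in that record are untyped. One way to tie them to the schema is a small identity helper, sketched here under a few assumptions: `defineTool`, `Schema`, and `Tool` are hypothetical names, and `Schema` is a local structural stand-in for the `safeParse` shape that Zod schemas should satisfy, so the example runs without dependencies.

```typescript
// Minimal structural view of a schema: anything with a Zod-style safeParse.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { message: string } };

interface Schema<T> {
  safeParse(input: unknown): ParseResult<T>;
}

interface Tool<T, R> {
  description: string;
  schema: Schema<T>;
  execute: (params: T) => Promise<R>;
}

// Hypothetical helper: returns the tool unchanged, but lets TypeScript infer
// the execute parameter type from the schema's output type.
function defineTool<T, R>(tool: Tool<T, R>): Tool<T, R> {
  return tool;
}

// Stand-in schema for the demo (a real project would pass a Zod object here).
const searchSchema: Schema<{ query: string; maxResults: number }> = {
  safeParse(input) {
    if (typeof input !== "object" || input === null) {
      return { success: false, error: { message: "expected an object" } };
    }
    const { query, maxResults } = input as { query?: unknown; maxResults?: unknown };
    if (typeof query !== "string") {
      return { success: false, error: { message: "query must be a string" } };
    }
    // Mirrors z.number().default(5) from the tool definition above.
    return {
      success: true,
      data: { query, maxResults: typeof maxResults === "number" ? maxResults : 5 },
    };
  },
};

const searchTool = defineTool({
  description: "Search the internal knowledge base for relevant documents",
  schema: searchSchema,
  // `query` and `maxResults` are inferred as string and number here.
  execute: async ({ query, maxResults }) => `searched "${query}" (max ${maxResults})`,
});
```

Because the parameter type comes from the schema, changing the schema without updating `execute` becomes a compile error instead of a runtime surprise.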

The agent loop processes tool calls iteratively:

async function runAgent(
  initialPrompt: string,
  maxIterations: number = 10,
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: initialPrompt }];

  for (let i = 0; i < maxIterations; i++) {
    const response = await anthropic.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 4096,
      system: SYSTEM_PROMPT,
      tools: formatToolsForAPI(tools),
      messages,
    });

    // Anything other than a tool request (end_turn, max_tokens, ...) ends the loop;
    // otherwise we would send back an empty tool-result message
    if (response.stop_reason !== "tool_use") {
      return extractTextContent(response);
    }

    // Process each tool call
    const toolResults = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        const tool = tools[block.name];
        if (!tool) throw new Error(`Unknown tool: ${block.name}`);

        // Validate input against schema
        const parsed = tool.schema.safeParse(block.input);
        if (!parsed.success) {
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: `Validation error: ${parsed.error.message}`,
            is_error: true,
          });
          continue;
        }

        // Execute the tool
        try {
          const result = await tool.execute(parsed.data);
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: JSON.stringify(result),
          });
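        } catch (err) {
          // Caught values are `unknown` in strict TypeScript, so narrow before reading .message
          const message = err instanceof Error ? err.message : String(err);
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: `Error: ${message}`,
            is_error: true,
          });
        }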
      }
    }

    // Feed results back into the conversation
    messages.push({ role: "assistant", content: response.content });
    messages.push({ role: "user", content: toolResults });
  }

  throw new Error("Agent exceeded maximum iterations");
}

There are a few things to note about this pattern:

Schema validation before execution: the API returns tool input as parsed JSON, but unless the provider enforces the schema strictly, nothing guarantees it matches. We validate with Zod anyway. Defence in depth.
Error results go back to the model: when a tool fails, we tell the model what went wrong so it can try a different approach.
Iteration limit: agents can get stuck in loops. Always set a ceiling.
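The loop above calls two helpers it never defines, `formatToolsForAPI` and `extractTextContent`. Here is one possible shape for them. This is a sketch, not the original implementation: the content block types are simplified local declarations, and `formatToolsForAPI` takes a second `toJsonSchema` argument (unlike the single-argument call in the loop) so the example does not depend on a particular converter.

```typescript
type JsonSchema = Record<string, unknown>;

interface ToolSpec<S> {
  description: string;
  schema: S;
}

interface ApiTool {
  name: string;
  description: string;
  input_schema: JsonSchema;
}

// Converts the tools record into the array of { name, description, input_schema }
// objects that the Messages API expects. The converter is injected by the caller.
function formatToolsForAPI<S>(
  tools: Record<string, ToolSpec<S>>,
  toJsonSchema: (schema: S) => JsonSchema,
): ApiTool[] {
  return Object.entries(tools).map(([name, tool]) => ({
    name,
    description: tool.description,
    input_schema: toJsonSchema(tool.schema),
  }));
}

// Simplified view of the blocks in a response's content array.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Joins every text block in a response, ignoring tool_use blocks.
function extractTextContent(response: { content: ContentBlock[] }): string {
  return response.content
    .filter(
      (block): block is Extract<ContentBlock, { type: "text" }> => block.type === "text",
    )
    .map((block) => block.text)
    .join("\n");
}
```

With Zod 4 the converter could be `z.toJSONSchema`, and with Zod 3 `zodToJsonSchema` from the zod-to-json-schema package. The sketch has not been checked against either, and a record mixing several schema types may need the type parameter `S` spelled out explicitly.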

Structured Output for Final Results

Sometimes you want the agent’s final answer to be structured too, not just its tool calls. You can achieve this by using a “respond” tool that the agent calls when it’s ready to give its final answer:

const ResponseSchema = z.object({
  summary: z.string().describe("Brief summary of what was done"),
  actionsPerformed: z.array(
    z.object({
      action: z.string(),
      result: z.enum(["success", "failure", "skipped"]),
      details: z.string().optional(),
    }),
  ),
  requiresFollowUp: z.boolean(),
  followUpReason: z.string().optional(),
});

This gives you a machine-readable summary of what the agent did, which you can log, display in a UI, or feed into downstream systems.
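Inside the loop, the respond tool needs one special case: when the model calls it, validate the input and return it instead of executing anything. A minimal sketch follows, assuming the tool is registered under the name `respond`. `findFinalResponse` is a hypothetical helper, and `Schema` is a local stand-in for the `safeParse` shape of a Zod schema.

```typescript
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { message: string } };

interface Schema<T> {
  safeParse(input: unknown): ParseResult<T>;
}

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

const RESPOND_TOOL = "respond"; // assumed tool name

// Looks for a call to the respond tool and validates its input.
// Returns undefined when the model has not called it in this turn.
function findFinalResponse<T>(
  content: ContentBlock[],
  schema: Schema<T>,
): ParseResult<T> | undefined {
  for (const block of content) {
    if (block.type === "tool_use" && block.name === RESPOND_TOOL) {
      return schema.safeParse(block.input);
    }
  }
  return undefined;
}

// Stand-in for the Zod ResponseSchema above, checking only two of its fields.
const responseSchema: Schema<{ summary: string; requiresFollowUp: boolean }> = {
  safeParse(input) {
    const { summary, requiresFollowUp } = (input ?? {}) as Record<string, unknown>;
    if (typeof summary !== "string" || typeof requiresFollowUp !== "boolean") {
      return {
        success: false,
        error: { message: "summary and requiresFollowUp are required" },
      };
    }
    return { success: true, data: { summary, requiresFollowUp } };
  },
};
```

A failed parse can be handled like any other tool validation error: send it back as a tool result with `is_error` set and let the model try again.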

Error Boundaries and Graceful Degradation

Production agents need to handle failure gracefully. I wrap the agent loop in an error boundary that distinguishes between recoverable and fatal errors:

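async function safeAgentRun(prompt: string): Promise<AgentResult> {
  try {
    const result = await runAgent(prompt);
    return { status: "success", result };
  } catch (error) {
    // Caught values are `unknown` in strict TypeScript, so narrow before reading .message
    const message = error instanceof Error ? error.message : String(error);

    if (error instanceof RateLimitError) {
      // Retry with exponential backoff. This assumes retryWithBackoff resolves
      // with the callback's result, which we wrap to match AgentResult.
      try {
        const result = await retryWithBackoff(() => runAgent(prompt));
        return { status: "success", result };
      } catch (retryError) {
        logger.error("Agent run failed after retries", { error: retryError, prompt });
        return { status: "error", error: message };
      }
    }

    if (error instanceof ValidationError) {
      // Schema mismatch: log for debugging, return partial result
      logger.warn("Agent output validation failed", { error, prompt });
      return { status: "partial", error: message };
    }

    // Unknown error: don't retry, escalate
    logger.error("Agent run failed", { error, prompt });
    return { status: "error", error: message };
  }
}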
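The `retryWithBackoff` helper is not shown above. One common way to write it is exponential backoff with full jitter, sketched below. The signature, the defaults (three retries, 500 ms base, 30 s cap), and the `backoffDelayMs` and `sleep` helpers are all assumptions made for illustration.

```typescript
// Upper bound on the wait before retry number `attempt` (0-based):
// base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `fn` up to maxRetries + 1 times in total. Resolves with the first
// successful result, or rethrows the last error once retries are exhausted.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Full jitter: wait a random time between 0 and the exponential bound.
        await sleep(Math.random() * backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

The jitter spreads retries out so that several agents hitting the same rate limit do not all come back at the same instant. Note that retrying `runAgent` restarts the whole conversation, so any tool with side effects (creating a ticket, sending an email) may run twice unless it is idempotent.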

What I’ve Learned

After building a dozen of these systems, my key takeaways are:

Schema-first design — define your Zod schemas before writing any agent logic. The schema is the contract between your agent and your application.
Small, focused tools — agents work better with many small tools than a few large ones. Each tool should do one thing.
Validate everything — trust but verify. Even with structured outputs, validate at the boundary.
Log the full conversation — when an agent misbehaves, you need the complete message history to debug it. Structured logging of every turn is essential.
Test with adversarial inputs — the agent will encounter inputs you didn’t anticipate. Build tests around edge cases and malformed data.
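For the logging point, one option is to flatten each message into a single JSON record per turn. The sketch below assumes simplified local message types and a hypothetical `toTurnLog` helper; the record fields (`runId`, `turn`, `toolCalls`, `toolErrors`) are illustrative, not a standard format.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

interface Message {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

interface TurnLog {
  runId: string;
  turn: number;
  role: Message["role"];
  text: string;
  toolCalls: { id: string; name: string; input: unknown }[];
  toolErrors: string[]; // tool_use ids whose results were errors
}

// Flattens one message into a record that serializes to one JSON line.
function toTurnLog(runId: string, turn: number, message: Message): TurnLog {
  const blocks: ContentBlock[] =
    typeof message.content === "string"
      ? [{ type: "text", text: message.content }]
      : message.content;

  const log: TurnLog = {
    runId,
    turn,
    role: message.role,
    text: "",
    toolCalls: [],
    toolErrors: [],
  };
  for (const block of blocks) {
    if (block.type === "text") {
      log.text += block.text;
    } else if (block.type === "tool_use") {
      log.toolCalls.push({ id: block.id, name: block.name, input: block.input });
    } else if (block.is_error) {
      log.toolErrors.push(block.tool_use_id);
    }
  }
  return log;
}

// One JSON object per line is easy to grep and to load into a log pipeline.
const line = JSON.stringify(
  toTurnLog("run-1", 0, { role: "user", content: "Customer reports a failed refund" }),
);
console.log(line);
```

Keying every record by a run id makes it possible to reassemble the full conversation for one misbehaving run without wading through the others.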

The combination of tool calling, schema validation, and iterative agent loops gives you a robust foundation for building AI-powered systems that actually work in production. The key insight is treating the LLM as a reasoning engine that fills in structured forms, not as a text generator you have to parse.