
From Chaos to Production: How to Orchestrate Deterministic Agents with Vercel AI SDK

Discover how to orchestrate AI agents professionally using universal connectors, Zod-typed outputs, and tool calling for real-world systems.

By Equipe Blueprintblog · Jun 20 · 10 min read


Language models are probabilistic. Software is deterministic. Almost everything that is hard about putting AI into production lives in the gap between these two sentences.

In a demo, the gap doesn't appear — there is a human in the loop, reading the response and forgiving the rough edges. In production, what comes after the model is software: a database that expects a number, a function that expects an exact format, a UI that expects a field that always exists. Free text doesn't cut it. The work is engineering: closing this gap without pretending the model has become deterministic.

This article is the ladder to close it, step by step: a universal connector, typed outputs, tool calling, and, when tool calling isn't enough, isolation via sub-agents. I use the Vercel AI SDK as a concrete tool — but the patterns are what matter, not the brand.

The chaos where you start

The first version of any AI feature usually starts out messy. You install a provider's SDK, write the stream parsing by hand, pray for the model to return valid JSON, and couple the business logic to a specific API. Then the provider releases a better model — or goes down — and you discover that switching means rewriting everything.

Every provider has a different API, a different stream format, a different way of asking for structured output. The result is disposable boilerplate and lock-in: you are stuck with the first vendor you chose, even when a better one appears.

Figure: The naive approach vs. a unified layer. Why stitching providers together by hand doesn't scale.

The naive approach (one SDK per provider): hand-written stream parsing, weak typing, and switching providers means rewriting the business logic.

A unified layer (a single API): the same code invokes any LLM, with consistent streaming and types. Switching models means changing one line.

The universal connector

The first step is to stop speaking to each provider in its own language. A unified layer (the Vercel AI SDK is the most widely adopted in the JavaScript world) abstracts the differences: the same generateText function talks to OpenAI, Anthropic, Google, or xAI, and switching models means changing one line.

import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
// import { openai } from "@ai-sdk/openai"; // switching providers = changing 1 line

const { text } = await generateText({
  model: anthropic("claude-sonnet-4-6"),
  prompt: "Summarize this report in three sentences.",
});

On top of this lives the AI Gateway: a single endpoint that adds resilience without perceptible latency. You reference the model as a string — anthropic/claude-sonnet-4.6 — and gain automatic fallback (if one provider goes down, another takes over), OIDC authentication (without managing keys), and telemetry. Routing takes about 20 ms.

None of this is magic, and the AI SDK is not the only option — Mastra and LangChain cover similar ground, and Cloudflare and AWS compete for the same integration layer. But the pattern is what counts: a provider-agnostic interface, so your architecture doesn't marry a single model.
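The pattern itself can be sketched without any SDK. The snippet below is a minimal illustration, not the AI SDK's or the AI Gateway's implementation: the TextModel interface, the two stub providers, and generateWithFallback are all hypothetical names, and the "models" are plain functions instead of network clients.

```typescript
// Hypothetical provider-agnostic interface: not the AI SDK's actual types.
interface TextModel {
  id: string; // "provider/model" string, in the style used above
  generate(prompt: string): string;
}

// Stub providers standing in for real network clients.
const flakyModel: TextModel = {
  id: "provider-a/model-x",
  generate: () => {
    throw new Error("provider-a is down");
  },
};

const healthyModel: TextModel = {
  id: "provider-b/model-y",
  generate: (prompt) => `summary of: ${prompt}`,
};

// Try each model in order and return the first successful answer.
function generateWithFallback(
  models: TextModel[],
  prompt: string,
): { text: string; servedBy: string } {
  const errors: string[] = [];
  for (const model of models) {
    try {
      return { text: model.generate(prompt), servedBy: model.id };
    } catch (err) {
      errors.push(`${model.id}: ${(err as Error).message}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}

const fallbackResult = generateWithFallback([flakyModel, healthyModel], "Q3 report");
console.log(fallbackResult.servedBy); // provider-b/model-y
```

The business logic only ever sees TextModel, so swapping or reordering providers changes the list passed in and nothing else.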

Order from chaos: structured outputs

Now the gap, concretely. Ask a model for data in free text and you get something like “Hmm, let me see… the name is John Doe and I think he is 30 years old…”. Nice for a human to read, useless for software to consume. The model is probabilistic; the if that comes after is deterministic.

The functions generateObject and streamObject close this gap. You pass a Zod schema, and the SDK constrains the model to that structure and validates the result against it, typed end-to-end. It's not “almost JSON”; it's the object you declared.

import { generateObject } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const { object } = await generateObject({
  model: anthropic("claude-sonnet-4-6"),
  schema: z.object({
    nome: z.string(),
    idade: z.number(),
    prioridade: z.enum(["baixa", "media", "alta"]),
  }),
  prompt: "Extract the registration data: ...",
});

object.prioridade; // typed and validated: TypeScript knows the shape

If you read the Data Refinery article, this is the contract in action: the schema is the frontier between the model's probabilistic output and the typed system that consumes it. Ideal for extraction, classification, and generative UIs.
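To make the idea of a schema as a boundary concrete, here is a dependency-free sketch of what validation at that boundary amounts to. It is a hand-rolled stand-in, not how Zod or the AI SDK work internally: parseRegistration and isPriority are hypothetical helpers, and the field names simply mirror the schema above.

```typescript
// Hypothetical hand-rolled validator; with Zod, the schema plays this role.
type Priority = "baixa" | "media" | "alta";

interface Registration {
  nome: string;
  idade: number;
  prioridade: Priority;
}

// Type guard: narrows an unknown value to one of the allowed enum members.
function isPriority(value: unknown): value is Priority {
  return value === "baixa" || value === "media" || value === "alta";
}

// Turn untrusted model output into a typed value, or fail loudly.
function parseRegistration(raw: string): Registration {
  const data: unknown = JSON.parse(raw); // throws on "almost JSON"
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const { nome, idade, prioridade } = data as Record<string, unknown>;
  if (typeof nome !== "string") throw new Error("nome must be a string");
  if (typeof idade !== "number") throw new Error("idade must be a number");
  if (!isPriority(prioridade)) throw new Error("prioridade must be baixa, media, or alta");
  return { nome, idade, prioridade };
}

const valid = parseRegistration('{"nome":"John Doe","idade":30,"prioridade":"alta"}');

let rejected = false;
try {
  parseRegistration('{"nome":"John Doe","idade":"thirty","prioridade":"alta"}');
} catch {
  rejected = true; // the malformed record never reaches the rest of the system
}
```

Everything downstream of parseRegistration can rely on the Registration type; everything upstream is treated as untrusted text.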

Giving AI hands: tool calling

A model, no matter how smart, has no agency. It doesn't look up today's weather, it doesn't check a client's balance in your database, it doesn't send an email. It is trapped in its own box, frozen at the training cutoff, isolated from the real world — and no clean JSON changes that.

Tool calling solves this. The model doesn't execute code; it emits an intention (“call buscarPedido with pedidoId=A-1042”). The SDK executes your TypeScript function, returns the result to the model, and repeats the cycle until the task is finished.

Here lies the first important update — and where many tutorials are outdated. In AI SDK 4, you controlled the loop with maxSteps. In AI SDK 5 (July 2025), this was removed: the loop is now controlled by stopWhen, and tools declare inputSchema, no longer parameters.

import { generateText, tool, stepCountIs } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

// `db` stands for the application's own data layer; it is not defined here.
const { text } = await generateText({
  model: anthropic("claude-sonnet-4-6"),
  stopWhen: stepCountIs(5), // previously `maxSteps: 5`
  tools: {
    buscarPedido: tool({
      description: "Looks up an order by ID",
      inputSchema: z.object({ pedidoId: z.string() }), // previously `parameters`
      execute: async ({ pedidoId }) => db.pedidos.find(pedidoId),
    }),
  },
  prompt: "What is the status of order A-1042?",
});

The loop also exposes hooks to adjust context and model at each step.
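The cycle described in this section (intent, execution, result, repeat, stop) can be sketched without any SDK. This is a toy illustration under stated assumptions: fakeModel is a scripted stand-in for a real model, runAgent is a hypothetical loop, and the step budget plays the role that stepCountIs plays in the SDK code.

```typescript
// A scripted stand-in for a model: given the transcript so far, it either
// asks for a tool call or produces a final answer. Entirely hypothetical.
type ModelTurn =
  | { kind: "tool-call"; tool: string; input: { pedidoId: string } }
  | { kind: "final"; text: string };

type Tool = (input: { pedidoId: string }) => string;

const tools: Record<string, Tool> = {
  // Hypothetical lookup; a real tool would query your database.
  buscarPedido: ({ pedidoId }) => (pedidoId === "A-1042" ? "shipped" : "not found"),
};

function fakeModel(transcript: string[]): ModelTurn {
  const toolResult = transcript.find((line) => line.startsWith("tool:"));
  if (toolResult === undefined) {
    return { kind: "tool-call", tool: "buscarPedido", input: { pedidoId: "A-1042" } };
  }
  return { kind: "final", text: `Order A-1042 is ${toolResult.slice("tool:".length)}.` };
}

// The loop: ask the model, run the requested tool, feed the result back,
// and stop on a final answer or when the step budget is exhausted.
function runAgent(prompt: string, maxSteps: number): { text: string; steps: number } {
  const transcript = [`user:${prompt}`];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = fakeModel(transcript);
    if (turn.kind === "final") return { text: turn.text, steps: step };
    const tool: Tool | undefined = tools[turn.tool];
    const result = tool ? tool(turn.input) : `unknown tool ${turn.tool}`;
    transcript.push(`tool:${result}`);
  }
  return { text: "step budget exhausted", steps: maxSteps };
}

const answer = runAgent("What is the status of order A-1042?", 5);
console.log(answer.text); // Order A-1042 is shipped.
```

The model never touches the lookup function directly; it only names a tool and its input, and the deterministic loop decides what actually runs and when to stop.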

The wall: context collapse

Tool calling is excellent for point-in-time actions. But when the task requires exploring a lot of information — reading dozens of files, scanning a database, traversing logs — each tool result goes back entirely into the main agent's context. And then three things happen at once: token consumption explodes, latency spikes, and the agent forgets the original instruction and loses coherence.

It is the paradox of naive tool calling: the more the tool works, the more cognitive junk it dumps back into the agent. The solution is not a better tool; it is a boundary.
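A little arithmetic shows why consumption explodes. The sketch below assumes that the full context is re-sent to the model on every step and that each tool result is appended in full. The token sizes (1,000 for the base prompt, 5,000 per tool result, 1,000 for a summary) are illustrative assumptions, not measurements.

```typescript
// Illustrative numbers only: the token sizes below are assumptions.
function totalInputTokens(baseTokens: number, resultTokens: number, steps: number): number {
  let context = baseTokens; // what the model must read at the current step
  let total = 0;
  for (let step = 0; step < steps; step++) {
    total += context;        // the whole context is re-sent on every step
    context += resultTokens; // the tool result is appended for the next step
  }
  return total;
}

// Ten steps, each dumping a 5,000-token result into the main context.
const naiveTotal = totalInputTokens(1000, 5000, 10);
console.log(naiveTotal); // 235000

// Size of the main context once the work is done.
const naiveMainContext = 1000 + 10 * 5000; // 51000 tokens of mostly raw results
const isolatedMainContext = 1000 + 1000;   // 2000 tokens: base prompt plus one summary
```

Input tokens grow roughly with the square of the step count, because every earlier result is paid for again on every later step. A boundary does not make the exploration free, but it keeps that growth out of the main agent's context.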

The agentic frontier: isolation via sub-agents

A sub-agent is an autonomous agent wrapped as a tool. The main agent calls it as it would any tool (it sends a task and receives a result), but the sub-agent runs with its own context window, from scratch. It does the heavy lifting in isolation and returns only a focused summary.

import { tool, generateText, stepCountIs } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

// A sub-agent is an autonomous agent invoked as a tool.
const pesquisaProfunda = tool({
  description: "Deep, independent research on a topic",
  inputSchema: z.object({ tarefa: z.string() }),
  execute: async ({ tarefa }) => {
    // its own context window: burns 100k tokens exploring...
    const { text } = await generateText({
      model: anthropic("claude-sonnet-4-6"),
      stopWhen: stepCountIs(20),
      tools: { /* search, read, etc. */ },
      prompt: `Investigate in depth and summarize: ${tarefa}`,
    });
    return text;
  },
  // ...but the orchestrator only sees the ~1k-token summary
  toModelOutput: ({ output }) => ({ type: "text", value: output }),
});

The detail that closes the argument is toModelOutput. It separates what the tool produces from what the main model sees: the sub-agent can burn 100k tokens exploring, but the orchestrator consumes only the 1k summary. The user follows all progress in streaming; the main model sees only the distillate. The main agent's context remains clean and coherent.

It is the orchestrator-worker pattern: a central agent divides the task, sub-agents execute in parallel and in isolation, and the central one synthesizes. Each sub-agent starts with a clean context — that is precisely what allows it to explore freely without bloating the main conversation.
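The isolation property can be shown with a toy model of the pattern. Nothing here calls a real model: contexts are plain string arrays, runWorker and orchestrate are hypothetical names, and the "exploration" is a simple scan over log lines. The workers run one after another for simplicity; in a real system they could run in parallel.

```typescript
// Hypothetical sketch: contexts are plain string arrays and the worker's
// exploration is simulated. No model is called.
interface WorkerReport {
  summary: string;           // the only thing the orchestrator gets to see
  workerContextSize: number; // how much the worker accumulated privately
}

function runWorker(task: string, documents: string[]): WorkerReport {
  const context: string[] = [`task:${task}`]; // fresh context, starts from scratch
  for (const doc of documents) context.push(doc); // heavy exploration stays here
  const hits = documents.filter((doc) => doc.includes(task)).length;
  return {
    summary: `${hits} of ${documents.length} documents mention "${task}"`,
    workerContextSize: context.length,
  };
}

function orchestrate(tasks: string[], documents: string[]) {
  const mainContext: string[] = ["user: audit the logs"];
  const reports = tasks.map((task) => runWorker(task, documents)); // independent runs
  for (const report of reports) mainContext.push(report.summary); // distillate only
  return { mainContext, reports };
}

const logs = ["timeout in payment", "payment ok", "timeout in search", "login ok"];
const run = orchestrate(["timeout", "payment"], logs);
console.log(run.mainContext.length); // 3
```

Each worker read all four log lines, yet the main context grew by only one line per worker. That asymmetry is the whole point of the boundary.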

Choosing the right architecture

Sub-agents are not the answer to everything: they add latency and complexity, and over-engineering is a form of failure just as real as under-engineering. The AI SDK lets you compose all strategies on the same foundation; the work is matching the architecture to the task, looking at two axes: the volume of data to explore and the need for independence.

Figure: Delegation strategy matrix. Choose based on exploration volume and the need for independence.

Traditional tool calling (point-in-time action): little exploration, low complexity. E.g., looking up the weather, calculating a shipping cost.

Multi-step tool calling (sequential orchestration): several chained steps, but still within the main agent's context.

Embeddings & RAG (semantic search): a lot of data to scan, low decision complexity. Retrieve what is relevant, not everything.

Sub-agent architecture: heavy exploration AND independence. Delegate to isolated agents that return summaries.

Start with the lowest step that solves the problem. A point-in-time action is a tool; a simple sequence is multi-step; a search in a corpus is RAG. Move up to sub-agents only when the task requires deep and independent exploration — and the gain of clean context pays for the extra latency.
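The rule of thumb above can be written down as a small decision function. This is only one possible encoding of the matrix: the TaskProfile fields and the boolean thresholds are assumptions made for illustration, and real tasks rarely reduce to three flags.

```typescript
type Strategy = "tool-calling" | "multi-step" | "rag" | "sub-agents";

// Hypothetical task description along the axes discussed above.
interface TaskProfile {
  heavyExploration: boolean;  // lots of data to read or scan
  needsIndependence: boolean; // the work benefits from its own context and decisions
  chainedSteps: boolean;      // more than one dependent action
}

// Check the most demanding case first; everything else falls to a lower rung.
function chooseStrategy(task: TaskProfile): Strategy {
  if (task.heavyExploration && task.needsIndependence) return "sub-agents";
  if (task.heavyExploration) return "rag";
  if (task.chainedSteps) return "multi-step";
  return "tool-calling";
}

const weatherLookup = chooseStrategy({
  heavyExploration: false,
  needsIndependence: false,
  chainedSteps: false,
});
console.log(weatherLookup); // tool-calling
```

Note that needing independence alone does not select sub-agents here; only the combination with heavy exploration does, which matches the advice to move up only when the clean context pays for the extra latency.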

Putting it into production: resilience and observability

Multi-agent architectures break in ways a demo never shows, and they need infrastructure to match. The AI Gateway and the order parameter of providerOptions.gateway manage operational chaos: automatic model fallback if a provider fails (with only and sort for cost, latency, or throughput), Zero Data Retention per call for privacy, and OIDC authentication that eliminates key management.

And there is the invisible cost. Autonomous systems spend in silence — a sub-agent that triggers twenty steps can cost much more than the orchestrator. That is why observability is not a luxury: you need traceability by function and by provider, with costs segregated by model, token metrics, and Time to First Token (TTFT). Without this, you discover the bill at the end of the month.
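As a rough illustration of cost segregated by function, the sketch below aggregates usage records into a per-function bill. Everything in it is assumed for the example: the UsageRecord shape, the model names, the token counts, and the prices (USD per million tokens) are hypothetical and do not reflect any provider's actual pricing or any SDK's telemetry format.

```typescript
// Hypothetical usage records and prices; real values come from your provider
// and your telemetry. Prices are in USD per million tokens.
interface UsageRecord {
  fn: string; // which part of the system made the call
  model: string;
  inputTokens: number;
  outputTokens: number;
  ttftMs: number; // time to first token, in milliseconds
}

const PRICES: Record<string, { input: number; output: number }> = {
  "strong-model": { input: 3, output: 15 },
  "cheap-model": { input: 1, output: 5 },
};

function costUsd(record: UsageRecord): number {
  const price = PRICES[record.model];
  if (price === undefined) throw new Error(`no price for ${record.model}`);
  return (record.inputTokens * price.input + record.outputTokens * price.output) / 1000000;
}

// Sum the cost of every call, grouped by the function that made it.
function costByFunction(records: UsageRecord[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const record of records) {
    totals[record.fn] = (totals[record.fn] ?? 0) + costUsd(record);
  }
  return totals;
}

const records: UsageRecord[] = [
  { fn: "orchestrator", model: "strong-model", inputTokens: 3000, outputTokens: 500, ttftMs: 400 },
  { fn: "pesquisaProfunda", model: "cheap-model", inputTokens: 235000, outputTokens: 1000, ttftMs: 650 },
];

const totals = costByFunction(records);
const worstTtftMs = Math.max(...records.map((record) => record.ttftMs));
```

With these made-up numbers the sub-agent costs about 0.24 USD against roughly 0.0165 USD for the orchestrator, even on a model priced at a third of the strong one. A single aggregate bill would hide exactly that.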

The architecture, assembled

Put the steps together and the final piece appears: a main orchestrator (a strong model) receives the request and decides; it calls specialized tools and sub-agents (perhaps other models, perhaps cheaper ones), which explore in isolation and return summaries; the data passes through a Zod schema and becomes typed JSON; and everything goes back to the user in streaming. One SDK, crossing provider, model, and security boundaries.

Figure: Final orchestration system architecture.

The arc is this: from scattered APIs to a universal connector; from probabilistic text to type safety; from isolated chat boxes to systems with agency; from context collapse to sub-agent orchestration. Fragile prompt engineering becomes software architecture.

And note what hasn't changed along the way: the model remains probabilistic. You haven't tamed it. You have built deterministic software around it — a connector that doesn't lock you in, a schema that guarantees the shape, a loop that knows when to stop, a boundary that protects the context. Intelligence is probabilistic; the architecture is not. That is where, in this carefully stitched gap, the difference between a demo and a product lives.

Written in June 2026. The API references — loop control with stopWhen/stepCountIs (AI SDK 5, July 2025), tools with inputSchema, the sub-agent pattern with toModelOutput, and AI Gateway features (fallbacks, ZDR, OIDC, BYOK, and observability) — reflect the state of the Vercel AI SDK at that date. The AI SDK is not the only way to implement these patterns.
