
Monitor your AI agent costs and calls in Next.js

Connect Sentry to your Vercel AI SDK app and get instant visibility into token usage, LLM costs, tool calls, and agent traces.

Category: Monitoring
Time: 15–20 minutes
Difficulty: Beginner
Steps: 6

Before you start

SDKs & packages
Accounts & access
Knowledge
  • Basic familiarity with Next.js API routes or Route Handlers
  • Basic understanding of the Vercel AI SDK (streamText, generateText)

1
Install the Sentry Next.js SDK

Add the Sentry SDK to your project using the wizard, which handles the instrumentation files automatically. Run this in your project root.

Next.js SDK setup guide
npx @sentry/wizard@latest -i nextjs
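If you want to see what the wizard wires up, it typically creates sentry.client.config.ts, sentry.server.config.ts, sentry.edge.config.ts, and an instrumentation.ts along these lines. This is a sketch of the generated file — the exact shape varies by SDK version:

```typescript
// instrumentation.ts — loads the Sentry config that matches the runtime.
// Generated by the wizard; shown as a sketch, details vary by version.
export async function register() {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    await import('./sentry.server.config');
  }
  if (process.env.NEXT_RUNTIME === 'edge') {
    await import('./sentry.edge.config');
  }
}
```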

2
Enable tracing and add the Vercel AI integration

Open sentry.server.config.ts and make sure tracesSampleRate is greater than zero. Then import vercelAIIntegration from @sentry/nextjs and add it to the integrations array.

AI agent monitoring for Next.js
import * as Sentry from '@sentry/nextjs';
import { vercelAIIntegration } from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 1.0,
  integrations: [
    vercelAIIntegration(),
  ],
});
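If you don't want to trace every request in production, one common pattern (an assumption here, not part of the wizard output) is to key the sample rate off the environment:

```typescript
import * as Sentry from '@sentry/nextjs';
import { vercelAIIntegration } from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  // Sample every trace in development, a fraction in production.
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.2 : 1.0,
  integrations: [
    vercelAIIntegration(),
  ],
});
```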

3
Enable telemetry on your AI SDK calls

In any Route Handler that uses the Vercel AI SDK, add the experimental_telemetry option to your streamText or generateText call. Set isEnabled: true and provide a functionId — this label appears as the span name in Sentry traces. You can also set recordInputs and recordOutputs to capture the full prompts and completions. Disable these on routes that handle sensitive user data.

AI agent monitoring for Next.js
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    experimental_telemetry: {
      isEnabled: true,
      functionId: 'chat-route',
      recordInputs: true,
      recordOutputs: true,
    },
  });

  return result.toDataStreamResponse();
}
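On routes that handle sensitive data, you can keep the telemetry but opt out of content capture. A sketch — the route, prompt, and model choice here are illustrative, not from the guide above:

```typescript
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { document } = await req.json();

  const { text } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: `Summarize this document:\n\n${document}`,
    experimental_telemetry: {
      isEnabled: true,
      functionId: 'summarize-document',
      // Keep token counts and timing, but never send the document itself.
      recordInputs: false,
      recordOutputs: false,
    },
  });

  return Response.json({ text });
}
```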

4
Explore agent traces in Insights

Trigger a request in your app, then head to Insights → Agents in Sentry. You'll see pre-built widgets for LLM calls, token usage, and tool calls. Click into any trace to see the full agent workflow — the system prompt, model output, and each tool call nested in sequence.

AI Agents Insights documentation
[Screenshot: Sentry Insights → Agents dashboard showing LLM Calls by Model, Tokens Used, and Tool Calls charts, with a Traces table listing agent runs with LLM call counts, tool calls, total tokens, and cost]

5
Check LLM costs and model breakdown

From the Agents dashboard, click Models to get a breakdown of costs, token usage, and token types (input vs. output vs. cached) by model. This view makes it easy to spot which models are driving the most spend and how effectively cached tokens are being used. You can also build custom dashboards to combine this data with other application metrics like error rates or latency.

AI Agents Insights documentation
[Screenshot: Sentry Agents → Models tab showing Model Cost, Tokens Used, and Token Types charts, with a Models table breaking down requests, errors, average latency, cost, input tokens, cached tokens, and output tokens per model]

6
Drill into the full trace

From any agent trace, click View Full Trace to open the Trace Explorer. Here you'll see the entire request lifecycle — from the page load or API call all the way down to each LLM request and tool execution. This gives you the full-stack context to understand whether a slow tool call is a database issue, a network problem, or an AI provider timeout.

Trace Explorer documentation
[Screenshot: Sentry Trace Explorer showing an AI Agent Chat Session trace with the AI Spans tab selected, listing nested agent invocations, chat spans with token counts and costs, tool executions, and agent handoffs — with a detail panel showing model, token usage, cost, and the full system prompt]

That's it.

You know exactly what your agents cost.

Every token, tool call, and model request shows up in Sentry with full trace context — so you can optimize cost and debug performance without switching tabs.

  • Configured Sentry tracing in your Next.js app
  • Added the Vercel AI integration for automatic LLM instrumentation
  • Enabled telemetry on AI SDK calls to capture prompts and outputs
  • Explored agent traces, token usage, and tool calls in Insights → Agents
  • Viewed per-model cost and token breakdowns

Pro tips

  • 💡 Use a meaningful functionId for each streamText or generateText call (e.g. 'chat-route', 'summarize-document') — it becomes the span name in Sentry and makes traces much easier to filter.
  • 💡 Set tracesSampleRate to a lower value like 0.2 in production to reduce volume, then raise it temporarily when debugging a specific issue.
  • 💡 Use Sentry.setConversationId() to link spans across multi-turn conversations so you can analyze complete conversation flows rather than individual requests.
  • 💡 Tool calls that hit your database appear as child spans inside the agent trace — if a tool is slow, check whether the database span is the actual bottleneck.
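To make a custom tool show up as its own child span, as the last tip describes, you can wrap its execute body in Sentry.startSpan. A sketch under assumptions — the order-lookup tool itself is hypothetical, and the tool definition shape follows AI SDK 4.x:

```typescript
import * as Sentry from '@sentry/nextjs';
import { tool } from 'ai';
import { z } from 'zod';

// Hypothetical order-lookup tool; the span makes its latency visible
// as a child span inside the agent trace.
export const lookupOrder = tool({
  description: 'Look up an order by id',
  parameters: z.object({ orderId: z.string() }),
  execute: ({ orderId }) =>
    Sentry.startSpan({ name: 'lookupOrder', op: 'function' }, async () => {
      // ...database query would go here...
      return { orderId, status: 'shipped' };
    }),
});
```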

Common pitfalls

  • ⚠️ Forgetting to set tracesSampleRate greater than zero — AI spans are only captured when tracing is active.
  • ⚠️ Not wrapping tool calls in Sentry spans — if your agent calls external APIs or runs database queries as tools, they won't appear in the trace unless instrumented. Use the Sentry SDK or auto-instrumented libraries so tool performance shows up alongside your LLM spans.
  • ⚠️ Miscounting cached tokens: gen_ai.usage.input_tokens should be the total including cached tokens, with the cached portion reported separately as input_tokens.cached. Reporting only the non-cached portion causes Sentry to calculate negative costs.
  • ⚠️ Leaving recordInputs: true on routes that handle sensitive user data — prompts are captured by default, so opt out explicitly where needed.

Frequently asked questions

Which providers does the integration support?
The Vercel AI SDK integration automatically instruments all providers supported by the AI SDK, including OpenAI, Anthropic, Google Gen AI, and others. For unsupported providers, you can add custom spans using the gen_ai.request operation.

Do I have to use the Vercel AI SDK to monitor my AI calls?
No. If you're using OpenAI, Anthropic, LangChain, or another supported library directly, Sentry has dedicated integrations for each. The experimental_telemetry option is specific to the Vercel AI SDK.

Does telemetry add latency to my AI responses?
No. Sentry uses asynchronous, non-blocking transport. Spans are batched and sent in the background with negligible overhead on your request latency.

How does AI monitoring affect my Sentry bill?
Traces are billed based on volume. Sentry's free tier includes enough spans to get started — and you can lower tracesSampleRate to control exactly how many traces you send.

Can I see the actual prompts and responses in Sentry?
Yes — set recordInputs: true and recordOutputs: true in the experimental_telemetry config. For privacy-sensitive routes, set these to false to capture only metadata like token counts and model names.
