Build a Token-Efficient Review Agent
Goal: Build a review agent that drafts routine answers with a cheap model and asks a stronger Advisor model only after the executor compresses uncertainty into a short packet.
Outcome: Your app sends normal review tasks to a low-cost executor, adds one compact plan-reviewer Advisor entry when a second model is worth the spend, and logs whether Advisor was offered without storing prompts or advice.
Advisor is a beta server tool. It runs an inner model call, so it can add
cost and latency. This recipe keeps that call behind a budget gate and sends a
compact review prompt instead of forwarding the full transcript. Use returned
usage.cost when present, or estimate spend from the selected advisor model’s
current pricing before widening the gate.
Before you start
You need:
- Node.js 20 or newer
- An OpenRouter API key in OPENROUTER_API_KEY
- A review, planning, or agent workflow that already calls OpenRouter
- A cheap executor model for routine work
- A stronger advisor model for compact second opinions
If you are starting a new TypeScript agent, use the Agent SDK callModel API for the executor loop. The sample below uses Chat Completions so the server-tool request shape is visible, but the budget-gate pattern is the same inside an Agent SDK workflow.
Use these references for exact schemas:
- Advisor server tool
- Agent SDK callModel overview
- Create a chat completion
- Create a response
- TypeScript SDK Chat reference
What you’re building
This recipe builds a tiny budget-aware implementation-plan reviewer.
The executor model handles the normal response and writes most plans by itself. The app only offers the plan-reviewer Advisor when the task has uncertainty signals like a large diff, missing tests, a schema change, or unfamiliar ownership. When Advisor is available, the executor can ask it for one focused second opinion before writing the final answer.
For this workflow, give the Advisor the decision being reviewed, the changed files, the uncertainty signals, and a short diff summary.
1. Define the compact review packet
Start with the smallest object the agent needs. This is the data your app already knows before it calls OpenRouter.
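A minimal sketch of such a packet, assuming the four fields this recipe names (decision, changed files, uncertainty signals, diff summary). The type and field names are illustrative and are not part of any OpenRouter schema.

```typescript
// Hypothetical packet shape: names are illustrative, not an OpenRouter schema.
type UncertaintySignal =
  | "large_diff"
  | "missing_tests"
  | "schema_change"
  | "unfamiliar_ownership";

interface ReviewPacket {
  decision: string;             // the decision being reviewed, in one sentence
  changedFiles: string[];       // paths only, not file contents
  signals: UncertaintySignal[]; // why a second opinion might be worth the spend
  diffSummary: string;          // a short summary, not the raw diff
}

// Made-up example values, shown only to illustrate the size of the packet.
const examplePacket: ReviewPacket = {
  decision: "Add a nullable billing_region column and backfill it in a follow-up job",
  changedFiles: ["db/migrations/0042_billing_region.sql", "src/billing/region.ts"],
  signals: ["schema_change", "missing_tests"],
  diffSummary: "Adds one nullable column and a read path; no tests cover the backfill.",
};
```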
Keep this packet small. Full diffs, raw conversation history, logs, and customer data belong in your own review UI or trace system, not in the Advisor prompt by default.
2. Add the cheap executor and budget-gated Advisor
The sample below shows the routing pattern and Chat Completions request shape. Adapt the same budget gate and tools shape inside your existing OpenRouter call, including Agent SDK callModel if that is your agent loop.
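One way to sketch the gate and the request body is shown here. The model slugs, the `minSignals` threshold, the `advisorCallsRemaining` counter, and the helper names are all hypothetical. Placing `name` and `forward_transcript` inside the flat `parameters` object is an assumption based on this recipe's description, so confirm the exact shape against the Advisor server tool reference.

```typescript
// Sketch only: thresholds, helper names, and field placement are assumptions.
interface ReviewPacket {
  decision: string;
  changedFiles: string[];
  signals: string[];
  diffSummary: string;
}

interface GateConfig {
  executorModel: string;         // cheap model for routine work
  advisorModel: string;          // stronger model, pinned on the Advisor entry
  minSignals: number;            // uncertainty signals needed to justify the spend
  advisorCallsRemaining: number; // budget counter tracked by your app
}

interface AdvisorTool {
  type: "openrouter:advisor";
  parameters: { name: string; model: string; forward_transcript: boolean };
}

interface ChatRequestBody {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  tools?: AdvisorTool[];
}

// Budget gate: the app, not the model, decides whether Advisor is offered.
function shouldOfferAdvisor(packet: ReviewPacket, config: GateConfig): boolean {
  return config.advisorCallsRemaining > 0 && packet.signals.length >= config.minSignals;
}

function buildRequestBody(packet: ReviewPacket, config: GateConfig): ChatRequestBody {
  const requestBody: ChatRequestBody = {
    model: config.executorModel,
    messages: [
      { role: "system", content: "Review the implementation plan. Be concise." },
      { role: "user", content: JSON.stringify(packet) },
    ],
  };
  if (shouldOfferAdvisor(packet, config)) {
    // Uncertain task: offer exactly one named Advisor entry.
    requestBody.tools = [
      {
        type: "openrouter:advisor",
        parameters: {
          name: "plan-reviewer",
          model: config.advisorModel,
          forward_transcript: false,
        },
      },
    ];
  }
  return requestBody; // routine task: `tools` stays omitted
}
```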
Send requestBody through the request path your app already uses. For routine tasks, tools is omitted and the request stays on the cheap executor model. For uncertain tasks, the request offers one named Advisor tool:
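The entry might look like the sketch below. The model slug is a placeholder, and the position of `name` and `forward_transcript` inside `parameters` is an assumption drawn from this recipe's wording rather than a confirmed schema; check the Advisor server tool reference before relying on it.

```typescript
// Placeholder slug: substitute the advisor model from your own config.
const ADVISOR_MODEL = "example/strong-advisor";

// Assumed shape of a single Advisor entry with flat parameters.
const planReviewerTool = {
  type: "openrouter:advisor",
  parameters: {
    name: "plan-reviewer",     // the tool name the executor sees
    model: ADVISOR_MODEL,      // pinned here, so the tool call carries only a prompt
    forward_transcript: false, // Advisor gets the compact prompt, not the parent conversation
  },
};

// The fragment that would be merged into the request for uncertain tasks.
const advisorToolsFragment = { tools: [planReviewerTool] };
```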
The executor chooses whether to call plan-reviewer. It passes only prompt in the tool-call arguments because the advisor model is pinned in parameters.model.
3. Make the Advisor prompt earn its tokens
The cost control comes from 3 choices:
- The app decides whether a second model is worth the spend.
- The executor stays cheap.
- The Advisor sees a compact prompt, not the whole transcript.
In this workflow, forward_transcript: false is deliberate. The Advisor receives the executor’s compact prompt argument, plus its own instructions. If you set forward_transcript: true, the Advisor can see the full parent conversation, which is useful for some agents but often defeats the token-saving goal.
Use the system prompt to tell the executor what belongs in the Advisor call:
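A possible system prompt is sketched here as a TypeScript constant. The wording is illustrative and not taken from OpenRouter's documentation; adjust it to your own review domain.

```typescript
// Illustrative wording only: tune the rules to your own workflow.
const EXECUTOR_SYSTEM_PROMPT = [
  "You review implementation plans. Answer routine questions yourself.",
  "If the plan-reviewer tool is available and you are unsure about one specific decision, call it once.",
  "In the tool call prompt, include only the decision under review, the changed file paths, the uncertainty signals, and a short diff summary.",
  "Do not paste raw diffs, logs, conversation history, or customer data into the tool call prompt.",
  "Ask one focused question, then write the final answer yourself.",
].join("\n");
```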
That keeps the expensive model focused on the part where it changes the outcome.
4. Add specialist reviewers only when uncertainty splits
If the executor can identify different kinds of uncertainty, give it separate Advisor entries. Each entry is its own tool. Do not use a nested parameters.advisors roster.
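As a sketch, two specialist entries could be built like this. The `instructions` field name, the placement of `name` inside `parameters`, and the model slug are assumptions; the reviewer names come from this recipe.

```typescript
// Assumed entry shape: verify field names against the Advisor reference.
interface AdvisorEntry {
  type: "openrouter:advisor";
  parameters: {
    name?: string;
    model: string;
    instructions: string; // assumed field name for the Advisor's own instructions
    forward_transcript: boolean;
  };
}

const ADVISOR_MODEL = "example/strong-advisor"; // placeholder slug

// Hypothetical helper that builds one flat Advisor entry per specialist.
function advisorEntry(name: string, instructions: string): AdvisorEntry {
  return {
    type: "openrouter:advisor",
    parameters: { name, model: ADVISOR_MODEL, instructions, forward_transcript: false },
  };
}

// Each specialist is its own entry in the tools array, not a nested roster.
const specialistTools: AdvisorEntry[] = [
  advisorEntry("schema-reviewer", "Review only schema and migration risk."),
  advisorEntry("cost-reviewer", "Review only the cost impact of the plan."),
];
```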
The executor chooses the matching tool without passing a name argument. At most one Advisor entry can omit name; that unnamed entry becomes the default Advisor.
If you replay the conversation across requests, keep Advisor entries in a stable order. Advisor identity comes from each entry’s index in the tools array, so reordering or inserting entries can make schema-reviewer and cost-reviewer reconstruct each other’s memory.
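A small guard can catch accidental reordering before a replayed request goes out. This is a hypothetical helper, not an OpenRouter API: it compares the Advisor names from the previous request with the current ones and treats any difference in order or length as unsafe.

```typescript
// Returns true only when both requests list the same Advisor names at the same indexes.
function sameAdvisorOrder(previous: string[], current: string[]): boolean {
  return (
    previous.length === current.length &&
    previous.every((name, index) => name === current[index])
  );
}

const firstRequest = ["schema-reviewer", "cost-reviewer"];
const replayOk = sameAdvisorOrder(firstRequest, ["schema-reviewer", "cost-reviewer"]);
const replaySwapped = sameAdvisorOrder(firstRequest, ["cost-reviewer", "schema-reviewer"]);
```

Here `replayOk` is true and `replaySwapped` is false, so the swapped request would be rejected before the two reviewers could pick up each other's memory.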
5. Log cost and routing, not private content
Add telemetry where your app already records model calls. This recipe’s sample logs only model names, whether Advisor was enabled, finish reason, and usage keys.
Log:
- executor_model
- advisor_model or Advisor entry name
- did_enable_advisor
- finish_reason
- usage.prompt_tokens, usage.completion_tokens, usage.total_tokens, and usage.cost when returned
- route or feature name, such as budgeted_plan_review
Do not log:
- API keys
- cookies
- full prompts
- full advisor advice
- raw diffs
- user content unless your product already has an explicit retention policy
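The two lists above can be enforced with an allowlist: build the log record from named fields only, so prompts and answers never reach the logger. This is a sketch; `buildRoutingLog` and `ChatResponseLike` are hypothetical names, and the response type covers only the Chat Completions fields the function reads.

```typescript
// Only the parts of a Chat Completions response that this sketch reads.
interface ChatResponseLike {
  choices?: { finish_reason?: string | null; message?: { content?: string | null } }[];
  usage?: { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number; cost?: number };
}

interface RoutingLog {
  route: string;
  executor_model: string;
  advisor_model: string | null;
  did_enable_advisor: boolean;
  finish_reason: string | null;
  usage: {
    prompt_tokens: number | null;
    completion_tokens: number | null;
    total_tokens: number | null;
    cost: number | null; // null when the provider does not return it
  };
}

// Copies allowlisted fields only; message content is never read.
function buildRoutingLog(
  executorModel: string,
  advisorModel: string | null, // null when the budget gate kept Advisor off
  response: ChatResponseLike,
): RoutingLog {
  return {
    route: "budgeted_plan_review",
    executor_model: executorModel,
    advisor_model: advisorModel,
    did_enable_advisor: advisorModel !== null,
    finish_reason: response.choices?.[0]?.finish_reason ?? null,
    usage: {
      prompt_tokens: response.usage?.prompt_tokens ?? null,
      completion_tokens: response.usage?.completion_tokens ?? null,
      total_tokens: response.usage?.total_tokens ?? null,
      cost: response.usage?.cost ?? null,
    },
  };
}
```

Because the record is built from an allowlist rather than by deleting fields from the response, a new response field cannot leak into logs by default.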
If your app uses the Responses API and your UI benefits from showing the second opinion as it arrives, set stream: true on that Advisor entry. It streams advice deltas for Responses clients, then still returns the completed advice item. Chat Completions ignores stream, and Messages streaming is planned.
Check your work
Validate the workflow through the path you’re adding.
Confirm:
- Routine tasks send no Advisor tool, or leave Advisor disabled by your budget gate.
- Advisor-worthy tasks include one openrouter:advisor entry named plan-reviewer.
- The Advisor entry uses flat parameters, not parameters.advisors.
- The executor model is the cheap model from your config.
- The Advisor model is separately configurable.
- The Advisor prompt contains a compact uncertainty packet, not full diffs or the full conversation.
- Logs include model names, Advisor enablement, finish reason, and usage fields when present.
- Logs do not include OPENROUTER_API_KEY, raw prompts, cookies, full diffs, or full generated content.
After you wire the pattern into your app, log routing telemetry like this. Treat usage_keys as provider-dependent. Assert the routing fields, then check that usage includes the billing fields your app depends on. finish_reason is typically stop after the server-side tool call resolves, but don’t treat the literal value as a fixed contract.
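A sketch of one such record and a check for it follows. The record values, the `REQUIRED_BILLING_KEYS` list, and `checkRoutingRecord` are hypothetical; replace the required keys with the billing fields your app actually depends on.

```typescript
interface RoutingRecord {
  route: string;
  executor_model: string;
  advisor_model: string | null;
  did_enable_advisor: boolean;
  finish_reason: string | null;
  usage_keys: string[]; // provider-dependent: names of the usage fields returned
}

// Made-up example record with placeholder model slugs.
const exampleRecord: RoutingRecord = {
  route: "budgeted_plan_review",
  executor_model: "example/cheap-executor",
  advisor_model: "example/strong-advisor",
  did_enable_advisor: true,
  finish_reason: "stop",
  usage_keys: ["prompt_tokens", "completion_tokens", "total_tokens", "cost"],
};

// Assumption: the billing fields this hypothetical app depends on.
const REQUIRED_BILLING_KEYS = ["prompt_tokens", "completion_tokens", "total_tokens"];

// Returns a list of problems; an empty list means the record passes.
function checkRoutingRecord(record: RoutingRecord, expectAdvisor: boolean): string[] {
  const problems: string[] = [];
  if (record.route !== "budgeted_plan_review") problems.push("unexpected route");
  if (record.executor_model.length === 0) problems.push("executor_model missing");
  if (record.did_enable_advisor !== expectAdvisor) problems.push("unexpected Advisor routing");
  if (expectAdvisor && record.advisor_model === null) problems.push("advisor_model missing");
  for (const key of REQUIRED_BILLING_KEYS) {
    if (record.usage_keys.indexOf(key) === -1) problems.push("usage is missing " + key);
  }
  // finish_reason is deliberately not compared to a literal value.
  return problems;
}

const exampleProblems = checkRoutingRecord(exampleRecord, true);
```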
Token counts, cost, answer text, and provider-specific usage detail keys vary by model and prompt. Treat the routing fields and redaction boundary as the contract.
Next steps
- Read the Advisor reference for exact parameters, multiple-advisor rules, memory, streaming, and API-surface details.
- Add Web Search as an Advisor sub-tool when the reviewer needs current sources.
- Use Response Caching for repeated stable prefixes in the executor prompt.
- Add Human-in-the-Loop controls when a second opinion should pause for a person instead of another model.