11. Smart Model Switching: Save Cost Without Losing Quality

11. Smart Model Switching: Save Cost Without Losing Quality — Yaiment

Why You Don't Always Need the Best Model

Here's a question worth sitting with: if you have access to the most capable AI model, why wouldn't you use it for everything?

Cost. But not just the obvious kind.

In Claude Code, you work within a session usage limit. Opus, the most powerful model, burns through that limit much faster than Sonnet or Haiku. More capable models do more internal reasoning per task. That reasoning costs more of your budget, whether you measure it in money or session credits.

Now think about what actually happens during a typical Claude Code session. You search files, read output, write a few lines, call a tool. Then search again. Most of those steps are routine. The hard reasoning (understanding a complex bug, designing an architecture, making a tricky judgment call) is maybe 20% of the total work.

If you pay expert rates for 100% of the work, you're spending full price on tasks that didn't need the expert.

The core question isn't "which model is best?" It's "which model is necessary for this specific step?"

The Advisor Strategy Explained

Anthropic released something called the advisor strategy. It's a way to answer that question automatically.

You designate a cheap model (Haiku or Sonnet) as the executor. It does the actual work: searching, reading, writing, calling tools. When it encounters something it judges to be genuinely hard, it calls the advisor (Opus) for guidance. Opus advises. The executor continues with that input.

For simple tasks, Opus is never called. For hard tasks, it steps in. The rest of the time, you're running at Haiku or Sonnet prices.

Think of it like a junior consultant and a senior partner. The junior handles the day-to-day work: emails, standard reports, routine analysis. When the engagement hits a complex strategic question, they call the senior partner. Not for every question. Just the ones where that extra judgment changes the outcome.

The numbers back this up. In Anthropic's evaluations:

Sonnet with Opus as advisor scored 2.7 percentage points higher on SWE-bench Multilingual (a standard test where AI systems solve real coding problems from GitHub) compared to Sonnet alone
That same setup reduced cost per task by almost 12%
Haiku with Opus as advisor on a web browsing benchmark scored 41.2%, more than double Haiku's solo score of 19.7%

Info

Tip: The advisor strategy is formally a Messages API feature. The Messages API is the interface developers use to build apps on top of Claude. It is the layer beneath tools like Claude Code. In that context, you add a specific parameter to your request that enables automatic escalation between models. Not every Claude Code user needs to know the API details, but understanding the concept helps you use the Claude Code equivalent below.

How to Use It in Claude Code

Claude Code doesn't have a direct "advisor mode" button. But it has something that gives you the same benefit.

Run /model in Claude Code. You'll see the standard options: default, Sonnet, Sonnet with extended context, Haiku. There's also a less obvious option: opus plan.

Selecting opus plan tells Claude Code:

Use the latest Opus when you're in plan mode
Use the latest Sonnet everywhere else

As of mid-2026 those aliases resolve to Opus 4.8 and Sonnet 4.6. The names update over time, so opus plan always points at the current versions.

Plan mode is the thinking-before-doing phase. When you activate it, Claude asks you clarifying questions first. Then it writes a plan.md file: a step-by-step roadmap you can read and edit before anything is actually changed. Once you approve the plan (or edit it), Claude switches to execution: writing files, running commands, making changes.

That two-step structure is the key. Planning uses Opus, because that's where the hard reasoning lives. Execution uses Sonnet, because following a clear plan is less demanding work.

# In Claude Code, type:
/model opus plan
 
# Status bar will show: Sonnet (current mode)
# Switch to plan mode → status bar switches to: Opus
# Switch back → status bar returns to: Sonnet

This is manual routing, not automatic escalation. But it gives you the same core benefit: expensive model for hard thinking, cheaper model for execution. In practice, this alone can make your session last significantly longer.

Info

Tip: Check your status bar to see which model is active. If you've configured it, the model name shows up there in real time. It's a useful sanity check when you switch modes.

Effort Levels: How Hard Opus Works

The latest Opus models have five effort levels. They control how much reasoning the model applies to each task.

The default effort depends on the Opus version. Opus 4.7 introduced xhigh, a tier sitting between high and max, and made it the default. Opus 4.8 defaults to high instead. Either way, you do not need to set it explicitly. The model applies its default unless you choose something else.

Low / Medium    Fast and cheap. Good for simple, well-defined tasks where you
                are cost-sensitive or need quick turnaround.
 
High            Balanced intelligence and cost. The default on Opus 4.8. Good
                when running multiple concurrent sessions and you want to
                stretch usage without a noticeable quality drop.
 
xhigh           Strong autonomy for complex work -- API design, legacy code
                migration, large codebase review -- without the runaway token
                usage that max can produce in long sessions. The default on
                Opus 4.7.
 
Max             Everything the model has. Use for your hardest problems
                when cost is not a concern. But watch for diminishing
                returns -- max often overthinks routine tasks and produces
                more elaborate solutions than you actually needed.

Think of it like briefing a contractor on how thorough to be. Low effort works fine for a quick patch. Max effort on a simple task is like having a contractor pull up floorboards to fix a single squeaky step.

One thing the benchmarks confirm: Opus at medium or high effort uses roughly 50-65% fewer tokens than older Opus versions did at their defaults. You get better results and pay less. The effort knob is how you stay in that range rather than accidentally running at max all day.

You can switch effort levels mid-session. If a task is turning expensive, drop to high. If you hit a genuinely hard problem, push to max for that task and return to your default afterwards.

Use /effort in the CLI to change your level at any time. Type /effort low, /effort max, and so on.

One thing worth knowing: max effort is not sticky. It applies only to your current session. When you start a new session, Opus returns to its default effort automatically (xhigh on Opus 4.7, high on Opus 4.8). Every other level -- low, medium, high, xhigh -- is sticky. Set it once and it persists until you change it again.

Specifying Tasks for Opus

The latest Opus rewards well-specified task descriptions. For anything you want it to run autonomously -- designing an API, migrating a module, reviewing a large codebase -- include four things in your first message:

Intent: what you want to achieve
Constraints: what it must not break, change, or touch
Acceptance criteria: how you will know the task succeeded
File locations: which specific files are relevant

This structure cuts down on clarifying questions and improves the first attempt. It also lets Opus work longer without interrupting you.

Think of it like a job brief. "Fix the login bug" leaves Opus guessing. "The login endpoint (auth/session.js, line 47) throws a 500 on expired tokens -- it should return 401, not log the error to console, and the task is done when tests/auth.test.js passes without any other tests breaking" is a brief Opus can act on immediately.

For long-running tasks where you have provided full context and trust the model to execute safely, auto mode removes the usual check-ins. It is still a research preview. A safety classifier vets each action before it runs, and Claude keeps going until done or blocked. Auto mode is available on all plans. It needs a recent model (Opus 4.6 or later, or Sonnet 4.6), and on Team and Enterprise an admin has to switch it on first. In the CLI, press Shift-tab to cycle to auto mode. In Claude Desktop or VS Code, pick it from the permission-mode selector.

Info

Tip: Ask for a notification when the task finishes. The latest Opus on a well-specified task can run autonomously for many minutes at a stretch. You do not need to watch.

What Changed in Opus 4.7

Opus 4.7 changed how the model behaves in four ways. Opus 4.8 builds on 4.7 and keeps all four. They are worth knowing before you start.

Adaptive thinking replaces fixed budgets. Opus 4.6 supported extended thinking with a fixed token budget -- you could specify exactly how much reasoning it did before answering. Opus 4.7 replaced this with adaptive thinking. The model decides when to reason carefully and when to skip it. A simple file lookup gets a fast answer. A complex architectural decision gets careful reasoning before any action. You cannot set a fixed budget, but you can prompt with natural language: "think through this carefully before starting" or "quick answer is fine here."

Responses are shorter by default. Opus 4.6 tended toward verbose output regardless of task complexity. Opus 4.7 calibrates length to the task. Simple lookups get short answers. Open-ended analysis gets longer ones. If you want more explanation, ask for it explicitly.

Opus reasons before reaching for tools. By default, Opus 4.7 reasons more and calls tools less. This generally improves quality on complex problems. But if you want it to read specific files before answering, say so. It will not always reach for them on its own.

Subagent spawning is more conservative. Opus 4.7 spawns fewer parallel subagents by default. If your task involves independent work that could run in parallel -- scanning multiple files, running multiple checks -- explicitly tell Opus to spawn subagents for each one.

When to Use It (And When to Test First)

The advisor strategy is not a setting you flip on and forget.

Before using it in any real workflow or production system, test it against your actual tasks. Run the same prompts through different setups (Haiku alone, Sonnet alone, Haiku with Opus in plan mode) and compare the results. Not 10 tests. Hundreds, if this matters enough to put into production.

In practice, escalation behavior isn't always predictable. During demo testing, the same question sent to Haiku was handled solo without calling Opus. The same question sent to Sonnet triggered an Opus call. Whether Haiku underestimated the difficulty or Sonnet was being appropriately cautious is hard to say. Different models calibrate their own confidence differently.

The demo also showed cases where the cheaper setup produced a better response, more specific and more useful to the end user, than the more expensive one. Opus isn't always better for every task. Sometimes Sonnet or Haiku is cleaner.

Saving tokens is only worth it if quality holds. Test first. Optimize second.

Summary

Most AI work is not uniformly hard. The advisor strategy routes easy tasks to cheap models and only calls Opus when the task genuinely needs it. In Anthropic's evaluations, this improved quality and reduced cost. In Claude Code, /model opus plan gives you a manual version: the latest Opus for planning, Sonnet for execution.

When Opus is doing the work, effort level matters. The default depends on the version -- xhigh on Opus 4.7, high on Opus 4.8 -- and both handle complex work without burning tokens unnecessarily. Reserve max for your genuinely hardest problems, and drop to high when running concurrent sessions.

Specify tasks with intent, constraints, acceptance criteria, and file locations. The more complete your brief, the less Opus needs to interrupt you. Adaptive thinking handles reasoning depth automatically -- you do not need to set a budget, but you can guide it with natural language.

Test your specific workload before committing to any setup. The benchmarks are encouraging, but your tasks are yours.

Next up: 12. Plan Mode: Let Claude Think Before It Touches Anything.

Discussion

Why You Don't Always Need the Best Model

The Advisor Strategy Explained

How to Use It in Claude Code

Effort Levels: How Hard Opus Works

Specifying Tasks for Opus

What Changed in Opus 4.7

When to Use It (And When to Test First)

Summary