10. Choosing Models: /model, Effort Levels, and /fast

10. Choosing Models: /model, Effort Levels, and /fast — Yaiment

You Are Not Stuck With One Model

By default, Claude Code uses Sonnet for almost everything. That is a good default. But Claude Code ships with three model families, and you can switch between them inside a session with one command:

/model

Type that and Claude opens a small picker. Pick a different model and the rest of the session runs on it. Type /model again to change back.

Switching matters more than people think. The wrong model is slow when it should be fast, expensive when it should be cheap, or shallow when it should be careful. The right model fits the task.

The Three Families

There are three Claude model families. The latest at the time of writing:

Opus 4.8 is the heavyweight. Best for hard reasoning, large refactors (rewriting existing code without changing what it does), security analysis, and anything where being right matters more than being fast. Most expensive per token.
Sonnet 4.6 is the daily driver. Good code, good reasoning, balanced cost. This is the Claude Code default and where most of your sessions should live.
Haiku 4.5 is the speed runner. Fast and cheap. Use it for small mechanical tasks: rename a variable, format a file, write a one-line script.

A useful analogy: Opus is the senior architect, Sonnet is the senior developer, Haiku is the helpful intern. You do not call the architect to format a JSON file. You do not ask the intern to redesign your login system.

Info

Tip: When you do not know which model to pick, stay on Sonnet. It is the default for a reason.

Effort Levels on Opus 4.8

Opus 4.8 has the fullest set of effort levels. You tell it how hard to think, and it adjusts how much it reasons through a problem before answering. Think of it as desk time you authorize. At low, the model answers quickly from the top of its head. At max, it works through the problem step by step, checking its own reasoning before replying. There are five levels. Sonnet 4.6 takes an effort setting too, but the top two, xhigh and max, are Opus only.

low      → quick answer, tight scope. Good for simple tasks where speed matters.
medium   → a little more thought. Useful when you care about cost but want decent quality.
high     → balanced. Anthropic's minimum recommendation for work where quality counts.
xhigh    → extra reasoning for harder coding and agentic work. (New in 4.7.)
max      → deep, careful reasoning for the hardest problems.

The model is the same. The thinking time it spends is different. Higher effort costs more tokens (the small chunks of text the model reads and writes) and more real-world wait time. So it is a dial, not a default.

One thing to know: at max, the model sometimes overthinks. It can reach a correct answer and then second-guess itself, which wastes time without improving the result. That is what people mean when they say effort has diminishing returns at the top end. If xhigh solves your problem, max probably will not do better.

A few things have shifted since 4.6. Effort matters more than it used to, so it pays to tune it. For everyday coding, high is a sensible starting point. It is what Claude Code uses by default on Opus 4.8. Push to xhigh on harder problems. The model also respects effort strictly at the low end: at low and medium it scopes its work tightly to what you asked, where older models would sometimes go above and beyond on their own. If you see shallow reasoning at low or medium on a complex problem, raise effort first; prompt around it second.

Switch effort levels through /model or by adding the setting to your session config. Live on high or xhigh for most coding work. Drop to medium or low when you care about speed or cost more than depth. Reach for max only when a wrong answer is genuinely expensive.

/fast on Opus

There is one more lever. Claude Code has a /fast mode that runs Opus with faster output. It does not downgrade to a smaller model. It runs the same Opus brain but with different generation settings that prioritize speed. Fast mode works on Opus 4.8, 4.7, and 4.6.

/fast        → toggle fast mode on or off

Use /fast when you are in an Opus session and you want answers quicker. The output is the same Opus model, just delivered back to you faster as it is generated. Available on Opus 4.8, 4.7, and 4.6.

Info

Warning: /fast is not the same as switching to Haiku. The model is still Opus. The cost is still Opus cost. You are buying speed, not savings.

A Simple Decision Rule

Most days you do not need to think about this at all. When you do, here is a rule of thumb:

Small mechanical task (rename, format, one-liner) → Haiku 4.5
Normal coding work, the usual stuff               → Sonnet 4.6
Hard reasoning, refactor, security, migration     → Opus 4.8 (xhigh)
Worst-case wrong answer is very costly            → Opus 4.8 (max)
You need speed and you are already on Opus         → /fast

If you find yourself switching models several times a day, that is a sign your workflow is mature. If you stay on Sonnet forever and never feel pain, that is also fine. The point is that the lever exists when you need it.

Why You Might Pick Haiku Even When Sonnet Is Free

A common mistake: "I have a plan that includes Sonnet, so why use Haiku?" Two reasons.

One, speed. Haiku replies fast. For repetitive work where you are running the same kind of prompt across many files, the time difference adds up.

Two, conversation length. Every message in your session takes up space in the model's context window, which is the amount of text it can hold in memory at once. Think of it like a whiteboard: once it fills up, the oldest parts get erased. Haiku is cheaper per token, so you can keep a longer conversation alive on the same budget before anything gets erased. If you are doing a long session, Haiku stretches the runway.

But there is a catch. Haiku is genuinely worse at hard reasoning. Do not use it to design an architecture or debug a subtle race condition (a bug that appears only when two things happen in an unpredictable order). The cost saving is not worth a wrong answer.

Looking Ahead: The Advisor Strategy

This page covers the basics of switching models. There is a more advanced pattern called the advisor strategy, where you let a stronger model plan and a cheaper model execute. It saves cost without giving up quality on the parts of the work that matter.

That deeper workflow is the next page, 11. Smart Model Switching: Save Cost Without Losing Quality. Master the /model switch first. The advisor strategy builds on top of it.

Summary

/model is one of the most useful commands in Claude Code, and most users never touch it. Three model families, an effort knob on Opus 4.8, and a /fast switch on Opus. Match the model to the task: Haiku for mechanical, Sonnet for daily, Opus for hard. When the cost of being wrong is high, lift the effort level. When you are tired of waiting, switch lanes.

Next up: 11. Smart Model Switching: Save Cost Without Losing Quality.