Cutting Claude Usage by 80% via Delegation

I was on Claude Max 20x for a few months. Bumped up from 5x because I kept brushing the quota during longer Claude Code sessions. Felt productive at the time. Then I actually checked my usage one day and… I never came close to the 20x ceiling. Not once. I was paying for headroom I never used.

The 5x bumps were annoying though. Always at the wrong time. So either I stay at 20x and overpay, or I go back to 5x and rage-quit every time it cuts off mid-refactor. Neither felt right.

So I sat with it. What was Claude even doing that was burning tokens this fast?

I scrolled my recent sessions. The answer was kinda obvious in retrospect. Claude was spending most of its tokens TYPING. Writing out file contents. Pasting boilerplate. Filling in CRUD endpoints. The interesting parts (planning what to change, finding bugs, reviewing the output) were maybe 20% of what flowed through.

Which made me wonder. What if Claude only did the interesting parts, and something cheaper did the typing?

The hypothesis

The thinking part is what frontier models are uniquely good at. Reading existing code, deciding what to change, reviewing the muscle’s output for hallucinations, running tests, iterating on failures. The parts where you need taste and judgment.

The writing part is interchangeable. A cheap model is fine for this. DeepSeek V4 Flash specifically is wild for the price. It’s on OpenCode Go for $10/month, which gets you about $60 worth of usage. Stupid cheap.

So I wired up Claude as the brain and DeepSeek V4 Flash as the muscle. Brain reads the files, plans the changes, writes a structured brief, hands it to the muscle. Muscle emits the file contents. Brain applies them and runs the tests. If something fails, brain writes a targeted follow-up.

A 300-line bash shim handles all the plumbing. Delimited multi-file output (the muscle emits <<<FILE: path>>>...<<<END FILE>>> blocks, parser writes each one). Multi-turn sessions for KV cache reuse. Secret scanning so I don’t accidentally leak credentials to the muscle. 429 retry with backoff. The usual.

The benchmarks

Then I had to actually test it. Because “this should work” and “this works” aren’t the same thing.

I ran 10 benchmarks across roughly 30 arm variations. Mostly toy projects (FastAPI items API, URL shortener, Go slugifier) but with real tests and real pass/fail signals. Compared Claude alone (subagent with full tools) versus Claude as brain plus DeepSeek as muscle. Tried different brief shapes (vague vs structured). Tested context sizes up to 100 files inlined.

Numbers came out roughly:

Claude alone for a typical 30k-token task: about 30k Claude tokens
Brain + muscle pattern, same task: about 4-5k Claude tokens, plus 10k DeepSeek tokens

So Claude’s share dropped about 85%. Which translates to “fits inside a Max 5x quota easily” if you’re on subscription, or a way smaller invoice if you’re on API. Same outcome. All tests pass. Code quality matches.

The actually interesting finding

Going in, my hypothesis was that DeepSeek V4 Flash isn’t actually worse at writing code, it just needs the brain to hand it a clearer brief. After 10 benchmarks, that held up better than I expected.

On clean spec-driven tasks, flash and Opus produced near-identical code. The differences were stylistic. Optional[str] vs str | None. Explicit alias vs convention. Both pass the same tests, both pass mypy strict, both ship.

Where flash falls down is exactly where the brief is sloppy. The “tricky rules” test, flash got the case-sensitive scheme bug wrong because the spec didn’t emphasize case. The “vague brief” test, flash hallucinated a bug that wasn’t even there. Both cases, a structured FAIL / SUSPECT / FIX follow-up fixed it in one round.

So the takeaway isn’t “DS is bad, Claude is good”. It’s “don’t blame the cheap model. Tighten the brief first.” If you write a structured brief that maps the bug to a file and gives a one-sentence fix direction, flash produces the same fix as Opus would. The brain (Claude) maps that in seconds because it has the file tools. The muscle doesn’t need to re-derive it from scratch.

I also tested DeepSeek V4 Pro out of curiosity, figuring if Flash was already this good, Pro might close whatever small gap remained. It didn’t. Pro came out either identical to Flash on most runs, or slightly worse on a few. I don’t fully understand why, but it was consistent enough that I stopped second-guessing it. Flash stays.

KV cache hits across sessions

DeepSeek’s cache crystallizes after 2-3 requests through the same prefix. If your system message and opening brief shape stay consistent (and they do, because I keep using the same patterns), the prefix gets cached. I measured 37-63% cache hit rates on the bigger benchmarks. Free input token discount.

What it is, on GitHub

maestrode. MIT licensed. 300 LoC bash wrapper. 23 unit tests. Runnable demo at examples/quickstart.sh that fixes a deliberately buggy Python file end-to-end in about 30 seconds.

There’s a SKILL.md for Claude Code users so you can drop it into ~/.claude/skills/maestrode/SKILL.md and just say “maestrode on” in any session. Claude will start delegating code-writing to the cheap muscle and you don’t have to think about it.

To try it:

curl -fsSL https://raw.githubusercontent.com/doedja/maestrode/main/install.sh | bash
nvim ~/.config/maestrode/env  # add your API key
maestrode "say pong"

If you don’t have an API key, OpenCode Go is the cheapest path I’ve found. That’s my referral link, fair warning. The non-ref URL is opencode.ai/go, no pressure.

Honest take

I’m back on Max 5x now. The muscle calls are basically free at OpenCode Go pricing. Quota anxiety is gone.

But I want to be careful about overselling this. The benchmarks are toy projects. I haven’t done a real production refactor under this pattern yet. Vague briefs still hallucinate. Cross-cutting refactors with weird API constraints still need careful review.

If you’re paying for Claude (subscription or API) and you do a lot of spec-driven coding work (TDD, clear requirements, multi-file features), this should cut your Claude usage substantially without losing quality. If you mostly use Claude for ambiguous problems where you don’t know what you want until you see it, skip this. Use Claude directly.

For my workflow it works. That’s enough for now.