RTK: The Rust Binary That Slashed My Claude Code Token Usage by 70%

I ran git log in a Claude Code session. Fourteen lines of actual information. 4,200 tokens consumed. Commit hashes, author emails, GPG signatures, merge metadata, decorations: all of it dumped into the context window, because that's what a bare git log returns.
The next command, cat src/services/auth.service.ts, ate another 3,500 tokens. Then npm test output: 6,000 tokens of pass/fail noise where only the 3 failures mattered.
In 30 minutes, my session had burned through 150,000 tokens. Not on reasoning. Not on code generation. On reading command output.
RTK fixes this. One Rust binary, zero dependencies, and suddenly the same session costs 45,000 tokens. That's 70% less — for doing the exact same work.
The Hidden Tax: Why Command Output Is Killing Your Token Budget
Every time Claude Code runs a shell command, the full, unfiltered output goes straight into the context window. The model has to process all of it. And most of it is noise.
Without RTK, every one of those raw lines lands in the context window. With RTK, the model sees a compressed digest that carries the same information in a fraction of the tokens.
The math is simple: when 70% of your token budget goes to reading shell output, compressing that output is the single highest-leverage optimization you can make.
And this isn't about changing how you work. RTK operates as a transparent proxy — Claude Code sends commands through it, the output gets compressed, and the model sees only what matters. Same workflow, fraction of the cost.
What RTK Actually Does
RTK (Rust Token Killer) is a CLI proxy that sits between your AI agent and the shell. It intercepts command output and applies four compression strategies before the tokens hit the context window:
1. Smart Filtering
Strips noise that has zero informational value to the model: comments in config files, blank lines, boilerplate headers, decorative separators, and metadata the agent will never act on.
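As a mental model, here's the idea in a few lines of Python. This is an illustrative sketch, not RTK's actual implementation:

```python
import re

# Drop lines with no informational value: blank lines, comment lines,
# and decorative separator runs like "====" or "----".
NOISE = re.compile(r"^\s*(#.*)?$|^[-=*_]{4,}$")

def smart_filter(text: str) -> str:
    return "\n".join(line for line in text.splitlines() if not NOISE.match(line))
```

The real filter knows far more formats (config boilerplate, headers, metadata), but the principle is the same: if the model will never act on a line, it never sees it.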
2. Grouping
Aggregates similar items. Instead of listing 47 .tsx files individually, RTK groups them: components/ (47 .tsx files). The model knows there are 47 components without spending tokens on each filename.
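A minimal sketch of the same idea (again my Python, not RTK's code): count files per directory and extension, then emit one summary line per group:

```python
from collections import Counter
from pathlib import PurePosixPath

# Collapse a long file listing into "dir/ (N .ext files)" summaries.
def group_listing(paths: list[str]) -> list[str]:
    counts = Counter(
        (str(PurePosixPath(p).parent), PurePosixPath(p).suffix) for p in paths
    )
    return [f"{d}/ ({n} {ext} files)" for (d, ext), n in counts.items()]
```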
3. Truncation
Preserves the head and tail of long outputs while cutting the redundant middle. Test output with 200 passing tests? You see the first few, a count, and the failures. That's all the model needs to decide what to fix.
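The head-and-tail idea, sketched in Python (RTK is smarter about what it keeps, e.g. failures survive truncation, but the shape is this):

```python
# Keep the first and last few lines; replace the middle with a count.
def truncate(text: str, head: int = 3, tail: int = 3) -> str:
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    return "\n".join(lines[:head] + [f"... ({omitted} lines omitted) ..."] + lines[-tail:])
```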
4. Deduplication
Collapses repeated entries. Five identical warning lines become one line with (×5). Log files with thousands of repeated entries shrink to a handful.
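Run-length collapsing in a few lines of Python, purely as an illustration of the idea:

```python
from itertools import groupby

# Collapse consecutive identical lines into one line with a repeat count.
def dedupe(text: str) -> str:
    out = []
    for line, run in groupby(text.splitlines()):
        n = len(list(run))
        out.append(line if n == 1 else f"{line} (×{n})")
    return "\n".join(out)
```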
Installation: 60 Seconds to 70% Savings
Hook-First Setup (The Right Way)
RTK's hook-first architecture is the cleanest integration. A single init command installs a command hook that transparently rewrites operations — Claude Code doesn't even need to prefix commands with rtk.
After init, add the hook to your Claude Code settings. From there, every command flows through RTK automatically. No workflow changes, no command prefixes, no friction.
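The exact hook command is printed by RTK's init step, so I won't guess it here. Structurally, a Claude Code command hook in settings.json looks like this (the command string below is a placeholder, not a documented RTK flag):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "<command printed by RTK's init step>" }
        ]
      }
    ]
  }
}
```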
The Token Savings Breakdown
Not all commands are created equal. Some operations compress dramatically, others less so. Here's what I've measured across real sessions:
| Operation | Raw Tokens | RTK Tokens | Savings |
|---|---|---|---|
| ls / tree (directory listing) | ~2,800 | ~560 | 80% |
| cat / file reads | ~3,500 | ~1,050 | 70% |
| grep / rg (search) | ~8,500 | ~1,700 | 80% |
| git status | ~1,200 | ~300 | 75% |
| git log | ~4,200 | ~840 | 80% |
| git diff | ~12,000 | ~960 | 92% |
| npm test / pytest | ~6,000 | ~600 | 90% |
The biggest wins come from test output and git diffs — exactly the operations that eat the most context in a typical coding session.
Tracking Your Savings
RTK has built-in analytics. After a few sessions, run rtk discover: it analyzes your Claude Code session history, identifies commands you're running raw that RTK could compress, and estimates the savings. This is how I found I was wasting 12,000 extra tokens per session on docker logs alone.
Why I'm Using RTK Right Now
I hit the wall. Not the technical wall — the quota wall.
When you use Claude Code daily for production work, token consumption adds up fast. I was running into quota limits mid-afternoon, right when I needed to push features. The frustrating part? Most of those tokens weren't going to code generation or reasoning. They were going to reading git diff output and test logs.
RTK gave me my afternoons back. Same sessions, same output quality, 70% less token consumption. I'm now consistently finishing full workdays within quota, and the sessions feel identical — because RTK is invisible. It's a hook. I don't prefix commands, I don't change my workflow, I don't think about it.
The second reason is context window quality. When the context window is bloated with raw command output, the model's attention is diluted. It has to process thousands of irrelevant tokens to find the signal. With RTK compressing outputs, the context stays lean — more room for actual code, actual reasoning, and actual conversation. The model performs noticeably better when it's not swimming through noise.
The Power Combo: RTK + Serena MCP
Here's where it gets interesting. RTK and Serena MCP attack the token efficiency problem from completely different angles, and together they create something greater than the sum of their parts.
What Serena Solves
Serena MCP gives your AI agent LSP-powered code navigation — the same "Go to Definition," "Find All References," and symbol-level editing that your IDE uses. Instead of cat-ing an entire 500-line file to find one function, the agent calls get_symbols_overview (200 tokens) and then reads only the target symbol body (50 tokens).
Serena eliminates unnecessary file reads at the source. The agent never asks to read a full file when a symbol lookup will do.
What RTK Solves
RTK compresses everything that still goes through the shell — test output, git operations, directory listings, search results, build logs. Even with Serena handling code navigation, there are dozens of shell commands per session that produce bloated output.
RTK compresses the output of commands the agent must still run.
The Combined Architecture
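Based on the descriptions in this post, the layering looks roughly like this:

```
Claude Code
 ├── code navigation ──> Serena MCP (LSP symbol queries instead of full-file reads)
 └── shell commands  ──> RTK hook ──> shell ──> compressed output ──> context window
```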
Real-World Numbers: The Stack in Action
Here's a real refactoring session I ran last week — renaming a service method across 12 files:
Vanilla Claude Code vs. RTK + Serena MCP:
74,700 tokens vs 6,960 tokens. That's a 90.7% reduction for the exact same refactoring task. Same result, same code quality, roughly a tenth of the cost.
The breakdown shows why neither tool alone is enough:
- Serena eliminated the 42,000 tokens of unnecessary file reads — by far the biggest single savings
- RTK compressed the remaining shell output from 32,700 tokens to 2,560 — a 92% compression on the operations that still needed the shell
- Together, they brought the total from 74,700 to 6,960 — a savings that neither could achieve alone
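A quick sanity check on those numbers (the ~4,400-token remainder attributed to Serena's own symbol queries is my inference, not a measured figure):

```python
raw_total = 74_700
file_reads_eliminated = 42_000            # replaced by Serena symbol queries
shell_raw = raw_total - file_reads_eliminated   # shell output before RTK
shell_compressed = 2_560                  # shell output after RTK
final_total = 6_960

rtk_compression = 1 - shell_compressed / shell_raw   # ~92% on the shell portion
overall_savings = 1 - final_total / raw_total        # ~90.7% overall
```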
Why They Don't Overlap
This is the key insight: RTK and Serena operate on completely different token sources. There's no redundancy.
Serena intercepts at the code navigation layer — replacing cat, grep-for-code, and full-file reads with LSP symbol queries. RTK intercepts at the shell output layer — compressing git, test, ls, docker, and every other command that produces text.
They cover different surfaces. Stack them and you've sealed nearly every source of token waste in a coding session.
Setting Up the Full Stack
If you already have Serena MCP configured (and if you've read my previous post on Serena, you probably do), adding RTK takes 60 seconds: the same hook-first init described in the installation section above.
That's it. Two tools, zero conflicts, complementary coverage. Every Claude Code session from this point forward runs at a fraction of the token cost.
Why You Should Be Using RTK
Let me be direct: if you use Claude Code (or any LLM-powered coding agent) and you're not compressing command output, you're wasting money and hitting quota limits unnecessarily.
Here's the case:
- Zero workflow change. RTK is a hook. It's invisible after setup. You don't change how you work.
- 70% average token savings. Measured across real sessions, not synthetic benchmarks.
- Longer sessions within quota. The same token budget covers 3x more work.
- Better model performance. Less noise in the context window means the model focuses on what matters.
- Single Rust binary. No dependencies, no runtime, no configuration files to maintain. It just works.
The cost of not using RTK is measured in tokens you burn every single session on output nobody — not you, not the model — actually needs to see.
What's Next
RTK is evolving fast. The rtk discover command already identifies optimization opportunities you're missing. The analytics (rtk gain --graph) let you track savings over time. And the command coverage keeps expanding — Docker, Kubernetes, GitHub CLI, linters, formatters, package managers.
The broader picture is this: token efficiency is the infrastructure layer of AI-assisted development. As models get more capable, we'll use them for longer, more complex sessions. The agents that win will be the ones with lean context windows — where every token carries signal, not noise.
RTK and Serena MCP are two pieces of that infrastructure. RTK compresses the shell. Serena compresses code navigation. Together, they've turned my Claude Code sessions from token-expensive sprints into all-day marathons.
Install RTK. Pair it with Serena. Your quota will thank you.
Resources
- RTK (Rust Token Killer) — CLI proxy for LLM token compression
- Serena MCP — LSP-based AI code navigation
- Surgical Code Editing — Token efficiency patterns with symbolic operations
- Serena MCP Deep Dive — Full guide to LSP-powered AI agents
- Claude Code — Anthropic's terminal-based coding agent