How I Cut Token Costs by 69% with Knowledge Graphs and Output Compression
Here's a number that startled me: across two weeks of heavy AI-assisted coding, I was burning roughly 1.2 million tokens per day. Not all of it was useful context — a lot was noise. Command output with ASCII art. Verbose build logs. Files that were tangentially related but not relevant to the task. The AI was getting context, but it wasn't always the right context. And I was paying for every token.
The fix wasn't a bigger model or a longer context window — it was two tools I built that work together: graphify, a pipeline that transforms code into a structured knowledge graph before feeding it to the AI, and headroom, a Rust-powered output compressor that filters command output down to what actually matters. Together they cut my daily token consumption by 69% — from 1.2M to roughly 370K tokens — while improving the quality of the AI's responses because the context it was getting was denser and more relevant.
The problem: context stuffing kills both cost and quality
Most AI coding workflows follow a predictable pattern: dump files into the context window, run a command, dump the output, ask the AI to reason over it. The problem is that raw files have a low signal-to-noise ratio. A 500-line TypeScript file might contain only 30 lines relevant to the current task; the rest is boilerplate, unrelated logic, and imports that the AI already knows about. Command output is worse: build logs, lint output, and test runs produce thousands of lines when maybe 20 are actionable.
If you're paying per token (and almost every AI coding workflow is), every line of noise is money spent for no benefit. Worse, it dilutes the AI's attention. LLMs tend to use information buried in the middle of a long context less reliably than information near the start or end. Stuffing 10,000 tokens of mostly irrelevant content into a prompt doesn't just cost more; it actively degrades the quality of the response on the parts you actually care about.
graphify: from raw code to structured knowledge
graphify addresses the file-context problem by running code through an AST parser and building a knowledge graph of the codebase. Instead of feeding the AI every line of every file, you feed it the graph: nodes for functions, classes, types, and modules; edges for call relationships, imports, inheritance, and data flow. The AI reasons over the structure of the codebase, not its raw text. The pipeline has three stages plus a query interface:
- AST extraction stage: walks the codebase, parses each file into its abstract syntax tree, and extracts semantic entities — functions with their signatures, classes with their methods, type definitions, and imports/exports.
- Graph construction stage: connects entities into a directed graph. A function node has edges to the functions it calls, the types it consumes, and the modules it imports from. The graph captures structure, not syntax.
- Community detection stage: runs clustering algorithms over the graph to find tightly-coupled modules — the 'god nodes' and their neighborhoods. This is what the AI gets when it asks 'which parts of the codebase are relevant to this change?'
- Query interface: BFS and DFS traversal from a seed node (the file or function you're working on) returns the relevant subgraph — compact, structured, and vastly smaller than the raw files that produced it.
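The extraction, graph construction, and query steps above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not graphify's actual code: it swaps in Python's built-in `ast` module for tree-sitter, tracks only direct calls by name, and the names `build_call_graph`, `relevant_subgraph`, and `max_depth` are hypothetical.

```python
import ast
from collections import deque

# Illustrative source to index; a real run would walk files on disk.
SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()

def unrelated():
    return 42
'''

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls."""
    graph: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        calls = graph.setdefault(node.name, set())
        for child in ast.walk(node):
            # Only direct calls like f(x); method calls like x.f() are skipped.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                calls.add(child.func.id)
    return graph

def relevant_subgraph(graph: dict[str, set[str]], seed: str, max_depth: int = 2) -> set[str]:
    """Breadth-first traversal from the seed, bounded to max_depth hops."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue
        for callee in sorted(graph.get(name, ())):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen

graph = build_call_graph(SOURCE)
subgraph = relevant_subgraph(graph, "load")
print(sorted(subgraph))
```

Seeding the traversal at `load` pulls in `read`, `parse`, and the builtin `open`, while `unrelated` never enters the context. The depth bound is the knob that trades completeness for compactness.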
The key insight is that the AI doesn't need to see the code to understand the codebase. It needs to see the structure. Once it understands which functions call which, which modules depend on which, it can ask for specific files when it needs them — and the graph tells it exactly which files to ask for. This is retrieval-augmented generation applied to codebases, and it drops the context required for codebase-level reasoning by roughly 60-70%.
headroom: compressing command output to what matters
headroom addresses the second half of the token problem: raw command output. I run `cargo build`, `tsc`, `vitest`, `git diff` dozens of times a day. The raw output is mostly noise. headroom is a Rust-based pipeline that applies command-specific filters to strip out everything except actionable information.
For TypeScript (`tsc`), headroom groups errors by file and error code: 300 type errors become 12 groups, an 83% reduction. For `cargo test`, it drops passing tests and keeps only failures: 200 test results become 3 failure blocks, a 90%+ reduction. For `git diff`, it collapses unchanged hunks and formats the remainder compactly, an 80% reduction. For `npm run lint`, it deduplicates violations by rule and file, an 84% reduction. Each filter is purpose-built for a specific command, and the overall effect is that command output shrinks to 10-30% of its original size while retaining every actionable piece of information.
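To make the grouping idea concrete, here is a minimal sketch of a `tsc` filter. headroom itself is written in Rust; this Python version is only an illustration, and the function name `group_tsc_errors` and the summary line format are my own assumptions. It relies on the plain (non-pretty) `tsc` diagnostic shape, `file(line,col): error TSxxxx: message`.

```python
import re
from collections import defaultdict

# Matches non-pretty tsc diagnostics: file(line,col): error TS1234: message
TSC_LINE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+),(?P<col>\d+)\): error (?P<code>TS\d+): (?P<msg>.*)$"
)

def group_tsc_errors(output: str) -> list[str]:
    """Collapse diagnostics into one summary line per (file, error code)."""
    groups: dict[tuple[str, str], list[tuple[int, str]]] = defaultdict(list)
    for raw in output.splitlines():
        match = TSC_LINE.match(raw.strip())
        if match:  # non-diagnostic lines (summaries, blank lines) are dropped
            key = (match["file"], match["code"])
            groups[key].append((int(match["line"]), match["msg"]))
    summary = []
    for (file, code), hits in sorted(groups.items()):
        lines = ",".join(str(line) for line, _ in hits)
        # Keep the first message as the representative for the group.
        summary.append(f"{file} {code} x{len(hits)} (lines {lines}): {hits[0][1]}")
    return summary

sample = """\
src/api.ts(10,5): error TS2322: Type 'string' is not assignable to type 'number'.
src/api.ts(42,9): error TS2322: Type 'string' is not assignable to type 'number'.
src/ui.ts(7,1): error TS2304: Cannot find name 'foo'.
Found 3 errors in 2 files.
"""
summary = group_tsc_errors(sample)
print("\n".join(summary))
```

Keeping the line numbers in the summary matters: the AI can still jump to every occurrence, it just doesn't have to read the same message 300 times.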
- Cargo/TypeScript builds: errors grouped by file and error code, warnings collapsed into counts, success output dropped entirely.
- Test runners: vitest, playwright, cargo test. Passing tests are dropped; only failures are kept, with stack traces deduplicated.
- Git operations: status compacted to file lists, diffs with unchanged hunks collapsed, logs in condensed format.
- Package managers: pnpm, npm output stripped of progress bars, dependency trees condensed.
- Infrastructure: docker ps, kubectl get — table output compacted, log output deduplicated with occurrence counts.
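The "deduplicated with occurrence counts" idea from the list above is simple enough to sketch directly. This is an illustrative Python version, not headroom's Rust implementation; the function name `dedupe_with_counts`, the `[xN]` suffix, and the sample log lines are all assumptions. It only merges exact duplicates.

```python
from collections import Counter

def dedupe_with_counts(lines: list[str]) -> list[str]:
    """Keep the first occurrence of each line and annotate repeats with a count."""
    counts = Counter(lines)
    compact = []
    for line in dict.fromkeys(lines):  # dict preserves first-seen order
        n = counts[line]
        compact.append(f"{line} [x{n}]" if n > 1 else line)
    return compact

# Hypothetical log output with a retry loop.
logs = [
    "connection refused: db:5432",
    "retrying in 1s",
    "connection refused: db:5432",
    "retrying in 1s",
    "connection refused: db:5432",
    "giving up",
]
compact = dedupe_with_counts(logs)
print("\n".join(compact))
```

Six lines become three, and the counts preserve the one fact the repetition carried: how many times it happened.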
The 69% number: where it comes from
I instrumented my workflow over a two-week period. Before graphify and headroom, daily token consumption averaged 1.2 million. After deploying both tools, it dropped to roughly 370K, a 69% reduction. The breakdown, measured against the original 1.2M baseline: roughly 40 percentage points of the reduction came from graphify (smaller, denser file context), and roughly 29 points came from headroom (compressed command output). The remaining 31% of the baseline was the 'essential burn': the prompt structure, the actual code changes, and the conversation itself.
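The arithmetic is worth spelling out. The sketch below reads the 40 and 29 figures as percentage points of the original baseline, which is an assumption on my part but the only reading under which they sum to 69. Variable names are illustrative.

```python
BASELINE_TOKENS = 1_200_000   # daily average before either tool
GRAPHIFY_POINTS = 40          # percentage points of the baseline removed by graphify
HEADROOM_POINTS = 29          # percentage points of the baseline removed by headroom

graphify_saved = BASELINE_TOKENS * GRAPHIFY_POINTS // 100
headroom_saved = BASELINE_TOKENS * HEADROOM_POINTS // 100
total_saved = graphify_saved + headroom_saved
remaining = BASELINE_TOKENS - total_saved

# Each tool's share of the savings themselves, as opposed to the baseline.
graphify_share = round(100 * graphify_saved / total_saved)
headroom_share = round(100 * headroom_saved / total_saved)

print(f"saved {total_saved:,} tokens/day, {remaining:,} remain")
print(f"graphify: {graphify_share}% of savings, headroom: {headroom_share}%")
```

That works out to 828,000 tokens saved per day with 372,000 remaining, and graphify accounts for about 58% of the savings versus about 42% for headroom.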
Critically, the quality of the AI's output improved during this period — not because the model got better, but because the context it was receiving was more relevant per token. Fewer hallucinations about files that didn't exist. Fewer suggestions to modify code that wasn't actually related to the task. Better answers to architectural questions because the knowledge graph gave the AI a structural understanding of the codebase that raw file dumps never provided.
The tools themselves: a peek under the hood
graphify is built in TypeScript with a plugin architecture for different language parsers. The AST extraction uses tree-sitter for language-agnostic parsing with TypeScript, Go, and Python grammars loaded as plugins. The graph engine uses an in-memory adjacency list with persistence to JSON for caching between sessions. Community detection uses the Louvain algorithm implemented against the graph structure. The whole pipeline runs as a sidecar process that watches the filesystem and updates the graph incrementally — a file change only re-indexes that file and its direct neighbors, not the entire codebase.
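For a feel of how an adjacency list with JSON persistence and incremental updates can fit together, here is a minimal sketch. It is written in Python rather than graphify's TypeScript, the `CodeGraph` class and its methods are hypothetical, and it assumes "direct neighbors" means files the changed file imports plus files that import it.

```python
import json

class CodeGraph:
    """In-memory adjacency list: file -> set of files it imports."""

    def __init__(self) -> None:
        self.edges: dict[str, set[str]] = {}

    def reindex(self, file: str, imports: list[str]) -> set[str]:
        """Replace one file's outgoing edges and return the files to refresh."""
        old_imports = self.edges.get(file, set())
        self.edges[file] = set(imports)
        # Files that import the changed file are its incoming neighbors.
        dependents = {f for f, deps in self.edges.items() if file in deps}
        return {file} | old_imports | self.edges[file] | dependents

    def to_json(self) -> str:
        """Serialize with sorted keys and values so the cache is stable."""
        return json.dumps({f: sorted(deps) for f, deps in sorted(self.edges.items())})

    @classmethod
    def from_json(cls, text: str) -> "CodeGraph":
        graph = cls()
        graph.edges = {f: set(deps) for f, deps in json.loads(text).items()}
        return graph

graph = CodeGraph()
graph.reindex("api.ts", ["types.ts"])
graph.reindex("routes.ts", ["api.ts"])
graph.reindex("logger.ts", [])

# api.ts changes and gains an import; only it and its neighbors are touched.
touched = graph.reindex("api.ts", ["types.ts", "db.ts"])
restored = CodeGraph.from_json(graph.to_json())
print(sorted(touched))
```

In this example `logger.ts` is left alone because it neither imports `api.ts` nor is imported by it, which is the property that keeps incremental updates cheap on a large codebase.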
headroom is written in Rust for speed and deployed as a CLI binary. The architecture is command → filter pipeline: each supported command has a dedicated filter function that parses its output format and applies compression rules. Filters are composable — a `cargo test` run pipes through the test filter, then the generic deduplication filter, then the truncation filter. The whole thing processes 100K+ lines of output in under 10ms, so it adds no perceptible latency to the workflow.
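The composable-filter idea translates to any language. Below is a rough Python sketch of the shape (headroom's real filters are Rust functions, and every name here, including `compose`, `drop_passing_tests`, `dedupe`, and `truncate`, is an assumption for illustration). Each filter takes a list of lines and returns a list of lines, so chaining them is just function composition.

```python
from functools import reduce
from typing import Callable

Filter = Callable[[list[str]], list[str]]

def drop_passing_tests(lines: list[str]) -> list[str]:
    """cargo test prints 'test <name> ... ok' for each passing test."""
    return [l for l in lines if not (l.startswith("test ") and l.endswith("... ok"))]

def dedupe(lines: list[str]) -> list[str]:
    """Remove exact duplicates while preserving first-seen order."""
    return list(dict.fromkeys(lines))

def truncate(limit: int) -> Filter:
    """Build a filter that keeps the first `limit` lines and notes the rest."""
    def apply(lines: list[str]) -> list[str]:
        if len(lines) <= limit:
            return lines
        return lines[:limit] + [f"... {len(lines) - limit} more lines omitted"]
    return apply

def compose(*filters: Filter) -> Filter:
    """Chain filters left to right into a single filter."""
    return lambda lines: reduce(lambda acc, f: f(acc), filters, lines)

cargo_test_filter = compose(drop_passing_tests, dedupe, truncate(3))

# Illustrative output, trimmed from what a real run prints.
raw = [
    "running 4 tests",
    "test parser::parses_empty ... ok",
    "test parser::parses_nested ... FAILED",
    "test lexer::handles_utf8 ... ok",
    "test lexer::handles_tabs ... FAILED",
    "warning: unused variable",
    "warning: unused variable",
    "test result: FAILED. 2 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out",
]
compressed = cargo_test_filter(raw)
print("\n".join(compressed))
```

Because every stage shares one signature, adding support for a new command means writing one command-specific filter and reusing the generic stages after it.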
Why this matters beyond my own workflow
AI coding assistants are becoming the default way many developers work, but the economics are still immature. Token consumption is a recurring cost that scales with usage. At $0.01-0.03 per 1K tokens for frontier models, burning 1.2M tokens a day costs roughly $12-36/day, or $360-1,080/month per developer. A 69% reduction brings that to roughly $110-335/month, which is the difference between 'this pays for itself' and 'leadership wants to cap AI tool usage.'
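Here is the same cost estimate as a quick calculation, so you can plug in your own numbers. It assumes a 30-day month and a flat per-token price with no input/output split; the function name `monthly_cost` is illustrative.

```python
def monthly_cost(tokens_per_day: int, usd_per_1k_tokens: float, days: int = 30) -> float:
    """Monthly spend for a flat per-1K-token price."""
    return tokens_per_day / 1000 * usd_per_1k_tokens * days

BEFORE_TOKENS = 1_200_000
AFTER_TOKENS = BEFORE_TOKENS * 31 // 100  # 69% reduction leaves 31% of the baseline

for price in (0.01, 0.03):
    before = monthly_cost(BEFORE_TOKENS, price)
    after = monthly_cost(AFTER_TOKENS, price)
    print(f"${price:.2f}/1K tokens: ${before:,.0f}/month before, ${after:,.0f}/month after")
```

At the low end of the price range that is about $360 falling to about $112 per month; at the high end, about $1,080 falling to about $335.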
More importantly, the techniques generalize. Knowledge graphs for codebase understanding work for any project large enough to benefit from structured context. Command output compression works for any workflow that pipes tool output to an AI. These aren't niche optimizations — they're infrastructure for the way development is increasingly done.
Context quality is the hidden multiplier in AI coding. Better context means better answers with fewer tokens. Building tooling that delivers better context — not just bigger context — is the highest-leverage investment you can make in an AI-assisted development workflow.
The takeaway
I didn't build graphify and headroom because I wanted to build developer tools — I built them because my AI token bill was nonsensical and the quality of responses on large codebases was inconsistent. The 69% savings was a side effect of solving the real problem: getting the right context to the AI in the right shape. If you're looking at your own AI token consumption and wondering where the money is going, the answer is probably in your context pipeline — and that's something you can fix. If you're building AI-assisted developer workflows and want to optimize the context layer — whether with knowledge graphs, output compression, or custom retrieval strategies — this is the kind of AI infrastructure and tooling I design and ship.