Field Notes

The Build Log

Practical notes on agentic AI, RAG, LLM engineering, Go infrastructure, and shipping full stack AI products in production — from one engineer's desk.

AI Optimization · Hire Me · Developer Tools

I Cut AI Token Costs by 60% Last Month — Here's How I Can Do It for Your Engineering Org

Your AI coding assistants are burning money. I built a custom metering and optimization stack that cut token costs by 60% in one month — and I can build it for your team. Here's what's in the stack and how to get started.

Jun 25, 2026 · 10 min read

Agentic RAG · AI Architecture · LLM

Agentic RAG: Building AI Agents That Retrieve, Reason, and Act

Plain RAG answers questions. Agentic RAG answers questions, decides what to ask next, calls tools, and iterates until it has a grounded answer. Here's the architecture that makes it work in production.

Jun 23, 2026 · 11 min read

Developer Tools · AI · Optimization

How I Saved 69% on Token Costs with Knowledge Graphs and Output Compression

AI coding assistants burn tokens fast. Here's how a knowledge graph pipeline and an output compression layer together cut my token usage by 69% — without losing context quality.

Jun 20, 2026 · 9 min read

Vector RAG · Retrieval · Production

Vector RAG in Production: Why Cosine Similarity Alone Is Costing You Quality

If your vector RAG is just cosine similarity on OpenAI embeddings, you're leaving half the retrieval quality on the table. Here's what production vector search actually needs: hybrid retrieval, reranking, and chunking that understands your data.

Jun 18, 2026 · 9 min read

Token Optimizer · LLM Cost · Production

The Token Optimizer Stack: 7 Levers That Cut LLM Costs by 50% or More

Reducing LLM costs isn't about picking a cheaper model — it's a stack of 7 levers, each contributing 10-30% savings. Pull all seven and you cut total spend in half. Here's every lever, measured from production.

Jun 15, 2026 · 10 min read

AI Consulting · Hire Me · AI/ML

What AI/ML Consulting Actually Delivers — A 4-Week Engagement Breakdown

Hiring an AI/ML consultant sounds abstract until you see the actual deliverables. Here's exactly what a 4-week AI consulting engagement produces — week by week, with real output at every stage.

Jun 10, 2026 · 8 min read

Agentic AI · LLM · Production

Agentic AI in Production: What Actually Works in 2026

Demos are easy; production agents are not. Here's what separates an agent you can trust with real users from a notebook that impresses on a Friday.

May 28, 2026 · 8 min read

RAG · LLM · Retrieval

RAG Is Not Dead — It Just Grew Up

Every time context windows grow, someone declares RAG dead. Then they get the bill, and the hallucinations, and they come back to retrieval.

May 12, 2026 · 7 min read

Observability · LLM · Production

Observability for LLM Pipelines: Tracing, Evaluation Metrics, and Per-Request Cost Attribution

You cannot improve what you cannot observe. LLM pipelines have unique observability needs — token cost, quality drift, and latency across external APIs that don't behave like your own services.

May 9, 2026 · 11 min read

Go · Infrastructure · AI

Why Go Is Quietly Becoming the Language of AI Infrastructure

Python owns the notebook. But the gateways, orchestrators, and high-throughput pipelines around your model? More and more of them are written in Go.

Apr 22, 2026 · 6 min read

Go · Architecture · Patterns

Event-Driven Go in Practice: CQRS and Event Sourcing — When They Help and When They Hurt

CQRS and event sourcing are powerful patterns with real production benefits. They're also expensive to implement correctly, and most systems don't need them. Here's how to tell the difference.

Apr 17, 2026 · 12 min read

Hiring · Freelance · AI Engineering

Hiring a Freelance Full Stack AI Engineer: A Founder's Guide

You don't always need an AI team. Sometimes you need one engineer who can own the UI, the API, the infra, and the model layer — and actually finish.

Apr 3, 2026 · 6 min read

LLM · Cost · Production

Cutting LLM Costs in Production: Caching, Model Routing, and Graceful Fallbacks

The first LLM bill in production is always a surprise. Here are the specific techniques — semantic caching, model routing, fallback chains — that actually reduce it without making your product worse.

Mar 28, 2026 · 10 min read

SaaS · AI · Architecture

Multi-Tenant SaaS Architecture for an AI Reporting Engine at 48K Req/Min

Building SaaS for enterprise means one tenant's burst traffic cannot become another tenant's outage. At 48K requests per minute across 6 tenants, noisy-neighbor control isn't optional — it's the product.

Mar 5, 2026 · 13 min read

Architecture · Scale · Node.js

Scaling a Translation Platform to 1M+ Requests/Day Across 70+ Languages

One million translation requests per day sounds like a scale problem. It is — but the harder problems are cache invalidation, language-pair cost asymmetry, and keeping p99 tolerable when a user submits a 50,000-word document.

Feb 11, 2026 · 11 min read

Go · AWS · Architecture

Building a Real-Time Fraud-Detection Pipeline at 48K Events/Sec

Fraud doesn't wait for your system to warm up. Here's how we built a pipeline that processes 48,000 events every second and still responds in 12ms at the 99th percentile.

Jan 21, 2026 · 12 min read