The Build Log
Practical notes on agentic AI, RAG, LLM engineering, Go infrastructure, and shipping full-stack AI products in production, from one engineer's desk.
Agentic AI in Production: What Actually Works in 2026
Demos are easy; production agents are not. Here's what separates an agent you can trust with real users from a notebook that impresses on a Friday.
RAG Is Not Dead — It Just Grew Up
Every time context windows grow, someone declares RAG dead. Then they get the bill, and the hallucinations, and they come back to retrieval.
Observability for LLM Pipelines: Tracing, Evaluation Metrics, and Per-Request Cost Attribution
You cannot improve what you cannot observe. LLM pipelines have unique observability needs — token cost, quality drift, and latency across external APIs that don't behave like your own services.
Why Go Is Quietly Becoming the Language of AI Infrastructure
Python owns the notebook. But the gateways, orchestrators, and high-throughput pipelines around your model? More and more of them are written in Go.
Event-Driven Go in Practice: CQRS and Event Sourcing — When They Help and When They Hurt
CQRS and event sourcing are powerful patterns with real production benefits. They're also expensive to implement correctly, and most systems don't need them. Here's how to tell the difference.
Hiring a Freelance Full Stack AI Engineer: A Founder's Guide
You don't always need an AI team. Sometimes you need one engineer who can own the UI, the API, the infra, and the model layer — and actually finish.
Cutting LLM Costs in Production: Caching, Model Routing, and Graceful Fallbacks
The first LLM bill in production is always a surprise. Here are the specific techniques — semantic caching, model routing, fallback chains — that actually reduce it without making your product worse.
Multi-Tenant SaaS Architecture for an AI Reporting Engine at 48K Req/Min
Building SaaS for enterprise means one tenant's burst traffic cannot become another tenant's outage. At 48K requests per minute across 6 tenants, noisy-neighbor control isn't optional — it's the product.
Scaling a Translation Platform to 1M+ Requests/Day Across 70+ Languages
One million translation requests per day sounds like a scale problem. It is — but the harder problems are cache invalidation, language-pair cost asymmetry, and keeping p99 tolerable when a user submits a 50,000-word document.
Building a Real-Time Fraud-Detection Pipeline at 48K Events/Sec
Fraud doesn't wait for your system to warm up. Here's how we built a pipeline that processes 48,000 events every second and still responds in 12 ms at the 99th percentile.