Scaling a Translation Platform to 1M+ Requests/Day Across 70+ Languages
Most teams reach for a translation API, wrap it in a thin service, and move on. That works until you're paying per character at scale — and at 1M+ daily requests across 70+ languages, the bill from a naive implementation would have been prohibitive. The challenge wasn't just throughput; it was building a system where the architecture itself reduced cost, while keeping latency low enough that synchronous translation felt instant for users and asynchronous translation for large documents didn't eat into SLAs.
The stack we settled on mixed languages on purpose: a Node.js API gateway for orchestration and protocol translation, a .NET Core engine for the CPU-bound post-processing and formatting work where C# is genuinely faster, and a Next.js frontend that handled both the user-facing product and the internal dashboard. Running multiple language runtimes adds operational surface area, but it meant each layer was well-suited to its actual workload instead of fighting the wrong tool for the job.
The two request shapes: synchronous vs. asynchronous
Translation requests fall into two dramatically different shapes, and conflating them in a single code path is one of the earliest mistakes. Short-form requests — UI strings, product descriptions, headlines — are typically under 500 characters. Users expect them synchronously, usually within 200ms. Long-form requests — documents, legal contracts, technical manuals — can be 50,000 characters or more. These must be asynchronous. Handling them synchronously isn't just slow; it ties up connections and worker threads for seconds at a time, creating head-of-line blocking that degrades the entire service.
We set a hard threshold at 2,000 characters. Requests below it hit the synchronous path: the Node.js orchestrator checks the cache, calls the translation provider on a miss, caches the result, and returns it inline. Requests above it enter the queue path: we enqueue an SQS message and return a job ID immediately, then a .NET Core worker picks the job up, processes it in chunks (we chunk at 1,000 characters with overlap to preserve sentence context at boundaries), and writes the result to S3. The frontend polls a lightweight status endpoint (not the main API) and fetches the result from a presigned S3 URL when ready. This decoupling was the single biggest architectural decision in the system.
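A minimal sketch of the threshold check and the overlapping chunker follows. The 2,000-character threshold and 1,000-character chunk size come from the text above; the 100-character overlap, the function names, and the choice to send a request of exactly 2,000 characters down the async path are assumptions made for illustration.

```typescript
const SYNC_THRESHOLD_CHARS = 2000; // from the text
const CHUNK_SIZE_CHARS = 1000; // from the text
const CHUNK_OVERLAP_CHARS = 100; // hypothetical value, not stated in the article

type Route = "sync" | "async";

// Requests strictly below the threshold take the synchronous path.
function chooseRoute(text: string): Route {
  return text.length < SYNC_THRESHOLD_CHARS ? "sync" : "async";
}

interface Chunk {
  start: number; // offset of this chunk in the source text
  text: string;
}

// Split text into fixed-size windows; each window repeats the last
// `overlap` characters of the previous one.
function chunkWithOverlap(
  text: string,
  size: number = CHUNK_SIZE_CHARS,
  overlap: number = CHUNK_OVERLAP_CHARS,
): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ start, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

Note that `length` and `slice` count UTF-16 code units, so a production chunker would likely snap boundaries to sentence or grapheme breaks rather than cut at a raw offset.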
Caching: three layers, each solving a different problem
The translation provider charges per character. Every cache hit is a cost saving and a latency win — the best kind of optimization. We built three independent caching layers, each targeting a different hit rate and TTL profile.
- L1: In-process LRU cache in Node.js (128MB limit) — catches repeated identical strings within the same process during a hot traffic burst. Hit rate ~18%. TTL: 10 minutes.
- L2: Redis cluster keyed by (source_text_hash, source_lang, target_lang) — the main shared cache across all Node.js instances. Hit rate ~61%. TTL: 7 days for stable content, 1 hour for dynamic content flagged at request time.
- L3: PostgreSQL content-addressed store for translations that have been verified by human reviewers — permanent, never evicted. Hit rate ~8% but these are the highest-value strings (marketing copy, legal text) where quality matters most.
- Every cache key also includes a model_version field, so when we switch translation providers or model versions we can bust the affected entries selectively instead of doing a full flush.
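The read path across the three layers might be sketched roughly as below. The `CacheLayer` interface, the in-memory `MapLayer` stand-in, and the backfill behavior (a hit in a slower layer is copied into the faster layers that missed) are assumptions for illustration; TTLs and the real LRU, Redis, and PostgreSQL clients are left out.

```typescript
// One cache layer. Real layers would wrap an LRU, Redis, and PostgreSQL.
interface CacheLayer {
  readonly name: string;
  readonly readOnly: boolean; // true for the human-verified store
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without external services.
class MapLayer implements CacheLayer {
  readonly name: string;
  readonly readOnly: boolean;
  private readonly store = new Map<string, string>();

  constructor(name: string, readOnly: boolean = false) {
    this.name = name;
    this.readOnly = readOnly;
  }
  async get(key: string): Promise<string | undefined> {
    return this.store.get(key);
  }
  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}

interface Translation {
  text: string;
  servedBy: string; // layer name, or "provider" on a full miss
}

// Write a value into every writable layer in the list.
async function fill(layers: CacheLayer[], key: string, value: string): Promise<void> {
  await Promise.all(layers.filter((l) => !l.readOnly).map((l) => l.set(key, value)));
}

// Check layers in order; only call the paid provider when all of them miss.
async function translateWithCache(
  key: string,
  layers: CacheLayer[],
  callProvider: () => Promise<string>,
): Promise<Translation> {
  const missed: CacheLayer[] = [];
  for (const layer of layers) {
    const hit = await layer.get(key);
    if (hit !== undefined) {
      await fill(missed, key, hit); // backfill the faster layers
      return { text: hit, servedBy: layer.name };
    }
    missed.push(layer);
  }
  const text = await callProvider();
  await fill(missed, key, text); // machine output never enters the verified store
  return { text, servedBy: "provider" };
}
```

Marking the verified store read-only on this path keeps unreviewed machine output out of the permanent layer, which only reviewers would populate.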
The combined hit rate across all three layers sat at about 87% on a typical day, so only about 13% of requests reached the translation provider. Because the provider bills per character, API cost fell roughly in proportion. That hit rate required discipline: we normalized whitespace and punctuation before hashing (a trailing space shouldn't be a cache miss), lowercased language codes, and stripped HTML before translation while preserving the tag structure to reinsert afterward.
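A sketch of the key construction under those rules is shown below. The SHA-256 hash, the colon-separated key layout, and the specific punctuation rule (curly quotes folded to straight quotes) are assumptions; the article only says that whitespace and punctuation were normalized and that the key carries a model_version. HTML stripping is left out.

```typescript
import { createHash } from "node:crypto";

interface CacheKeyInput {
  sourceText: string;
  sourceLang: string;
  targetLang: string;
  modelVersion: string;
}

// Normalize text for hashing only; the text sent for translation is unchanged.
function normalizeForHash(text: string): string {
  return text
    .replace(/[\u2018\u2019]/g, "'") // hypothetical punctuation rule: curly single quotes
    .replace(/[\u201C\u201D]/g, '"') // hypothetical punctuation rule: curly double quotes
    .replace(/\s+/g, " ") // collapse whitespace runs
    .trim(); // a trailing space should not be a cache miss
}

// Build a key of the form modelVersion:sourceLang:targetLang:sha256(text).
function buildCacheKey(input: CacheKeyInput): string {
  const textHash = createHash("sha256")
    .update(normalizeForHash(input.sourceText), "utf8")
    .digest("hex");
  return [
    input.modelVersion,
    input.sourceLang.toLowerCase(),
    input.targetLang.toLowerCase(),
    textHash,
  ].join(":");
}
```

Putting the model version first means a version bump changes every key prefix, so old entries simply stop being read and can be deleted by prefix or left to expire.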
Language-pair cost asymmetry and provider routing
Not all language pairs are equal — in cost, quality, or availability. English-to-French from a major provider is cheap and excellent. English-to-Swahili may be expensive, lower quality, or handled by a specialist provider entirely. Treating all 70+ language pairs as identical in your routing logic is a mistake that becomes expensive quickly.
We built a provider router as a configuration table in Postgres, keyed by (source_lang, target_lang, content_type). Each row specifies the primary provider, a fallback provider, a cost coefficient, and a quality tier. The Node.js orchestrator reads this table (cached in Redis with a 5-minute TTL) and routes accordingly. When we onboarded a new specialist provider for low-resource languages, we updated the routing table without touching application code. This made A/B testing provider quality straightforward — split traffic at the routing layer, compare quality scores offline.
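The lookup and fallback could be sketched as follows. The row fields mirror the columns named above; the `Translate` callback, the key format, and the rule that the fallback is tried whenever the primary throws are assumptions, and the table is passed in as plain rows rather than read from Postgres or Redis.

```typescript
interface RouteRow {
  sourceLang: string;
  targetLang: string;
  contentType: string;
  primaryProvider: string;
  fallbackProvider: string;
  costCoefficient: number;
  qualityTier: string;
}

interface RouteRequest {
  sourceLang: string;
  targetLang: string;
  contentType: string;
  text: string;
}

// Hypothetical provider call: resolves with translated text or throws.
type Translate = (provider: string, text: string) => Promise<string>;

function routeKey(sourceLang: string, targetLang: string, contentType: string): string {
  return `${sourceLang}:${targetLang}:${contentType}`;
}

// Index the table rows by (source_lang, target_lang, content_type).
function buildRouteIndex(rows: RouteRow[]): Map<string, RouteRow> {
  const index = new Map<string, RouteRow>();
  for (const row of rows) {
    index.set(routeKey(row.sourceLang, row.targetLang, row.contentType), row);
  }
  return index;
}

// Try the primary provider for the pair; on any error, try the fallback once.
async function translateViaRoute(
  index: Map<string, RouteRow>,
  request: RouteRequest,
  translate: Translate,
): Promise<{ provider: string; text: string }> {
  const key = routeKey(request.sourceLang, request.targetLang, request.contentType);
  const row = index.get(key);
  if (row === undefined) {
    throw new Error(`no route configured for ${key}`);
  }
  try {
    const text = await translate(row.primaryProvider, request.text);
    return { provider: row.primaryProvider, text };
  } catch {
    const text = await translate(row.fallbackProvider, request.text);
    return { provider: row.fallbackProvider, text };
  }
}
```

Because the provider names live in data, onboarding a new provider for a language pair is a row update, which matches the no-deploy workflow described above.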
Queue architecture: SQS, dead-letter queues, and backpressure
The asynchronous path uses an SQS Standard queue with a visibility timeout of 5 minutes — long enough for a 50,000-character document to process, short enough that a crashed worker doesn't hide a message for too long. Each SQS message carries only the job ID and a pointer to the request metadata in DynamoDB; we never put the full document text in the SQS message to avoid hitting the 256KB message size limit and to keep message processing cheap.
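The message body described above can be sketched as a small pointer record. The field names `jobId` and `metadataKey` and both helper functions are hypothetical; the 256KB figure is the limit cited in the text, and the actual SQS and DynamoDB calls are omitted.

```typescript
const SQS_MAX_MESSAGE_BYTES = 256 * 1024; // limit cited in the text

// The message carries pointers only, never the document text.
interface JobMessage {
  jobId: string;
  metadataKey: string; // hypothetical name for the DynamoDB pointer
}

// Serialize the message and refuse anything that would exceed the limit.
function encodeJobMessage(message: JobMessage): string {
  const body = JSON.stringify({ jobId: message.jobId, metadataKey: message.metadataKey });
  if (Buffer.byteLength(body, "utf8") > SQS_MAX_MESSAGE_BYTES) {
    throw new RangeError("job message exceeds the SQS size limit");
  }
  return body;
}

// Parse and validate a message body on the worker side.
function decodeJobMessage(body: string): JobMessage {
  const parsed: unknown = JSON.parse(body);
  if (typeof parsed !== "object" || parsed === null) {
    throw new TypeError("job message must be a JSON object");
  }
  const { jobId, metadataKey } = parsed as Record<string, unknown>;
  if (typeof jobId !== "string" || typeof metadataKey !== "string") {
    throw new TypeError("job message needs string jobId and metadataKey");
  }
  return { jobId, metadataKey };
}
```

Validating on decode means a malformed message fails fast and consistently, so it reaches the dead-letter queue instead of crashing a worker partway through a job.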
After 3 failed processing attempts, SQS moves messages to a dead-letter queue. We alert on DLQ depth in Datadog and have a runbook for manual replay. The most common failure mode in production was a downstream provider returning 429s (rate limited) during burst traffic. The fix was exponential backoff with jitter inside the .NET Core worker, plus a separate 'burst queue' with lower throughput limits that we route traffic to when the main queue depth exceeds a threshold. It is a simple form of backpressure that keeps us from amplifying a provider outage into a full queue meltdown.
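The retry and queue-selection logic can be sketched as below. The production worker was .NET Core; TypeScript is used here only to keep the examples in one language. The "full jitter" variant, the base and cap delays, the attempt limit, and the convention that a rate-limited error carries `status === 429` are all assumptions.

```typescript
// Full-jitter backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number = 200,
  capMs: number = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Assumed error shape: provider client errors expose an HTTP status.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: number }).status === 429;
}

interface RetryOptions {
  maxAttempts?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
}

// Retry a call on 429s only; every other error is rethrown immediately.
async function withRetry<T>(call: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
  const maxAttempts = options.maxAttempts ?? 5;
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const lastAttempt = attempt + 1 >= maxAttempts;
      if (lastAttempt || !isRateLimited(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

// Backpressure: divert new work to the burst queue once the main queue is deep.
function chooseQueue(mainQueueDepth: number, threshold: number): "main" | "burst" {
  return mainQueueDepth > threshold ? "burst" : "main";
}
```

The jitter matters because many workers hit the same 429 at the same moment; without it they would all retry in lockstep and trip the rate limit again.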
Handling the polyglot frontend: Next.js and i18n at the edge
The Next.js frontend served both the product UI and the internal operations dashboard. For the product UI, we used Next.js's built-in i18n routing with locale subpaths. But we had a wrinkle: the platform's own UI needed to be available in 12 interface languages, independent of the content being translated. We handled this with a combination of static generation for the common strings (extracted at build time, baked into the bundle per locale) and runtime fetching for dynamic content like user-generated strings that had been translated and cached.
One non-obvious detail: right-to-left language support (Arabic, Hebrew, Persian) required CSS direction changes that touched layout, not just text. We handled this by injecting a `dir` attribute on the `<html>` element based on the active locale and using logical CSS properties (`margin-inline-start` instead of `margin-left`) throughout the component library. It sounds like a small thing until your Arabic users report that the layout is broken and you realize every component needs auditing.
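Deriving the `dir` value from the active locale can be sketched like this. The function names are hypothetical, and the set lists only the three languages named above; where the attributes get applied in the Next.js app is left out.

```typescript
// ISO 639-1 codes for the right-to-left languages named in the text.
const RTL_LANGUAGES = new Set<string>(["ar", "he", "fa"]);

type Direction = "rtl" | "ltr";

// Map a locale such as "ar-EG" or "en_US" to a text direction.
function directionFor(locale: string): Direction {
  const language = locale.toLowerCase().split(/[-_]/)[0] ?? "";
  return RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
}

// Attributes to set on the <html> element for the active locale.
function htmlAttributes(locale: string): { lang: string; dir: Direction } {
  return { lang: locale, dir: directionFor(locale) };
}
```

With `dir` set at the root, logical properties such as `margin-inline-start` resolve to the correct physical side without per-component locale checks.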
What I'd change on a second pass
The biggest architectural regret is how we handled cache invalidation for content that changed on the source side. We had no mechanism to detect that a source string had been edited and the cached translation was now stale. In practice, editors would update source copy and the old translation would serve for up to 7 days. We eventually built a content hash-based invalidation system, but it should have been in the design from day one — stale translations are a correctness problem, not just a freshness problem.
At 1M+ daily requests across 70+ languages, the architecture itself is the cost control. An 87% cache hit rate isn't just a performance win — it's the difference between a sustainable unit economics story and an API bill that scales linearly with growth.
If you're building a localization platform, an internationalization layer for an existing product, or any system where polyglot content delivery is a core requirement, the patterns here are transferable. This kind of architecture design and hands-on build is exactly the work I take on in select independent engagements. Reach out if you want to talk through your specific constraints.