Building a Real-Time Fraud-Detection Pipeline at 48K Events/Sec
When we started this project the team had a batch fraud-scoring job running every 15 minutes. That 15-minute window was the enemy — it was enough time for a fraudster to drain an account, trigger chargebacks, and disappear. The business was absorbing losses that had climbed to a painful number. The mandate was clear: make the decision at the moment of the transaction, not a quarter-hour later.
The target we set ourselves — 48,000 events per second with a p99 response under 15 milliseconds — sounded aggressive, but it was grounded in peak load projections with headroom built in. What follows is an honest account of how we hit it, what broke along the way, and what I'd do differently on a second pass.
Why Redis Streams, not Kafka
The most common question I get is why we chose Redis Streams over Kafka. Kafka is the default choice for high-throughput event pipelines, and for most teams at that scale it's the right one. We picked Redis Streams for three specific reasons: our ops team already ran Redis for session caching and had deep operational familiarity; the sub-millisecond append latency Redis offered shaved a predictable amount off p99; and the consumer group model in Redis Streams is close enough to Kafka's that the programming model wasn't a step backward.
The tradeoff we accepted is durability. Redis persistence — even with AOF at everysec — is softer than Kafka's replication guarantees. We mitigated this by writing every event to a PostgreSQL audit table asynchronously before acknowledging to the upstream producer. In the event of a Redis crash and partial data loss, we could replay from Postgres. It added complexity but met the compliance requirement for a complete audit trail.
The consumer architecture: goroutine pools, not goroutine fans
Our first prototype spun up one goroutine per incoming event. This looked clean in tests and fell apart immediately under production-like load — goroutine counts climbed into the hundreds of thousands, the Go runtime's scheduler started thrashing, and p99 latency spiked to over 200ms. The fix was a fixed-size worker pool: 512 goroutines reading from an in-process buffered channel that the stream reader populated. The number 512 wasn't magic — it was empirically determined by load-testing in staging and finding the point where adding more workers stopped reducing latency.
Each worker follows a strict pipeline: deserialize the event from MessagePack (we switched from JSON early — it's roughly 30% smaller and faster to parse), run the rule engine, make exactly one Redis call to read and conditionally increment a rate-limit counter with INCR + EXPIRE, write the decision to a result stream, and ACK the source message. The entire critical path touches the network exactly twice: one read (rate-limit check) and one write (result). That discipline is what keeps p99 in single-digit milliseconds for the hot path.
The rule engine: deterministic before probabilistic
We run two layers of rules in sequence. The first layer is a deterministic rule engine written as a plain Go struct with a slice of rule functions. Each rule is a pure function that takes an event and a context (which contains the Redis-fetched rate counters) and returns a decision: allow, deny, or escalate. Rules are ordered by cheapest-to-evaluate first. If any rule returns deny, we short-circuit and never evaluate the rest. This layer catches roughly 70% of fraud with essentially zero compute cost.
The second layer is a probabilistic model — a gradient-boosted classifier that scores events the first layer escalated. This call goes to an AWS Lambda function backed by a pre-warmed model loaded from S3 at cold start. We keep Lambda concurrency reserved and above zero at all times to eliminate cold starts from the p99 budget. The model score, combined with a configurable threshold, produces the final decision. This two-layer approach means we're only paying for ML inference on the roughly 30% of traffic that needs it.
Event sourcing: the audit log is the system of record
Every decision — allow or deny — is appended as an immutable event to a PostgreSQL events table with a composite key of (account_id, event_timestamp, event_id). We never update rows. This gives us a complete, replayable history of every fraud decision ever made, which turned out to be invaluable for two reasons beyond compliance: we can replay historical events through an updated model to evaluate it offline before deployment, and when a customer disputes a block we can produce an exact, timestamped audit trail showing every rule that fired.
- Event schema is versioned with a `schema_version` column — consumers check this and handle migrations gracefully.
- Postgres write is done asynchronously in a separate goroutine pool to keep it off the critical path; we accept eventual consistency here.
- A separate read model (materialized view, refreshed every 30 seconds) powers the fraud dashboard without hitting the events table directly.
- We use ULID instead of UUID for event IDs — lexicographic sort order means Postgres index pages stay hot for recent events.
Achieving 12ms p99: what actually moved the needle
After the goroutine pool fix, our p99 was sitting around 40ms — better, but not where we needed it. The next biggest gain came from co-locating the Go service with Redis in the same AWS Availability Zone and explicitly binding the service to the AZ-local Redis replica. Cross-AZ latency on AWS is typically 1–3ms, which sounds small until you realize it appears in every single event's critical path at 48K events/second.
The second major gain came from pre-allocating byte buffers for MessagePack serialization and returning them to a sync.Pool after each use. The Go garbage collector is good, but at 48K events/second the allocation pressure was creating GC pauses that showed up as occasional 20–30ms spikes in p99. After pooling, those spikes disappeared from our DataDog latency histogram. The last 5ms came from tuning the Redis connection pool — we were seeing connection wait time because the pool was undersized relative to the worker count. Sizing the pool to 1.5x the goroutine count eliminated connection contention.
Failure modes and what we learned
The failure that hurt the most in the first month was a Redis failover during a primary promotion. The sentinel-managed failover took about 4 seconds — well within Redis's documented bounds — but our worker pool had no circuit breaker. All 512 goroutines blocked on Redis reads simultaneously, the in-process channel filled up, and the stream reader stopped ACKing messages. Redis Streams' pending entry list grew to several hundred thousand messages. Recovery was clean once Redis was healthy, but the 4-second blackout surfaced in our SLO reporting as a missed availability target for the month.
The fix was a simple circuit breaker around every Redis call, implemented with a state machine that opens after 5 consecutive errors and probes with a single call every 500ms. When the circuit is open, workers fall back to a local cache of recent rate-limit counters — stale by definition, but good enough to make a reasonable decision for four seconds rather than blocking entirely. This is a concrete example of the 'graceful degradation' principle that sounds obvious and is surprisingly easy to skip when you're racing to hit a deadline.
The 15-minute batch window was the fraud window. Closing it — by processing 48K events/sec at 12ms p99 — cut fraud losses by the equivalent of ~$12M per year. The architecture choices that made that possible weren't glamorous: fixed goroutine pools, buffer reuse, same-AZ colocation, and a circuit breaker around every external call.
What I'd do differently
I'd invest earlier in a proper load-testing harness that can sustain 1.5x peak load for 30 minutes, not just spike to it for 30 seconds. Most of our late-stage latency discoveries happened because our initial load tests were too short to trigger GC pressure and too low-volume to fill the Redis connection pool. I'd also evaluate Apache Flink or Kafka Streams for the rule engine if the team already ran Kafka — the exactly-once semantics are worth it for financial data, even at the operational cost.
If you're building a real-time risk or fraud system and want an independent technical review of the architecture — or an experienced engineer to lead the build — this is exactly the kind of engagement I take on. The patterns here transfer across payment systems, identity verification pipelines, and any system where the cost of a late decision is measurable in dollars.
Open to select projects
Building something with AI?
I take on select AI engineering projects end-to-end — from React frontend to LLM pipeline on AWS. Tell me what you're building.