Distributed Rate Limiting: From Local State to Atomic Lua
In a distributed system, a rate limiter isn't just a counter—it's a critical reliability component. If it fails, your services are vulnerable to DDoS; if it’s slow, you break your SLA latency targets.
The Bottleneck: Why Local State Fails
My first approach used a local Python deque. It was fast (amortized O(1) per request), but it suffered from state fragmentation. In a load-balanced environment, a single user's requests are routed to different pods. Each pod only sees a fraction of the traffic, making it impossible to enforce a global limit per IP.
The Pivot: Server-Side Atomicity
To achieve global consistency, we move the state to Redis. However, a naive implementation (Read -> Calculate -> Write) creates a race condition: two concurrent requests could both read the count as "4" and both decide to allow the request, letting 6 requests through instead of 5.
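The race is easy to reproduce deterministically. This toy interleaving (pure Python, no Redis) models two workers that both read the counter before either writes back:

```python
# Simulate the Read -> Calculate -> Write race with a limit of 5.
LIMIT = 5
count = 4                       # four requests already recorded

snap_a = count                  # worker A reads the count
snap_b = count                  # worker B reads before A has written
allowed_a = snap_a < LIMIT      # True: A sees 4 < 5
allowed_b = snap_b < LIMIT      # True: B also sees the stale 4 < 5

# Both write back, so the counter overshoots the limit.
count += int(allowed_a) + int(allowed_b)
print(count)  # 6 -> one request too many slipped through
```

With a real network in between, the read-to-write gap is milliseconds wide, which is plenty of room for this interleaving under load.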
We solve this using Lua scripting. By sending the logic to the Redis server, we execute the check and the update as a single atomic operation: Redis runs scripts single-threaded, so no other command can interleave between the read and the write.
-- KEYS[1] = rate-limit key (e.g. per-IP); ARGV = now (ms), window (ms), limit
local now    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit  = tonumber(ARGV[3])
-- Atomic check and increment
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)
local count = redis.call('ZCARD', KEYS[1])
if count < limit then
    redis.call('ZADD', KEYS[1], now, now)
    redis.call('PEXPIRE', KEYS[1], window)  -- let idle keys expire on their own
    return 1
end
return 0
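To unit-test the window arithmetic without a Redis server, the script's evict-count-add sequence can be modeled in plain Python (a hypothetical helper, using a sorted list in place of the ZSET; the actual atomicity guarantee comes only from Redis executing the script single-threaded):

```python
def sliding_window_allow(zset: list[float], now: float,
                         window: float, limit: int) -> bool:
    """Pure-Python model of the Lua script. `zset` is a sorted list of
    hit timestamps standing in for the Redis sorted set."""
    # ZREMRANGEBYSCORE key 0 (now - window): drop expired entries
    zset[:] = [t for t in zset if t > now - window]
    # ZCARD, then conditional ZADD
    if len(zset) < limit:
        zset.append(now)
        return True
    return False
```

Because the real script runs server-side as one unit, the "both workers read 4" interleaving from the naive version cannot occur.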
Performance at Scale
By using Redis Sorted Sets (ZSET), we keep the eviction logic efficient. The ZREMRANGEBYSCORE command prunes expired data in O(log N + M) time, where M is the number of entries removed, ensuring that even under a massive credential-stuffing attack, our latency remains well under a 50ms threshold.
This pattern is the foundation for how we manage risk at scale: push the logic to the data, keep the database lean, and always design for atomicity.