Rate Limiting with Dependencies

What You'll Learn

Understand why rate limiting is essential for API protection
Implement a sliding window rate limiter using FastAPI dependencies
Use in-memory storage to track request counts per client
Return proper HTTP 429 responses when rate limits are exceeded

Theory

Rate limiting restricts the number of requests a client can make to your API within a given time window. It protects your server from abuse, ensures fair usage, and prevents resource exhaustion.

Why Rate Limiting Matters

Without rate limiting, a single client can:

Overwhelm your server with too many requests
Consume all available resources, denying service to others
Scrape your data or abuse your API endpoints
Drive up infrastructure costs unexpectedly

Common Rate Limiting Algorithms

Fixed Window

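Counts requests in fixed time intervals (e.g., 100 requests per minute) and resets the counter when each interval ends. Simple, but it allows bursts at window boundaries: with a limit of 100 per minute, a client can send 90 requests at the end of one window and 90 more at the start of the next, so 180 requests arrive within seconds without either window exceeding its limit.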

|---Window 1---|---Window 2---|
  90 requests     90 requests
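To make the boundary problem concrete, here is a minimal fixed-window counter in plain Python. The class name `FixedWindowCounter` and the explicit `now` argument are illustrative assumptions, not part of the lesson's code; passing the time in makes the behavior easy to check.

```python
class FixedWindowCounter:
    """Illustrative fixed-window limiter (hypothetical helper, not from the lesson)."""

    def __init__(self, max_requests: int, window: float) -> None:
        self.max_requests = max_requests
        self.window = window
        self.current_window = None  # index of the window being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        window_index = int(now // self.window)
        if window_index != self.current_window:
            # A new window has started: reset the counter
            self.current_window = window_index
            self.count = 0
        if self.count >= self.max_requests:
            return False
        self.count += 1
        return True


limiter = FixedWindowCounter(max_requests=100, window=60)
# 90 requests at t=59s (end of one window), then 90 at t=61s (start of the next)
accepted = sum(limiter.allow(59.0) for _ in range(90))
accepted += sum(limiter.allow(61.0) for _ in range(90))
```

All 180 requests are accepted here even though they arrive about two seconds apart, because the counter resets at the 60-second boundary.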

Sliding Window

Tracks individual request timestamps and counts requests within a rolling time period. Smoother than fixed windows and prevents boundary bursts.

     |------60 second window------|
  t1  t2  t3  ...  tN  [new request]

Token Bucket

A bucket holds tokens that are consumed per request and refilled at a steady rate. Allows short bursts while maintaining a long-term average rate.
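The token bucket can be sketched in a few lines of plain Python. This is an illustration under assumptions: the class name `TokenBucket`, its parameters, and the explicit `now` argument are hypothetical and chosen only to keep the example deterministic.

```python
class TokenBucket:
    """Illustrative token bucket (hypothetical helper, not from the lesson)."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Add tokens for the elapsed time, capped at the bucket capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0)
burst = [bucket.allow(0.0) for _ in range(6)]  # a burst of six requests at t=0
later = bucket.allow(2.0)                      # one more request two seconds later
```

The first five requests of the burst pass and the sixth is rejected. Two seconds later the bucket has refilled enough for the next request to pass.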

Implementing Rate Limiting with Dependencies

FastAPI's dependency injection system is ideal for cross-cutting concerns like rate limiting. A dependency function can:

Inspect the incoming request
Check against stored rate limit data
Raise an exception if the limit is exceeded
Return useful information (like remaining requests)

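from fastapi import HTTPException, Request

def rate_limiter(request: Request) -> int:
    client_ip = request.client.host if request.client else "unknown"
    # ... check and update rate limit state for client_ip ...
    over_limit = False        # placeholder: computed from stored state
    remaining_requests = 10   # placeholder: computed from stored state
    if over_limit:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    return remaining_requests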

The Sliding Window Approach

In this lesson, we use a sliding window algorithm with in-memory storage:

Store timestamps - Keep a list of request times per client IP
Clean expired entries - Remove timestamps older than the window
Check the count - If the window already holds the maximum number of requests, reject
Record the request - Add the current timestamp to the list

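import time

from fastapi import HTTPException, Request

# Maps each client IP to the timestamps of its recent requests
request_counts: dict[str, list[float]] = {}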

def rate_limiter(request: Request):
    client_ip = request.client.host if request.client else "unknown"
    current_time = time.time()
    window = 60      # 1 minute
    max_requests = 10

    if client_ip not in request_counts:
        request_counts[client_ip] = []

    # Remove expired timestamps
    request_counts[client_ip] = [
        t for t in request_counts[client_ip]
        if current_time - t < window
    ]

    if len(request_counts[client_ip]) >= max_requests:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")

    request_counts[client_ip].append(current_time)
    return max_requests - len(request_counts[client_ip])

Using Dependencies for Cross-Cutting Concerns

Dependencies are powerful for implementing concerns that span multiple endpoints:

Concern	Dependency Pattern
Rate Limiting	Track requests, enforce limits
Authentication	Validate tokens, return user
Authorization	Check permissions, raise 403
Logging	Record request metadata
Caching	Check cache before processing

HTTP 429 Too Many Requests

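The HTTP 429 status code tells clients they have sent too many requests. Best practice is to tell them when they can retry, typically with a Retry-After header giving the number of seconds to wait:

raise HTTPException(
    status_code=429,
    detail="Rate limit exceeded",
    headers={"Retry-After": "60"},  # seconds; here, the full window length
)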
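One way to tell clients when to retry is the standard Retry-After header, with a value derived from the oldest timestamp still in the window. The helper below is a sketch under assumptions: the name `retry_after_seconds` and its signature are hypothetical and not part of FastAPI or the lesson's code.

```python
import math


def retry_after_seconds(timestamps: list[float], now: float, window: float) -> int:
    """Seconds until the oldest request leaves the window (hypothetical helper)."""
    if not timestamps:
        return 0
    oldest = min(timestamps)
    # Round up so the client does not retry a fraction of a second too early
    return max(0, math.ceil(window - (now - oldest)))


# Example: a 60-second window whose oldest request was made 45 seconds ago
wait = retry_after_seconds([100.0, 120.0, 140.0], now=145.0, window=60)
headers = {"Retry-After": str(wait)}
```

The resulting dictionary can be passed as the `headers` argument of `HTTPException` when raising the 429 error.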

Key Concepts

Sliding Window - A rolling time period that smoothly tracks request rates
Depends() - FastAPI's mechanism for injecting dependency return values into endpoints
request.client.host - The address of the connecting peer; request.client can be None, and behind a reverse proxy this is the proxy's address rather than the end user's
HTTP 429 - The standard status code for rate limit exceeded responses
In-Memory Storage - Using Python dictionaries to track state (suitable for single-process apps)
Window Expiration - Cleaning out old timestamps to maintain accurate counts

Best Practices

Always identify clients reliably (IP address, API key, or authentication token)
Return informative error messages with 429 responses so clients know when to retry
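Clean expired entries on every request to keep counts accurate, and evict clients that have gone idle, since per-request cleanup only touches the requesting client's list and entries for clients that never return would otherwise accumulate
Use in-memory storage for development; consider Redis or similar for production, where state must survive restarts and be shared across worker processes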
Apply rate limiting selectively: not every endpoint needs the same limits
Consider different rate limits for different user tiers or API key levels
Document your rate limits in your API documentation so consumers can plan accordingly
Test rate limiting behavior with concurrent requests to verify correctness

💡 Hint

Create a dependency function that tracks request timestamps per IP in a dictionary. Clean expired entries, check the count against the limit, and raise HTTPException(status_code=429) if exceeded.