Advanced FastAPI Patterns • Lesson 6

Rate Limiting with Dependencies

Learn how to implement rate limiting using FastAPI's dependency injection system with in-memory request tracking.

🎯 What You'll Learn

  • Understand rate limiting concepts and sliding window algorithms
  • Implement a rate limiter using FastAPI dependencies
  • Track request counts per client using in-memory storage
  • Return appropriate HTTP 429 responses when limits are exceeded

Theory

Rate limiting restricts the number of requests a client can make to your API within a given time window. It protects your server from abuse, ensures fair usage, and prevents resource exhaustion.

Why Rate Limiting Matters

Without rate limiting, a single client can:

  • Overwhelm your server with too many requests
  • Consume all available resources, denying service to others
  • Scrape your data or abuse your API endpoints
  • Drive up infrastructure costs unexpectedly

Common Rate Limiting Algorithms

Fixed Window

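Counts requests in fixed time intervals (e.g., 100 requests per minute). Simple, but it allows bursts at window boundaries: with a limit of 100, a client can send 90 requests at the end of one window and 90 more at the start of the next, getting 180 requests through in a short span.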

|---Window 1---|---Window 2---|
  90 requests     90 requests
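
The boundary burst can be reproduced with a minimal sketch of a fixed window counter. The class name `FixedWindowCounter` and the explicit `now` argument are assumptions made for illustration (passing the time in keeps the example deterministic), and the sketch never removes counters for old windows.

```python
class FixedWindowCounter:
    """Illustrative fixed window counter (hypothetical helper, not part of FastAPI)."""

    def __init__(self, max_requests: int, window_seconds: int) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps (client_id, window_index) to the number of requests seen
        self.counts: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str, now: float) -> bool:
        # Every timestamp inside the same interval maps to the same index
        window_index = int(now // self.window_seconds)
        key = (client_id, window_index)
        count = self.counts.get(key, 0)
        if count >= self.max_requests:
            return False
        self.counts[key] = count + 1
        return True


limiter = FixedWindowCounter(max_requests=100, window_seconds=60)
# 90 requests in the last second of window 1 (t = 59) ...
late = sum(limiter.allow("client-a", now=59.0) for _ in range(90))
# ... and 90 more in the first second of window 2 (t = 60)
early = sum(limiter.allow("client-a", now=60.0) for _ in range(90))
print(late + early)  # 180
```

All 180 requests pass even though the limit is 100 per minute, because each window only sees 90 of them.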

Sliding Window

Tracks individual request timestamps and counts requests within a rolling time period. Smoother than fixed windows and prevents boundary bursts.

     |------60 second window------|
  t1  t2  t3  ...  tN  [new request]

Token Bucket

A bucket holds tokens that are consumed per request and refilled at a steady rate. Allows short bursts while maintaining a long-term average rate.
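
A token bucket can be sketched in a few lines. The class name `TokenBucket`, its parameters, and the explicit `now` argument are assumptions for illustration; a real limiter would read a clock and keep one bucket per client.

```python
class TokenBucket:
    """Illustrative token bucket (hypothetical helper, not part of FastAPI)."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]
print(burst)                  # [True, True, True, True, True, False]
print(bucket.allow(now=1.0))  # True
```

The first five requests drain the bucket, the sixth is rejected, and one second later a single token has been refilled.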

Implementing Rate Limiting with Dependencies

FastAPI's dependency injection system is ideal for cross-cutting concerns like rate limiting. A dependency function can:

  1. Inspect the incoming request
  2. Check against stored rate limit data
  3. Raise an exception if the limit is exceeded
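  4. Return useful information (like remaining requests)

from fastapi import HTTPException, Request

def rate_limiter(request: Request) -> int:
    # request.client may be None, so fall back to a shared key
    client_ip = request.client.host if request.client else "unknown"
    # Placeholders: a real limiter derives these from the state stored for client_ip
    over_limit = False
    remaining_requests = 0
    if over_limit:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    return remaining_requests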

The Sliding Window Approach

In this lesson, we use a sliding window algorithm with in-memory storage:

  1. Store timestamps - Keep a list of request times per client IP
  2. Clean expired entries - Remove timestamps older than the window
  3. Check the count - If requests in the window exceed the max, reject
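  4. Record the request - Add the current timestamp to the list

import time

from fastapi import HTTPException, Request

# Maps each client IP to the timestamps of its recent requests
request_counts: dict[str, list[float]] = {}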

def rate_limiter(request: Request):
    client_ip = request.client.host if request.client else "unknown"
    current_time = time.time()
    window = 60      # 1 minute
    max_requests = 10

    if client_ip not in request_counts:
        request_counts[client_ip] = []

    # Remove expired timestamps
    request_counts[client_ip] = [
        t for t in request_counts[client_ip]
        if current_time - t < window
    ]

    if len(request_counts[client_ip]) >= max_requests:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")

    request_counts[client_ip].append(current_time)
    return max_requests - len(request_counts[client_ip])

Using Dependencies for Cross-Cutting Concerns

Dependencies are powerful for implementing concerns that span multiple endpoints:

Concern           Dependency Pattern
Rate Limiting     Track requests, enforce limits
Authentication    Validate tokens, return user
Authorization     Check permissions, raise 403
Logging           Record request metadata
Caching           Check cache before processing

HTTP 429 Too Many Requests

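The HTTP 429 status code tells clients they have sent too many requests. Best practice is to tell them when they can retry, for example with a Retry-After header giving the number of seconds to wait:

raise HTTPException(
    status_code=429,
    detail="Rate limit exceeded",
    headers={"Retry-After": "60"},  # seconds; matches the 60-second window
)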
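
With a sliding window, the Retry-After value can be derived from the stored timestamps: a slot frees up when the oldest request in the window expires. The helper names below are hypothetical, and the sketch assumes the timestamp list is non-empty, which holds whenever the limit has been reached.

```python
import math


def retry_after_seconds(timestamps: list[float], now: float, window: float) -> int:
    """Whole seconds until the oldest request leaves the window (at least 1)."""
    oldest = min(timestamps)
    return max(1, math.ceil(oldest + window - now))


def rate_limit_headers(timestamps: list[float], now: float, window: float) -> dict[str, str]:
    # Header values must be strings
    return {"Retry-After": str(retry_after_seconds(timestamps, now, window))}


headers = rate_limit_headers([100.0, 130.0, 155.0], now=158.5, window=60)
print(headers)  # {'Retry-After': '2'}
```

The resulting dictionary can be passed as the `headers` argument of HTTPException.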

Key Concepts

  • Sliding Window - A rolling time period that smoothly tracks request rates
  • Depends() - FastAPI's mechanism for injecting dependency return values into endpoints
  • request.client.host - The client's IP address as seen by the server (request.client can be None, and behind a reverse proxy this is the proxy's address)
  • HTTP 429 - The standard status code for rate limit exceeded responses
  • In-Memory Storage - Using Python dictionaries to track state (suitable for single-process apps)
  • Window Expiration - Cleaning out old timestamps to maintain accurate counts

Best Practices

  • Always identify clients reliably (IP address, API key, or authentication token)
  • Return informative error messages with 429 responses so clients know when to retry
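  • Clean expired entries on every request to keep counts accurate and bound each client's list; note that keys for clients that never return stay in the dictionary unless you also purge them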
  • Use in-memory storage for development; consider Redis or similar for production
  • Apply rate limiting selectively: not every endpoint needs the same limits
  • Consider different rate limits for different user tiers or API key levels
  • Document your rate limits in your API documentation so consumers can plan accordingly
  • Test rate limiting behavior with concurrent requests to verify correctness

💡 Hint

Create a dependency function that tracks request timestamps per IP in a dictionary. Clean expired entries, check the count against the limit, and raise HTTPException(status_code=429) if exceeded.
