November 8, 2025

Scaling Real-Time Rental APIs to 10K Requests/Second

By CleverBusiness Engineering Team

When we first launched our rental management APIs, we were handling around 100 requests per second. Fast forward 8 months, and we're now processing over 10,000 requests/sec during peak hours. Here's how we scaled our AWS infrastructure to handle this growth.

The Challenge

Rental inventory systems require real-time availability checks. When a customer searches for available equipment, the API needs to:

  • Query availability across multiple locations
  • Check reservation conflicts
  • Calculate pricing based on duration and distance
  • Return results in <200ms

At scale this becomes challenging: a single search can fan out into 50+ database queries across different tables and services.
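
To make that fan-out concrete, here's a minimal sketch of the search flow in Python. The service calls are stubbed stand-ins for our internal clients, and the names and pricing rule are illustrative, not our production code:

    import asyncio
    from dataclasses import dataclass

    # Hypothetical stand-ins for our internal service clients; the real
    # calls hit the inventory store, the booking database, and pricing.
    @dataclass
    class Item:
        id: str
        daily_rate: float

    async def get_available(location: str, start: str, end: str) -> list[Item]:
        return [Item("excavator-42", 350.0)]       # stubbed inventory lookup

    async def has_conflict(item_id: str, start: str, end: str) -> bool:
        return False                               # stubbed reservation check

    def quote(item: Item, days: int, distance_km: float) -> float:
        return item.daily_rate * days + 2.5 * distance_km  # stubbed pricing rule

    async def search(locations: list[str], start: str, end: str, days: int) -> list[dict]:
        # Fan out one availability query per location concurrently,
        # instead of issuing dozens of queries sequentially.
        per_location = await asyncio.gather(
            *(get_available(loc, start, end) for loc in locations)
        )
        results = []
        for loc, items in zip(locations, per_location):
            for item in items:
                if await has_conflict(item.id, start, end):
                    continue  # skip units with a conflicting reservation
                results.append({"item": item.id, "location": loc,
                                "price": quote(item, days, distance_km=12.0)})
        return results

    print(asyncio.run(search(["austin", "dallas"], "2025-11-10", "2025-11-12", 2)))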

Architecture Evolution

Phase 1: Monolithic API (0-100 req/sec)

Initially, we ran everything on a single EC2 t3.medium instance with PostgreSQL on RDS. This worked fine for early testing but started showing latency issues around 80 req/sec.

Phase 2: Read Replicas + Caching (100-1K req/sec)

We added:

  • RDS Read Replicas: Offloaded all read queries to 2 read replicas
  • ElastiCache (Redis): Cached frequently accessed inventory data with 5-minute TTL
  • CloudFront: Cached static responses for common search patterns

Result: Average response time dropped from 450ms to 120ms. Cache hit rate: 68%.
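
As an illustration of the read path, here's a minimal cache-aside sketch using redis-py with the same 5-minute TTL. The endpoint and the database helper are assumptions, not our actual configuration:

    import json
    import redis

    r = redis.Redis(host="cache.example.internal", port=6379)  # assumed endpoint

    CACHE_TTL = 300  # seconds; matches the 5-minute TTL above

    def get_inventory(location_id: str) -> dict:
        """Cache-aside read: try Redis first, fall back to the database."""
        key = f"inventory:{location_id}"
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)                # cache hit
        data = load_inventory_from_db(location_id)   # hypothetical DB helper
        r.setex(key, CACHE_TTL, json.dumps(data))    # write back with TTL
        return data

    def load_inventory_from_db(location_id: str) -> dict:
        # Stub standing in for the real read-replica query.
        return {"location": location_id, "available": 17}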

Phase 3: Microservices + Event-Driven (1K-10K req/sec)

We split the monolith into specialized microservices:

  • Inventory Service: Handles availability queries (DynamoDB + ElastiCache)
  • Pricing Service: Calculates rates (Lambda + DynamoDB)
  • Booking Service: Reservations and conflicts (RDS PostgreSQL)
  • Search Service: Aggregates results (API Gateway + Lambda)

Communication between services uses SQS for async operations and direct API calls for real-time needs.
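
For the async side, the pattern looks roughly like this boto3 sketch. The queue URL and event shape are hypothetical:

    import json
    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")

    # Hypothetical queue URL; each consuming service owns its own queue.
    BOOKING_EVENTS_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/booking-events"

    def publish_booking_created(booking_id: str, item_id: str) -> None:
        """Fire-and-forget notification; consumers update caches and analytics."""
        sqs.send_message(
            QueueUrl=BOOKING_EVENTS_QUEUE,
            MessageBody=json.dumps({
                "event": "booking.created",
                "booking_id": booking_id,
                "item_id": item_id,
            }),
        )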

Key Technologies Used

Infrastructure:
- AWS EC2 Auto Scaling Groups (3-12 instances based on load)
- Application Load Balancer with health checks
- API Gateway for rate limiting (1000 req/sec per API key)
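
The per-key limit is enforced through an API Gateway usage plan. A rough boto3 sketch of that configuration, where the IDs and the burst value are placeholders:

    import boto3

    apigw = boto3.client("apigateway", region_name="us-east-1")

    # Hypothetical IDs; the rate limit mirrors the 1000 req/sec above,
    # and the burst value is an assumed placeholder.
    plan = apigw.create_usage_plan(
        name="partner-default",
        throttle={"rateLimit": 1000.0, "burstLimit": 2000},
        apiStages=[{"apiId": "abc123", "stage": "prod"}],
    )
    apigw.create_usage_plan_key(
        usagePlanId=plan["id"],
        keyId="key-id-from-console",   # the API key being limited
        keyType="API_KEY",
    )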

Databases:
- RDS PostgreSQL Multi-AZ (db.r5.xlarge)
- 2 Read Replicas (db.r5.large)
- DynamoDB for hot data (on-demand capacity)

Caching:
- ElastiCache Redis cluster (cache.r5.large, 3 nodes)
- CloudFront with custom cache policies

Monitoring:
- CloudWatch custom metrics every 1 minute
- X-Ray for distributed tracing
- Real-time dashboards in Grafana

Performance Metrics

Metric               Before        After
Avg Response Time    450ms         85ms
P99 Response Time    1.2s          280ms
Max Throughput       100 req/sec   10,500 req/sec
Monthly AWS Cost     $180          $2,400

Lessons Learned

1. Cache aggressively, invalidate carefully: We saw a 70% reduction in database load after implementing multi-layer caching. The key is having a solid cache invalidation strategy.
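
For example, the write path can delete affected keys instead of waiting for TTLs to expire. A simplified sketch of that idea (key names follow the cache-aside example above; the endpoint is assumed):

    import redis

    r = redis.Redis(host="cache.example.internal", port=6379)  # assumed endpoint

    def on_booking_confirmed(location_id: str, item_id: str) -> None:
        """Invalidate cache entries that could now serve stale availability."""
        # Drop the per-location entry written by the cache-aside reader.
        r.delete(f"inventory:{location_id}")
        # Patterned deletes use SCAN rather than KEYS, so the cluster
        # is never blocked by a long-running scan.
        for key in r.scan_iter(match=f"item:{item_id}:*"):
            r.delete(key)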

2. DynamoDB for hot data works: Moving real-time inventory tracking from PostgreSQL to DynamoDB reduced query latency from 45ms to 8ms on average.
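
A point read on the partition key replaces what used to be a JOIN-heavy PostgreSQL query. Roughly, with boto3 (the table and key names are illustrative):

    import boto3

    dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
    table = dynamodb.Table("inventory-hot")  # hypothetical table name

    def get_unit_status(item_id: str) -> dict:
        # Single-digit-millisecond point read on the partition key.
        response = table.get_item(Key={"item_id": item_id})
        return response.get("Item", {})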

3. Auto-scaling needs tuning: Our first auto-scaling policy was too conservative. We now scale up at 60% CPU instead of 80%, preventing latency spikes.
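
In boto3 terms, a tuned policy looks something like this target-tracking sketch (the ASG name is a placeholder):

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    # Target tracking keeps average CPU near 60%, scaling out well before
    # the old 80% threshold that let latency spike.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="api-fleet",            # hypothetical ASG name
        PolicyName="cpu-target-60",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": 60.0,
        },
    )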

4. Monitor everything: CloudWatch custom metrics saved us multiple times. We track API latency, cache hit rates, database connection pools, and queue depths in real-time.
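
Publishing one of those custom metrics is a few lines with boto3; here's an illustrative sketch (the namespace and metric name are assumptions):

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    def report_cache_hit_rate(hit_rate_percent: float) -> None:
        # Emitted once a minute alongside latency, pool, and queue-depth metrics.
        cloudwatch.put_metric_data(
            Namespace="RentalAPI",                   # hypothetical namespace
            MetricData=[{
                "MetricName": "CacheHitRate",
                "Value": hit_rate_percent,
                "Unit": "Percent",
            }],
        )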

Next Steps

We're currently working on:

  • Implementing GraphQL for more efficient queries
  • Moving to containerized deployments with ECS Fargate
  • Geographic distribution with multi-region RDS
  • ML-based demand prediction for auto-scaling

Interested in learning more about our infrastructure? Get in touch with our team.