Scaling Real-Time Rental APIs to 10K Requests/Second
When we first launched our rental management APIs, we were handling around 100 requests per second. Fast forward 8 months, and we're now processing over 10,000 requests/sec during peak hours. Here's how we scaled our AWS infrastructure to handle this growth.
The Challenge
Rental inventory systems require real-time availability checks. When a customer searches for available equipment, the API needs to:
- Query availability across multiple locations
- Check reservation conflicts
- Calculate pricing based on duration and distance
- Return results in <200ms
At scale, this becomes challenging. A single search might trigger 50+ database queries across different tables and services.
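Of those steps, the reservation-conflict check is conceptually the simplest: it's an interval-overlap test. A minimal sketch (function and field names here are illustrative, not our actual schema):

```python
from datetime import datetime

def has_conflict(requested_start, requested_end, existing_reservations):
    """Return True if the requested window overlaps any existing reservation.

    Two half-open intervals [a_start, a_end) and [b_start, b_end)
    overlap exactly when each starts before the other ends.
    """
    return any(
        requested_start < r_end and r_start < requested_end
        for r_start, r_end in existing_reservations
    )

# Example: a 9:00-12:00 request against an existing 10:00-14:00 booking.
existing = [(datetime(2024, 5, 1, 10), datetime(2024, 5, 1, 14))]
print(has_conflict(datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 12), existing))   # True
print(has_conflict(datetime(2024, 5, 1, 14), datetime(2024, 5, 1, 16), existing))  # False
```

The hard part at scale isn't this test, it's running it (and the other steps) against dozens of tables and services within the latency budget.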
Architecture Evolution
Phase 1: Monolithic API (0-100 req/sec)
Initially, we ran everything on a single EC2 t3.medium instance with PostgreSQL on RDS. This worked fine for early testing but started showing latency issues around 80 req/sec.
Phase 2: Read Replicas + Caching (100-1K req/sec)
We added:
- RDS Read Replicas: Offloaded all read queries to 2 read replicas
- ElastiCache (Redis): Cached frequently accessed inventory data with 5-minute TTL
- CloudFront: Cached static responses for common search patterns
Result: Average response time dropped from 450ms to 120ms. Cache hit rate: 68%.
Phase 3: Microservices + Event-Driven (1K-10K req/sec)
We split the monolith into specialized microservices:
- Inventory Service: Handles availability queries (DynamoDB + ElastiCache)
- Pricing Service: Calculates rates (Lambda + DynamoDB)
- Booking Service: Reservations and conflicts (RDS PostgreSQL)
- Search Service: Aggregates results (API Gateway + Lambda)
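The Pricing Service's core job, rates from duration and distance, reduces to a pure function, which is what makes it such a good fit for Lambda. A sketch (the rate constants and discount tier are illustrative, not our actual pricing):

```python
def quote_price(hours: float, distance_km: float,
                hourly_rate: float = 15.0, per_km_rate: float = 0.50) -> float:
    """Illustrative rate calculation: a duration charge plus a distance
    surcharge, with a 10% discount on rentals of 24 hours or more."""
    base = hours * hourly_rate + distance_km * per_km_rate
    if hours >= 24:
        base *= 0.9  # long-rental discount (hypothetical tier)
    return round(base, 2)

print(quote_price(4, 20))    # 4h * $15 + 20km * $0.50 = 70.0
print(quote_price(48, 100))  # long rental, discount applied: 693.0
```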
Communication between services uses SQS for async operations and direct API calls for real-time needs.
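On the async path, a service publishes an event envelope to SQS and moves on. A sketch of building such an envelope (the event shape is hypothetical; the actual send is a `boto3` `sqs.send_message` call with this JSON as the `MessageBody`):

```python
import json
import uuid
from datetime import datetime, timezone

def booking_event(booking_id: str, equipment_id: str, action: str) -> str:
    """Build a JSON event body for SQS (field names are illustrative)."""
    return json.dumps({
        "event_id": str(uuid.uuid4()),          # for consumer-side deduplication
        "event_type": f"booking.{action}",
        "booking_id": booking_id,
        "equipment_id": equipment_id,
        "emitted_at": datetime.now(timezone.utc).isoformat(),
    })

body = booking_event("bk-1001", "eq-77", "created")
print(body)
# Sent with: sqs.send_message(QueueUrl=queue_url, MessageBody=body)
```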
Key Technologies Used
Infrastructure:
- AWS EC2 Auto Scaling Groups (3-12 instances based on load)
- Application Load Balancer with health checks
- API Gateway for rate limiting (1000 req/sec per API key)
Databases:
- RDS PostgreSQL Multi-AZ (db.r5.xlarge)
- 2 Read Replicas (db.r5.large)
- DynamoDB for hot data (on-demand capacity)
Caching:
- ElastiCache Redis cluster (cache.r5.large, 3 nodes)
- CloudFront with custom cache policies
Monitoring:
- CloudWatch custom metrics every 1 minute
- X-Ray for distributed tracing
- Real-time dashboards in Grafana
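The Auto Scaling behavior listed above boils down to a target-tracking policy on the group. A sketch of the configuration as passed to EC2 Auto Scaling's `put_scaling_policy` (policy and resource names are placeholders; we eventually settled on a 60% CPU target, as discussed under Lessons Learned):

```json
{
  "PolicyName": "api-scale-on-cpu",
  "PolicyType": "TargetTrackingScaling",
  "TargetTrackingConfiguration": {
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 60.0
  }
}
```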
Performance Metrics
| Metric | Before | After |
|---|---|---|
| Avg Response Time | 450ms | 85ms |
| P99 Response Time | 1.2s | 280ms |
| Max Throughput | 100 req/sec | 10,500 req/sec |
| Monthly AWS Cost | $180 | $2,400 |
Lessons Learned
1. Cache aggressively, invalidate carefully: We saw a 70% reduction in database load after implementing multi-layer caching. The key is having a solid cache invalidation strategy.
2. DynamoDB for hot data works: Moving real-time inventory tracking from PostgreSQL to DynamoDB reduced query latency from 45ms to 8ms on average.
3. Auto-scaling needs tuning: Our first auto-scaling policy was too conservative. We now scale up at 60% CPU instead of 80%, preventing latency spikes.
4. Monitor everything: CloudWatch custom metrics saved us multiple times. We track API latency, cache hit rates, database connection pools, and queue depths in real-time.
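A custom metric like P99 latency starts with the percentile math over a recent window of samples, with the result pushed to CloudWatch via `put_metric_data`. A sketch of the calculation itself (window size and the nearest-rank method are illustrative):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over a window of latency samples (ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

# Hypothetical last-minute window of response times in ms.
window = [85, 90, 120, 88, 95, 300, 92, 87, 110, 89]
print(percentile(window, 99))  # tail latency for this window: 300
print(percentile(window, 50))  # median: 90
```

Publishing the result is then a single `put_metric_data` call per interval, which is how the 1-minute custom metrics above are produced.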
Next Steps
We're currently working on:
- Implementing GraphQL for more efficient queries
- Moving to containerized deployments with ECS Fargate
- Geographic distribution with multi-region RDS
- ML-based demand prediction for auto-scaling
Interested in learning more about our infrastructure? Get in touch with our team.