Load Balancers (L4 vs L7) Explained
A load balancer distributes incoming requests across multiple backend servers. It’s the difference between “one server handling 1000 users” and “ten servers each handling 100.” It’s also how you survive a server crashing — the load balancer just stops sending traffic to it.
The two main types
Layer 4 (Transport-level)
Looks only at IP and TCP/UDP headers. Distributes connections based on the tuple (source IP, source port, destination IP, destination port). Doesn’t understand HTTP or any other application protocol.
- Pros: very fast, low overhead, protocol-agnostic, works for anything (databases, raw TCP, SMTP)
- Cons: can’t make routing decisions based on URL, headers, cookies
- Examples: AWS NLB, HAProxy in TCP mode, Linux IPVS, F5 BIG-IP
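As a rough illustration of tuple-based distribution, the sketch below hashes the four fields and uses the result to index into a backend pool. The pool addresses and the function name `pick_backend` are made up for this example, and real L4 balancers typically add connection tracking or consistent hashing on top.

```python
import hashlib
from typing import List

# Hypothetical backend pool; the addresses are placeholders for this example.
BACKENDS: List[str] = ["10.0.0.11:5432", "10.0.0.12:5432", "10.0.0.13:5432"]


def pick_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 backends: List[str] = BACKENDS) -> str:
    """Map a connection's 4-tuple to one backend using a stable hash."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer and wrap it
    # around the pool size.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    # Every packet of the same connection carries the same tuple,
    # so it always maps to the same backend.
    print(pick_backend("203.0.113.7", 51324, "198.51.100.1", 5432))
```

Nothing here inspects the payload, which is why the same approach works for databases, SMTP, or any other TCP/UDP protocol.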
Layer 7 (Application-level)
Understands HTTP. Routes based on URL path, headers, cookies, methods. Can rewrite headers, terminate TLS, return cached responses.
- Pros: smart routing (e.g., “/api/* to backend pool A, /static/* to pool B”), TLS termination, header manipulation
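- Cons: slower (parses every request), per-request overhead, only works for protocols the LB understands (typically HTTP and protocols built on it, such as WebSockets or gRPC)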
- Examples: AWS ALB, nginx, HAProxy in HTTP mode, Envoy, Traefik, Caddy
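The path-based routing described above can be sketched as follows. This is a minimal illustration, not how any particular product does it: the pool names `pool_a`, `pool_b`, `pool_default` and the helper names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical routing table: path prefix -> backend pool name.
ROUTES: List[Tuple[str, str]] = [("/api/", "pool_a"), ("/static/", "pool_b")]
DEFAULT_POOL = "pool_default"


def parse_request_head(raw: bytes) -> Tuple[str, str, Dict[str, str]]:
    """Parse the request line and headers from a raw HTTP/1.1 request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, headers


def choose_pool(raw: bytes) -> str:
    """Pick a pool from the URL path; this is the step an L4 balancer cannot do."""
    _method, target, _headers = parse_request_head(raw)
    path = target.split("?", 1)[0]  # drop the query string
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


if __name__ == "__main__":
    request = b"GET /api/users?id=1 HTTP/1.1\r\nHost: example.com\r\n\r\n"
    print(choose_pool(request))
```

The parsing step is also where the extra per-request cost of L7 comes from: the balancer has to read and interpret each request before it can forward anything.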
Common load balancing algorithms
| Algorithm | How it picks | When to use |
|---|---|---|
| Round robin | Each server in turn | All servers identical, all requests similar cost |
| Weighted round robin | More requests to bigger servers | Mixed server sizes |
| Least connections | Server with fewest active connections | Variable or long-lived requests (e.g., WebSockets) |
| Least response time | Fastest-responding server | Heterogeneous servers, performance matters |
| Hash (source IP) | Same client → same server | When session persistence matters |
| Random with two choices | Pick 2 servers at random, send to the less loaded one | Often the best balance of simplicity and performance |
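Three of the algorithms in the table are small enough to sketch directly. The server names and the `active` connection-count dictionary below are illustrative assumptions; a real balancer would update those counts as connections open and close.

```python
import itertools
import random
from typing import Dict, Iterable


class RoundRobin:
    """Hand out servers in a fixed repeating order."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)


def least_connections(active: Dict[str, int]) -> str:
    """Return the server with the fewest active connections."""
    return min(active, key=lambda name: active[name])


def two_random_choices(active: Dict[str, int], rng=random) -> str:
    """Sample two distinct servers and return the less loaded of the pair."""
    first, second = rng.sample(list(active), 2)
    return first if active[first] <= active[second] else second


if __name__ == "__main__":
    rr = RoundRobin(["a", "b", "c"])
    print([rr.pick() for _ in range(4)])   # ['a', 'b', 'c', 'a']

    load = {"a": 3, "b": 1, "c": 2}
    print(least_connections(load))         # b
    print(two_random_choices(load))        # depends on the random pair
```

Note that `two_random_choices` only compares two counters per request, which keeps it cheap even when the pool is large.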
Health checks
Load balancers periodically probe each backend to see if it’s alive. Failed probe → remove from rotation until healthy again.
- TCP check — can I open a connection? (L4)
- HTTP check — does GET /health return 200? (L7)
- Custom — does the response body match expected text?
Tune intervals carefully — too aggressive and you mark healthy servers down; too lax and clients hit dead servers for minutes.
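One common way to soften the trade-off above is to require several consecutive results before changing a backend's state. The sketch below assumes hypothetical thresholds `fall=3` and `rise=2`; the class and function names are invented for this example, and `tcp_probe` is a plain TCP connect check.

```python
import socket
from dataclasses import dataclass


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """L4-style check: can a TCP connection be opened within the timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@dataclass
class BackendHealth:
    fall: int = 3         # consecutive failures before marking down (assumed value)
    rise: int = 2         # consecutive successes before marking up (assumed value)
    healthy: bool = True
    _streak: int = 0      # consecutive probes that disagree with the current state

    def record(self, probe_ok: bool) -> bool:
        """Feed in one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if probe_ok else self.fall
            if self._streak >= threshold:
                self.healthy = probe_ok
                self._streak = 0
        return self.healthy


if __name__ == "__main__":
    state = BackendHealth()
    for result in [False, False, False, True, True]:
        print(result, "->", state.record(result))
```

With these values a single dropped probe does not remove a server, while three failures in a row do; the time to detect a dead server is roughly the probe interval multiplied by `fall`.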
Sticky sessions (session affinity)
“Once a user lands on server A, keep them on server A.” Implemented by:
- Source IP hashing (simple, but many users behind one NAT all land on the same server, and a client whose IP changes mid-session gets moved)
- Cookie injection (LB sets a cookie naming the chosen backend)
- Application cookie awareness (LB reads YOUR session cookie)
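Best practice: design stateless backends so you don’t need stickiness. Stickiness is a workaround for backends that keep session state in local memory instead of a shared store, and the session is still lost if that backend dies.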
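Cookie injection can be sketched as below, under a few assumptions: the cookie name `lb_backend`, the backend identifiers, and the function `choose_sticky` are all hypothetical, and `fallback_pick` stands in for whatever algorithm the balancer normally uses.

```python
from http.cookies import SimpleCookie
from typing import Callable, Dict, Optional, Tuple

# Hypothetical backend pool keyed by a short identifier.
BACKENDS: Dict[str, str] = {"a": "10.0.0.11:8080", "b": "10.0.0.12:8080"}
COOKIE_NAME = "lb_backend"  # assumed cookie name for this example


def choose_sticky(cookie_header: Optional[str],
                  fallback_pick: Callable[[], str]) -> Tuple[str, Optional[str]]:
    """Return (backend_id, set_cookie_value); the second item is None
    when the client already carries a valid affinity cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in BACKENDS:
        return morsel.value, None
    # No cookie, or it names a backend that no longer exists: pick again
    # and tell the client to remember the new choice.
    backend_id = fallback_pick()
    return backend_id, f"{COOKIE_NAME}={backend_id}; Path=/; HttpOnly"


if __name__ == "__main__":
    print(choose_sticky(None, lambda: "a"))
    print(choose_sticky("session=xyz; lb_backend=b", lambda: "a"))
```

The validity check matters: if the named backend has been removed from the pool, the balancer has to fall back to a fresh pick rather than trust the cookie.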
TLS termination
Common pattern: the client connects to the load balancer over HTTPS; the load balancer decrypts the traffic and forwards plain HTTP to the backends. Pros: backends don’t manage certificates, and there is one central place to update them. Cons: traffic between the LB and the backends is unencrypted, so either keep that hop on a private network or re-encrypt it (TLS or mTLS to the backends).
Active-active vs active-passive
- Active-active: multiple LBs all serve traffic simultaneously; clients are spread across them via DNS round-robin or anycast
- Active-passive: one LB serves all traffic; a standby takes over if the primary fails (typically via VRRP, for example with keepalived)
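The active-passive takeover can be pictured as a small election: among the nodes whose heartbeats are still fresh, the one with the highest priority is active. The sketch below is only loosely modeled on that idea and is not an implementation of VRRP; the node names, priorities, and 3-second timeout are assumptions.

```python
from typing import Dict, Optional


def elect_active(priorities: Dict[str, int],
                 last_heartbeat: Dict[str, float],
                 now: float,
                 timeout: float = 3.0) -> Optional[str]:
    """Return the highest-priority node heard from within `timeout` seconds."""
    alive = [
        name for name in priorities
        if now - last_heartbeat.get(name, float("-inf")) <= timeout
    ]
    if not alive:
        return None
    return max(alive, key=lambda name: priorities[name])


if __name__ == "__main__":
    priorities = {"lb-primary": 200, "lb-standby": 100}  # hypothetical nodes

    # Both nodes reported recently: the primary stays active.
    print(elect_active(priorities, {"lb-primary": 10.0, "lb-standby": 10.0}, now=11.0))

    # The primary has been silent for 4 seconds: the standby takes over.
    print(elect_active(priorities, {"lb-primary": 10.0, "lb-standby": 13.0}, now=14.0))
```

In a real deployment the newly active node also has to claim the shared virtual IP so that clients keep using the same address.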
Cloud load balancers
| Provider | L4 | L7 |
|---|---|---|
| AWS | NLB (Network Load Balancer) | ALB (Application Load Balancer) |
| GCP | Network Load Balancer (formerly TCP/UDP Load Balancer) | Application Load Balancer (formerly HTTP(S) Load Balancer) |
| Azure | Azure Load Balancer | Application Gateway |
| Cloudflare | Spectrum | Load Balancer (HTTP) |
What to learn next
Forward and reverse proxies — two opposite jobs sharing one name. Up next.