Modern Load Balancing: L4 vs L7, Algorithms, and Health Checks

Load balancing distributes incoming traffic across many backend servers so no single one becomes a bottleneck. It’s also how you get high availability: when a server dies and starts failing its health checks, the load balancer stops sending traffic to it.

Layer 4 vs Layer 7

Load balancers operate at one of two layers of the OSI model:

                   | Layer 4 (Transport)               | Layer 7 (Application)
-------------------+-----------------------------------+------------------------------------
What it sees       | IPs and ports only                | Full HTTP, headers, paths, cookies
Speed              | Very fast (millions of conn/sec)  | Slower (TLS terminate, parse HTTP)
Routing decisions  | By IP/port hash                   | By URL path, host, header, cookie
Examples           | AWS NLB, HAProxy TCP mode         | NGINX, Envoy, AWS ALB, Cloudflare
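
To make the difference in routing inputs concrete, here is a minimal Python sketch. It is an illustration under assumptions, not how any of the products above are implemented: the backend addresses, the pool names, and the x-canary header are all hypothetical.

```python
import hashlib

BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend pool


def l4_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """Layer 4: only addresses and ports are visible, so hash them."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]


# Layer 7: path-prefix rules, checked in order (hypothetical pool names).
L7_ROUTES = [("/api/", "api-pool"), ("/static/", "static-pool")]


def l7_pick(host: str, path: str, headers: dict[str, str]) -> str:
    """Layer 7: the parsed request is visible, so route on its content."""
    if headers.get("x-canary") == "1":  # hypothetical opt-in header
        return "canary-pool"
    if host.startswith("admin."):
        return "admin-pool"
    for prefix, pool in L7_ROUTES:
        if path.startswith(prefix):
            return pool
    return "default-pool"
```

The Layer 4 function could not implement any of the Layer 7 rules even if it wanted to, because the host, path, and headers are not part of its input.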

The classic algorithms

  • Round-robin: rotate through servers in order. Simple, fair when all servers are equal.
  • Least connections: send the next request to the server with the fewest open connections. Better when request times vary wildly.
  • IP hash: a hash of the client’s IP picks the server, so the same client keeps landing on the same backend while the pool is unchanged. Provides session stickiness without cookies.
  • Weighted: bigger servers get a higher share. Useful for heterogeneous fleets.
  • Power-of-two-choices: pick two servers at random, send to the less loaded. Surprisingly close to optimal at very low cost.
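
The algorithms above are small enough to sketch in a few lines of Python each. This is a simplified illustration, not production code: the Backend class and its open_connections counter are hypothetical stand-ins for whatever state a real balancer tracks, and the weighted variant uses the naive "repeat each server by its weight" approach rather than the smoother schemes real proxies tend to use.

```python
import hashlib
import itertools
import random


class Backend:
    """Hypothetical backend record; a real balancer tracks far more state."""

    def __init__(self, name: str, weight: int = 1):
        self.name = name
        self.weight = weight
        self.open_connections = 0


def round_robin(backends):
    """Return an iterator that rotates through the backends forever."""
    return itertools.cycle(backends)


def weighted_round_robin(backends):
    """Naive weighting: each backend appears `weight` times per rotation."""
    expanded = [b for b in backends for _ in range(b.weight)]
    return itertools.cycle(expanded)


def least_connections(backends):
    """Pick the backend with the fewest open connections."""
    return min(backends, key=lambda b: b.open_connections)


def ip_hash(backends, client_ip: str):
    """Map a client IP to a backend; stable while the pool is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def power_of_two_choices(backends, rng=random):
    """Sample two distinct backends and keep the less loaded one."""
    a, b = rng.sample(backends, 2)  # needs at least two backends
    return a if a.open_connections <= b.open_connections else b
```

Note that power_of_two_choices only inspects two counters per request, while least_connections scans the whole pool. That difference is where the "very low cost" comes from.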

Health checks are everything

A load balancer is only as good as its health checks. The classic mistakes:

  • Checking / instead of a real /health endpoint: your homepage might return 200 while the database is on fire.
  • Health endpoint checks downstream dependencies: one slow database takes the whole fleet out of rotation simultaneously.
  • Overly aggressive thresholds: a single failed check pulls a node, its traffic lands on the remaining nodes, and the extra load can trigger cascading failures.
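
One common guard against the threshold problem is hysteresis: require several consecutive failures before pulling a node, and several consecutive successes before restoring it. The sketch below is a minimal illustration of that idea. The HealthTracker class is hypothetical; the fall and rise parameter names echo HAProxy's options of the same name, and the default values here are only illustrative.

```python
class HealthTracker:
    """Flip a backend's state only after a run of contradicting checks."""

    def __init__(self, fall: int = 3, rise: int = 2):
        self.fall = fall      # consecutive failures needed to mark unhealthy
        self.rise = rise      # consecutive successes needed to mark healthy
        self.healthy = True
        self._streak = 0      # consecutive results contradicting current state

    def record(self, check_passed: bool) -> bool:
        """Feed in one check result and return the (possibly updated) state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state; reset the run
        else:
            self._streak += 1
            needed = self.fall if self.healthy else self.rise
            if self._streak >= needed:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy
```

With fall=3, one dropped probe or a brief GC pause no longer removes a node; it takes three failed checks in a row.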

The sweet spot: a lightweight /health that checks only local liveness, plus a separate /ready that orchestrators can poll to decide whether an instance should receive traffic yet.
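
As a rough sketch of that split, here is a tiny probe server built on Python's standard library. The ready flag and the make_server helper are hypothetical; a real service would set the flag once its own startup work (cache warm-up, migrations, and so on) has finished.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

ready = threading.Event()  # hypothetical flag, set once startup work is done


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Liveness only: the process is up and can answer. No database calls.
            self._reply(200, b"ok")
        elif self.path == "/ready":
            # Readiness: "not yet" is a normal answer, not a sign of failure.
            if ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"starting")
        else:
            self._reply(404, b"not found")

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep probe traffic out of stderr


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), ProbeHandler)
```

Calling make_server(8080).serve_forever() would serve both probes; /health answers 200 immediately, while /ready returns 503 until ready.set() is called.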

Connection draining

When you remove a backend (deploy, scale-in), the load balancer should stop sending it new connections but let existing ones finish. AWS calls this “deregistration delay”; in NGINX, a graceful shutdown does the same for the proxy’s own connections, and NGINX Plus adds a drain setting for individual upstream servers. Always set this: without it, every deploy risks dropping in-flight requests.
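
The mechanics are simple enough to sketch: refuse new work, then wait (with a deadline) for in-flight work to reach zero. The DrainableBackend class below is a hypothetical illustration of that logic, not the implementation used by any particular load balancer.

```python
import threading
import time


class DrainableBackend:
    """Hypothetical backend handle with an in-flight connection counter."""

    def __init__(self, name: str):
        self.name = name
        self.draining = False
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Called by the balancer before routing a new connection here."""
        with self._lock:
            if self.draining:
                return False  # no new connections once draining starts
            self.in_flight += 1
            return True

    def release(self) -> None:
        """Called when a connection finishes."""
        with self._lock:
            self.in_flight -= 1

    def drain(self, timeout_s: float, poll_s: float = 0.05) -> bool:
        """Stop new work and wait for in-flight work. True if fully drained."""
        with self._lock:
            self.draining = True
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            with self._lock:
                if self.in_flight == 0:
                    return True
            time.sleep(poll_s)
        with self._lock:
            return self.in_flight == 0
```

The timeout matters: it should comfortably exceed your longest normal request, because anything still running when it expires gets cut off when the backend is removed.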

Global load balancing

One load balancer per region only takes you so far. For a global service you layer GeoDNS or Anycast on top: users reach the closest region, and inside that region a regional LB picks a server. Cloudflare, Fastly, and the major clouds all offer this as a managed service.
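
The first tier of that decision can be approximated as "nearest healthy region". The sketch below makes heavy simplifying assumptions: the region names and coordinates are hypothetical, the client location is taken as given, and distance stands in for what real systems derive from IP geolocation data, latency measurements, or (for Anycast) BGP routing.

```python
import math

# Hypothetical regions with approximate (latitude, longitude) coordinates.
REGIONS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}


def haversine_km(a, b) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_region(client_location, healthy_regions) -> str:
    """Return the closest healthy region; a regional LB takes over from there."""
    candidates = {r: REGIONS[r] for r in healthy_regions if r in REGIONS}
    if not candidates:
        raise LookupError("no healthy region available")
    return min(candidates, key=lambda r: haversine_km(client_location, candidates[r]))
```

Passing only the healthy regions gives failover for free: if the nearest region is out of rotation, the client is sent to the next closest one.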

What to learn next

Load balancers usually sit at the edge of a cloud VPC and often pair with a CDN. Understanding TCP and TLS is essential for debugging them.
