System Design Fundamentals
Network Stack Primer (L3/L2)
Why does a backend engineer need to know about network layers? Because every timeout, every "connection refused", every mysterious latency spike traces back to what happens below your HTTP call. Understanding L2 and L3 turns debugging from guesswork into diagnosis.

The network stack wraps your data in progressively lower-level envelopes. Your HTTP payload gets a TCP header (ports, sequence numbers), then an IP header (source/destination addresses, TTL), then an Ethernet frame (source/destination MACs). The receiving host strips each layer in reverse.
Think of it as Russian nesting dolls. The post office (IP) cares about the street address. The building lobby (Ethernet) cares about the apartment number (MAC). Both are needed to deliver the letter.
IP addresses are logical and can change when a VM migrates. MAC addresses are physical and burned into the NIC. You need both: IP for routing across networks, MAC for delivery on the local wire. This is why ARP exists: it bridges the two worlds.
This primer covers the four things that matter for system design interviews:
- IP and CIDR: carving address space into routable subnets
- ARP: resolving IP to MAC on the local network
- NAT: translating private addresses to public at the edge
- Routing: route tables, default gateways, and longest-prefix match
An IPv4 address is a 32-bit number with two parts: a network prefix (which subnet) and a host ID (which interface inside that subnet). CIDR notation tells you where the split is.
192.168.10.25/24 means 24 bits of network, 8 bits of host. That /24 gives you 256 total addresses (254 usable on a classic LAN after subtracting the network and broadcast addresses; 251 usable in AWS, which reserves 5 per subnet).
Why does this matter for system design? Because every VPC, every subnet, every pod CIDR starts with choosing the right prefix length. Pick too small and you run out of addresses. Pick too large and you waste space that could be used by other teams or peered VPCs.
How many addresses per prefix?
| Prefix | Total | Usable (AWS) | Typical use |
| --- | --- | --- | --- |
| /16 | 65,536 | 65,531 | VPC-level block |
| /20 | 4,096 | 4,091 | AZ-sized subnet |
| /24 | 256 | 251 | Standard subnet |
| /28 | 16 | 11 | Small isolated segment |
Rule of thumb: each additional bit in the prefix halves the address count. A /25 has 128 addresses, a /26 has 64.
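The prefix math above is easy to verify with Python's standard `ipaddress` module. A minimal sketch (the AWS reservation of 5 addresses per subnet is from this section, not from the library):

```python
import ipaddress

def subnet_sizes(cidr: str) -> tuple[int, int]:
    """Return (total addresses, AWS-usable addresses) for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    total = net.num_addresses      # 2 ** (32 - prefix length)
    usable_aws = total - 5         # AWS reserves 5 addresses per subnet
    return total, usable_aws

print(subnet_sizes("10.42.0.0/24"))   # (256, 251)
print(subnet_sizes("10.42.0.0/20"))   # (4096, 4091)

# Each extra prefix bit halves the block:
print(ipaddress.ip_network("10.42.0.0/25").num_addresses)   # 128
```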
Private vs public address space
- Private (RFC 1918): `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`. Used inside VPCs and data centers. Not routable on the public internet.
- Public: Globally unique, allocated by RIRs. Reachable from anywhere (subject to firewalls).
In cloud design, your VPC uses private space internally. To reach the internet, traffic passes through a NAT Gateway or Internet Gateway that translates between private and public addresses.
When planning VPC CIDRs, never use 10.0.0.0/16: every tutorial defaults to it, so peering with another team's VPC will almost certainly conflict. Pick something like 10.42.0.0/16 or 10.128.0.0/16 instead. Overlapping CIDRs cannot be peered, and renumbering a live VPC later is painful.
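Overlap checks are mechanical, so they are worth automating before you allocate. A hedged sketch using the stdlib `ipaddress` module; the `existing` list is a hypothetical registry of blocks already in use:

```python
import ipaddress

# Hypothetical registry of CIDR blocks already allocated to other teams.
existing = [ipaddress.ip_network(c) for c in ("10.0.0.0/16", "10.1.0.0/16")]

def conflicts(proposed: str) -> list[str]:
    """Return every existing block that overlaps the proposed CIDR."""
    net = ipaddress.ip_network(proposed)
    return [str(e) for e in existing if net.overlaps(e)]

print(conflicts("10.0.128.0/17"))   # ['10.0.0.0/16'] -- collides with the tutorial default
print(conflicts("10.42.0.0/16"))    # [] -- safe to allocate
```

Running a check like this in CI (or an IaC linter, as the list below suggests) turns a painful post-peering discovery into a failed pull request.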
Planning a VPC CIDR
- Pick a non-overlapping private block large enough for 2-3x growth
- Standardize subnet sizes per AZ: `/20` if dense, `/24` if modest
- Map subnets to security zones (public, private, data) and route tables
- Reserve empty ranges for future expansion
- Document every allocation and enforce with IaC linters
Why does ARP exist? Because IP and Ethernet solve different problems. IP tells you which host to reach (logical address). Ethernet tells the switch which port to forward to (physical MAC). When host A wants to send a packet to host B on the same subnet, it knows B's IP but not B's MAC. ARP bridges that gap.

How ARP works (step by step)
- Host A wants to send to `10.0.1.20`, checks its ARP cache, and finds no entry
- Host A broadcasts an ARP Request to `FF:FF:FF:FF:FF:FF`: "Who has 10.0.1.20? Tell 10.0.1.10"
- Every host on the subnet receives the broadcast. Only the host with `10.0.1.20` responds
- Host B sends a unicast ARP Reply back to Host A: "10.0.1.20 is at `AA:BB:CC:DD:11:22`"
- Host A caches the mapping (`10.0.1.20 -> AA:BB:CC:DD:11:22`) with a TTL (typically 20-300 seconds)
- Host A constructs an Ethernet frame with B's MAC as destination and sends the packet
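The cache-then-broadcast flow can be sketched as a toy resolver. This is a simulation, not real ARP: `SUBNET` stands in for the hosts on the wire, and the 60-second TTL is an illustrative value, not a standard:

```python
import time

# Simulated hosts on the local subnet: IP -> MAC.
SUBNET = {"10.0.1.20": "AA:BB:CC:DD:11:22", "10.0.1.30": "AA:BB:CC:DD:11:33"}
ARP_TTL = 60  # seconds a learned entry stays valid (illustrative)

arp_cache: dict[str, tuple[str, float]] = {}  # IP -> (MAC, learned_at)

def resolve(ip: str) -> str:
    entry = arp_cache.get(ip)
    if entry and time.monotonic() - entry[1] < ARP_TTL:
        return entry[0]                       # cache hit: no broadcast needed
    # Cache miss: broadcast "Who has <ip>?" to FF:FF:FF:FF:FF:FF.
    # Only the owner replies; every other host drops the request.
    mac = SUBNET.get(ip)
    if mac is None:
        raise TimeoutError(f"no ARP reply for {ip}")
    arp_cache[ip] = (mac, time.monotonic())   # cache the unicast reply
    return mac

print(resolve("10.0.1.20"))   # broadcast, then cached
print(resolve("10.0.1.20"))   # served from the cache
```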
What happens for off-subnet destinations?
If Host A wants to reach 203.0.113.50 (different subnet), it does not ARP for that IP. Instead:
- Host A checks its route table, finds the default gateway `10.0.1.1`
- Host A ARPs for the gateway's MAC (not the destination's)
- Host A sends the frame to the gateway's MAC, with the IP destination still set to `203.0.113.50`
- The gateway routes the packet to the next hop, repeating the ARP+forward process
This is the key insight: MAC addresses are hop-by-hop, IP addresses are end-to-end. At each router, the MAC header is stripped and rebuilt for the next segment, but the IP header stays the same.
ARP in cloud and Kubernetes
- AWS VPC: The fabric is L3-routed, but instances still see ARP semantics. When sending to another subnet, you ARP for the VPC router at `.1` (e.g., `10.0.1.1`). AWS handles the underlying mapping.
- Kubernetes: Pods ARP for the node bridge or CNI gateway when reaching outside their pod CIDR. Overlay networks (VXLAN, Geneve) encapsulate the inner Ethernet frame inside an outer UDP packet, but the ARP concept remains the same at the virtual layer.
Why does NAT exist? Because IPv4 only has about 4.3 billion addresses, and the internet has far more devices. NAT lets hundreds of private hosts share a handful of public IPs by rewriting packet headers at the network edge.

How SNAT/PAT works
- Private host `10.0.1.25:5151` sends a TCP SYN to `203.0.113.50:443`
- The NAT device intercepts the packet and creates a mapping: `10.0.1.25:5151 <-> 203.0.113.5:40001`
- NAT rewrites the source to `203.0.113.5:40001` and forwards to the internet
- The server replies to `203.0.113.5:40001`
- NAT looks up the mapping, rewrites the destination back to `10.0.1.25:5151`, and delivers
The key: each internal flow gets a unique port on the public IP, so the NAT can demultiplex replies. This is Port Address Translation (PAT), and it is how home routers, AWS NAT Gateways, and corporate firewalls all work.
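The mapping table is the whole trick, so it is worth seeing in miniature. A toy SNAT/PAT sketch (a real NAT also tracks protocol, expires idle flows, and reuses ports; the starting port 40001 is just the value from the walkthrough above):

```python
import itertools

PUBLIC_IP = "203.0.113.5"
_ports = itertools.count(40001)           # next free public port (simplified)
nat_table: dict[tuple, tuple] = {}        # public (ip, port) -> private (ip, port)

def snat(src_ip: str, src_port: int) -> tuple[str, int]:
    """Outbound: assign a unique public port and remember the mapping."""
    public = (PUBLIC_IP, next(_ports))
    nat_table[public] = (src_ip, src_port)
    return public                         # the rewritten source the server sees

def reply(dst_ip: str, dst_port: int) -> tuple[str, int]:
    """Inbound reply: look up the mapping and rewrite back to the private host."""
    return nat_table[(dst_ip, dst_port)]

pub = snat("10.0.1.25", 5151)
print(pub)           # ('203.0.113.5', 40001)
print(reply(*pub))   # ('10.0.1.25', 5151)
```

The finite port counter also makes the exhaustion problem concrete: once the public IP runs out of free ports for a destination, new flows fail, which is exactly the NAT Gateway limit discussed below.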
NAT types at a glance
| Type | Direction | What changes | Typical use |
| --- | --- | --- | --- |
| SNAT | Outbound | Source IP/port to public | Private hosts reaching internet |
| PAT | Outbound | Many flows share 1 public IP | Home routers, NAT Gateways |
| DNAT | Inbound | Dest IP/port to internal | Port forwarding to internal service |
| 1:1 | Either | Full IP mapping | Dedicated public IP per server |
NAT in cloud VPCs
- Private subnet: Route `0.0.0.0/0 -> NAT Gateway`. Instances have no public IPs. They can initiate outbound connections (OS updates, API calls) but cannot be reached from the internet.
- Public subnet: Route `0.0.0.0/0 -> Internet Gateway`. Instances get public or Elastic IPs and are directly reachable (subject to security groups).
- No route, no reach: If there is no IGW or NAT route, instances are completely isolated.
NAT is stateful: it maintains a mapping table for every active flow. AWS NAT Gateway supports up to 55,000 simultaneous connections per destination IP. If your service makes many outbound connections to the same endpoint, you can exhaust the port space and see connection failures. Monitor the ErrorPortAllocation CloudWatch metric.
Design considerations
- Breaks end-to-end addressing: Protocols embedding IP/port (SIP, P2P) need STUN/TURN/ICE to traverse NAT
- Hairpinning: Calling your own public IP from inside the VPC adds an extra NAT hop. Use internal DNS or NLB instead.
- Cost and throughput: NAT Gateways are billed per GB processed and have throughput limits. Size for your egress volume.
- Logging: NAT obscures the original source IP. Enable VPC Flow Logs to trace connections back to internal hosts.
Routing is how packets cross network boundaries. At every hop, the router (or host) consults a route table and picks the next hop using one simple rule: longest prefix match (LPM).

What's in a route entry?
Each entry has two fields:
- Destination: a CIDR block (e.g., `10.0.12.0/24`, `0.0.0.0/0`)
- Target: where to send matching packets (a gateway IP, an interface, or a cloud resource like an IGW)
Longest prefix match in action
A router has three routes:
- `0.0.0.0/0 -> Gateway C` (default route)
- `10.0.0.0/16 -> Gateway B`
- `10.0.12.0/24 -> Gateway A`
A packet arrives destined for 10.0.12.77. All three routes match this IP, but the /24 is the most specific (longest prefix), so the packet goes to Gateway A. If the destination were 10.0.5.99, only the /16 and /0 match, and the /16 wins. For 8.8.8.8, only the default /0 matches.
Longest prefix match is the single most important concept in routing. It is how VPCs distinguish local traffic from internet-bound traffic, how Kubernetes routes pod-to-pod vs pod-to-service, and how BGP selects paths across the internet. Every routing decision everywhere uses LPM.
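LPM itself fits in a few lines. A sketch using the stdlib `ipaddress` module with the three routes from the example above (real routers use tries for speed; a linear scan shows the rule):

```python
import ipaddress

# Route table as (CIDR, target) pairs from the example above.
routes = [
    ("0.0.0.0/0",    "Gateway C"),   # default route
    ("10.0.0.0/16",  "Gateway B"),
    ("10.0.12.0/24", "Gateway A"),
]
table = [(ipaddress.ip_network(cidr), target) for cidr, target in routes]

def lookup(dst: str) -> str:
    """Longest prefix match: among matching routes, pick the most specific."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, tgt) for net, tgt in table if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("10.0.12.77"))   # Gateway A (/24 is most specific)
print(lookup("10.0.5.99"))    # Gateway B (/16 beats /0)
print(lookup("8.8.8.8"))      # Gateway C (only the default matches)
```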
Local vs off-subnet delivery
- Local (same subnet): Host checks route table, finds the destination matches its connected subnet. It ARPs for the destination's MAC and sends the frame directly.
- Off-subnet: No connected route matches. The default route (`0.0.0.0/0`) sends the packet to the default gateway. The host ARPs for the gateway's MAC and sends the frame there. The gateway repeats the process for the next hop.
The default gateway is the "router of last resort." If no specific route matches, the packet goes there.
Cloud routing (AWS VPC)
Each subnet is associated with a route table you control:
- Public subnet: `10.0.0.0/16 -> local`, `0.0.0.0/0 -> IGW`
- Private subnet: `10.0.0.0/16 -> local`, `0.0.0.0/0 -> NAT Gateway`
- Peering/VPN: Add routes for remote CIDRs pointing to a VPC Peering Connection, Transit Gateway, or VPN Gateway
The local route is automatic and immutable. It ensures all intra-VPC traffic stays within the VPC fabric. You add routes for everything else.
Static vs dynamic routing
- Static: Manually configured entries. Used in most VPC setups. Simple but doesn't adapt to failures.
- Dynamic (BGP): Routers exchange route advertisements and automatically update tables when links fail. Used for VPN connections, Direct Connect, and Transit Gateway. More complex but self-healing.