Network Performance Tuning for Engineers

Reducing Round-Trip Time (RTT) and packet loss is critical for responsive applications, and often the bottleneck isn’t where you expect.

Let’s watch a packet’s journey. Imagine a request originating from client-app on server-A destined for db-service on server-B.

```mermaid
graph LR
    A[server-A: client-app] --> B(router-1);
    B --> C(router-2);
    C --> D(server-B: db-service);
    D --> C;
    C --> B;
    B --> A;
```

Here’s what’s happening under the hood. When client-app sends data, it’s handed off to the operating system’s network stack. The stack wraps it layer by layer: a TCP (or UDP) header first, then an IP header, then an Ethernet header. The resulting frame leaves through the physical network interface, hits the first router (router-1), then the next (router-2), and finally arrives at server-B. The return journey is a mirror image. Every hop and every processing step adds latency, and packet loss can occur at any point where a device is overloaded or a link is saturated.

The most surprising thing about RTT and loss is how often they come not from the network links themselves, but from the endpoints and from processing at intermediate devices.

Consider the TCP handshake. When client-app initiates a connection to db-service, it sends a SYN packet; server-B responds with SYN-ACK, and client-app sends ACK. The client cannot send any request data until the SYN-ACK arrives, so connection setup costs one full RTT before the application does any useful work. If the RTT is high, the handshake alone can take tens or hundreds of milliseconds, which users perceive as slow connection establishment.

```mermaid
sequenceDiagram
    participant C as client-app
    participant S as server-B
    C->>S: SYN (Seq=100)
    S-->>C: SYN-ACK (Seq=500, Ack=101)
    C->>S: ACK (Seq=101, Ack=501)
```
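
To get a feel for what the handshake costs, the arithmetic can be sketched in shell. The function names and the sample numbers are made up for illustration. The sketch assumes a plain TCP connection (no TLS) and zero server processing time: one RTT for the handshake, then one RTT for the first request and response.

```sh
# Rough time (ms) until the first response byte arrives on a NEW TCP
# connection: 1 RTT for SYN/SYN-ACK, then 1 RTT for request/response.
# Assumes no TLS and zero server processing time (both simplifications).
first_byte_ms() {
    rtt_ms=$1
    echo $(( 2 * rtt_ms ))
}

# Total time for N sequential requests when each opens a fresh connection,
# versus N requests over a single connection (handshake paid once).
fresh_vs_reused_ms() {
    rtt_ms=$1; requests=$2
    fresh=$(( requests * 2 * rtt_ms ))
    reused=$(( 2 * rtt_ms + (requests - 1) * rtt_ms ))
    echo "fresh=${fresh} reused=${reused}"
}

first_byte_ms 80          # 160
fresh_vs_reused_ms 80 10  # fresh=1600 reused=880
```

Under these assumptions, reusing the connection nearly halves the total for ten requests at 80 ms RTT, which is why connection pooling is usually the first thing to check on a high-RTT path.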

Why this matters: users perceive slowness when requests take a long time to complete. The cause can be high RTT (the time a packet takes to reach its destination plus the time for the reply to come back) or packet loss (packets failing to reach their destination, which forces retransmissions and adds further delay).

Let’s break down the levers you control.

1. Network Path Optimization:

Problem: Suboptimal routing, high latency links, or congested intermediate devices.
Diagnosis: Use traceroute (or mtr) to identify hops with high latency or packet loss.
```
traceroute -n 192.168.1.2
```
Look for jumps in latency or consistent loss percentages.
Fix: Reroute traffic if possible, perhaps via a different ISP, a peering exchange, or by adjusting routing policies. If a specific router is the bottleneck, investigate its configuration or hardware. For instance, if router-2 shows high latency:
```
# On router-2 (example Cisco IOS)
show processes cpu sorted
show interfaces gigabitethernet0/1
```
This helps identify if CPU is maxed out or if the interface is experiencing errors or discards.
Why it works: Fewer hops and faster links cut both the propagation delay (distance) and the per-hop queuing and processing time for packets.
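
Reading hop-by-hop reports by eye gets tedious, so here is a minimal sketch that flags suspicious hops in `mtr` report output. The function name `flag_hops` and the thresholds are hypothetical, and it assumes the usual report column order (Loss%, Snt, Last, Avg, ...).

```sh
# Flag suspicious hops in `mtr --report -n` output read from stdin.
# Usage: flag_hops MAX_LOSS_PCT MAX_JUMP_MS
# A hop is flagged if its Loss% exceeds MAX_LOSS_PCT or its Avg latency
# exceeds the previous hop's Avg by more than MAX_JUMP_MS.
flag_hops() {
    awk -v max_loss="$1" -v max_jump="$2" '
        $1 ~ /^[0-9]+[.][|]--$/ {
            loss = $3 + 0          # "10.0%" converts to 10
            avg  = $6 + 0          # assumed columns: Loss% Snt Last Avg ...
            jump = avg - prev
            if (loss > max_loss || jump > max_jump)
                printf "hop %d %s loss=%.1f%% avg=%.1fms jump=%+.1fms\n", $1 + 0, $2, loss, avg, jump
            prev = avg
        }'
}

# Example (requires mtr to be installed):
#   mtr --report -n -c 10 192.168.1.2 | flag_hops 1 20
```

Treat the flags as hints rather than verdicts: loss shown at an intermediate hop that does not persist through to the final hop is often just that router rate-limiting its ICMP replies.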

2. Endpoint Tuning:

Problem: Slow processing of network packets on the client or server. This could be due to a busy CPU, inefficient kernel network stack, or application-level bottlenecks.

Diagnosis: On the server (server-B), monitor CPU usage and network buffer statistics.

```
# On server-B (Linux)
top -Hn 1 # Check per-thread CPU usage, look for busy kernel threads (e.g. ksoftirqd)
netstat -s | grep -Ei 'receive|drop' # Check for dropped packets at the kernel level (-E enables the | alternation)
sysctl net.ipv4.tcp_rmem # Check TCP receive buffer sizes
```

On the client (server-A), similar checks for its network stack.

Fix:
- Increase buffer sizes: For high-bandwidth, high-latency links, TCP buffers might be too small.
```
# On server-B (Linux) - dynamically
sudo sysctl -w net.ipv4.tcp_rmem='4096 87380 6291456'
sudo sysctl -w net.ipv4.tcp_wmem='4096 16384 4194304'
```
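  Each triple is (min, default, max) in bytes for the per-socket buffer. The max is the value that matters on long, fast paths. Check your current settings first: many systems already ship maxima in this range, in which case you need to go higher for the change to have any effect.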
- Tune interrupt coalescing: Reduce the frequency of network interrupts to the CPU.
```
# On server-B (Linux) - find NIC and tune
ethtool -c eth0 # View current settings
sudo ethtool -C eth0 rx-usecs 100 tx-usecs 100 # Set to 100 microseconds
```
- Application-level optimizations: Profile the application to ensure it’s processing incoming/outgoing data efficiently.
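Why it works: Larger buffers let TCP keep more unacknowledged data in flight, so a high-latency link stays full instead of stalling while the sender waits for ACKs. Interrupt coalescing reduces CPU overhead from constant network interrupts, at the cost of up to one coalescing interval of added delay per packet, so keep the value small when latency matters more than throughput.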
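
A quick way to judge whether a buffer maximum is large enough is to compare it with the bandwidth-delay product (BDP) of the path. The helper below is a sketch; the name `bdp_bytes` and the sample link speeds are illustrative.

```sh
# Bandwidth-delay product in bytes: the amount of data that must be in
# flight to keep a link of the given speed full at the given RTT.
# Usage: bdp_bytes LINK_MBIT RTT_MS
bdp_bytes() {
    # (Mbit/s * 1,000,000 / 8) bytes/s * (RTT_MS / 1000) s = Mbit * RTT_MS * 125
    echo $(( $1 * $2 * 125 ))
}

bdp_bytes 1000 50   # 6250000 bytes, just under the 6291456 max used above
bdp_bytes 10000 50  # 62500000 bytes, far beyond a 6 MiB receive buffer
```

The kernel reserves part of each socket buffer for its own bookkeeping, so treat the BDP as a lower bound for the max value rather than an exact target.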

3. Congestion Control:

Problem: The network path is saturated, leading to packet drops.

Diagnosis: Observe TCP retransmissions and zero-window probes.

```
# On server-A (Linux)
netstat -s | grep -i 'retransmit'
# Use tcpdump to capture packets and analyze
sudo tcpdump -i eth0 'tcp port 5432' -w tcp_analysis.pcap # Replace 5432 with DB port
```

Tools like Wireshark can then analyze tcp_analysis.pcap for excessive retransmissions.
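
For a single number to track over time, the kernel's cumulative TCP counters can be turned into a retransmission percentage. This is a sketch: the function name `retrans_pct` is hypothetical, and it assumes the Linux `/proc/net/snmp` layout in which a `Tcp:` header line is followed by a `Tcp:` value line.

```sh
# Print the share of sent TCP segments that were retransmissions, using
# the cumulative OutSegs and RetransSegs counters (totals since boot).
# Usage: retrans_pct [FILE]   (FILE defaults to /proc/net/snmp)
retrans_pct() {
    awk '
        $1 == "Tcp:" && !have_header {
            # First "Tcp:" line lists the counter names.
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        $1 == "Tcp:" {
            # Second "Tcp:" line holds the values in the same order.
            out = $(col["OutSegs"]); re = $(col["RetransSegs"])
            if (out > 0) printf "%.2f\n", 100 * re / out
        }' "${1:-/proc/net/snmp}"
}

# Example on a Linux host:
#   retrans_pct
```

Because the counters accumulate from boot, a long-lived host can hide a recent problem behind a low average. Sampling twice and comparing the deltas gives a more useful picture.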

Fix:

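Try a different congestion control algorithm: Modern kernels support algorithms like BBR, which often copes better with lossy or high-latency paths than loss-based defaults such as CUBIC. The setting affects the sending side, so apply it on whichever host sends the bulk of the data.
```
# On server-B (Linux)
sysctl net.ipv4.tcp_available_congestion_control # Confirm bbr is listed (if not, try: sudo modprobe tcp_bbr)
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
```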

Implement Quality of Service (QoS): Prioritize critical traffic.

```
# Example using tc on Linux (complex, illustrative only)
sudo tc qdisc add dev eth0 root handle 1: htb default 12
sudo tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit
sudo tc class add dev eth0 parent 1:1 classid 1:10 htb rate 90mbit ceil 100mbit prio 0 # High-priority class for DB traffic
sudo tc class add dev eth0 parent 1:1 classid 1:12 htb rate 10mbit ceil 100mbit prio 1 # Default, lower-priority class
sudo tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dst 192.168.1.2/32 flowid 1:10 # Match traffic to DB
```

Why it works: BBR aims to maximize throughput and minimize latency by explicitly estimating bottleneck bandwidth and round-trip time, rather than waiting for packet loss as its congestion signal. QoS ensures that important packets are less likely to be dropped when congestion occurs.

4. Link Layer Issues:

Problem: Physical problems with network cables, transceivers, or switch ports.

Diagnosis: Check interface error counters.

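```
# On server-A or server-B
ip -s link show eth0 # Look for errors, dropped packets (ifconfig eth0 also works where net-tools is installed)
ethtool -S eth0 # Driver-level counters; names vary by NIC
# On a switch (example Cisco IOS)
show interfaces gigabitethernet1/0/1 counters errors
```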

Specifically, look for CRC errors, frame errors, input errors, output errors.
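
On Linux the same counters are exposed as plain files under `/sys/class/net/<iface>/statistics/`, which makes them easy to check from a script. The sketch below prints only the non-zero ones; the function name `nic_errors` and its optional second argument (an alternate sysfs root, handy for testing) are assumptions made for this example.

```sh
# Print every non-zero error/drop counter for an interface.
# Usage: nic_errors IFACE [SYSFS_ROOT]   (SYSFS_ROOT defaults to /sys/class/net)
nic_errors() {
    stats_dir="${2:-/sys/class/net}/$1/statistics"
    for f in "$stats_dir"/*errors "$stats_dir"/*dropped; do
        [ -r "$f" ] || continue            # skip globs that matched nothing
        val=$(cat "$f")
        [ "$val" -gt 0 ] && echo "${f##*/}=$val"
    done
    return 0
}

# Example on a Linux host:
#   nic_errors eth0
```

A counter such as `rx_crc_errors` that keeps climbing between runs points at the physical layer (cable, transceiver, or port), whereas growth in `rx_dropped` alone more often points back at the host.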

Fix: Replace faulty cables, transceivers, or switch ports. Ensure duplex settings are consistent.

```
# On server-A (Linux) - ensure auto-negotiation is on or set manually
sudo ethtool -s eth0 autoneg on
# Or set manually if needed (e.g., 1000Mbps Full Duplex)
# sudo ethtool -s eth0 speed 1000 duplex full
```

Why it works: Corrects physical transmission errors that corrupt packets before they can even be processed by higher layers.

A common pitfall is focusing solely on the WAN or inter-datacenter links when the majority of RTT and loss might be occurring within the server’s own network stack or the local network segment.

After addressing RTT and loss, the next challenge is often optimizing TCP throughput for high-latency, high-bandwidth connections, which might involve further tuning of congestion control algorithms and buffer sizes.