Wednesday, January 4, 2017

TCP Performance Tuning on 10G Ethernet


The network is a critical component of end-to-end storage system performance tuning.




The default TCP socket round-trip latency is not ideal. I ran a basic test between two mainstream servers (24 cores, 64 GB RAM, 10GbE). The test flow is:
- the client host sends 64 bytes of data via send() to simulate a command frame
- the server host blocks in recv() until it has received 64 bytes, then immediately sends back 4 KB of data
- the client host blocks in recv() until it has received the full 4 KB
- the above steps repeat for 10 seconds.

The result is as follows:
avg latency = 107 us, 50th percentile = 83 us, 90th = 163 us, 99th = 178 us, max = 3703 us, bandwidth = 38 MB/s
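
For reference, the same request/response pattern can be approximated with netperf's TCP_RR test (this assumes netperf is installed on the client and netserver is running on the server; the 64-byte request and 4096-byte response sizes mirror the test above). TCP_RR reports a transaction rate, so the average round-trip latency is roughly 1/rate.

On the server host:
$ netserver

On the client host:
$ netperf -H <server_ip> -t TCP_RR -l 10 -- -r 64,4096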

Clearly, TCP performance needs tuning.


TCP parameter tuning

Maximum receive socket buffer size (sized to the bandwidth-delay product, BDP)
# sysctl -w net.core.rmem_max=134217728

Maximum send socket buffer size (sized to the BDP)
# sysctl -w net.core.wmem_max=134217728
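
(For reference, BDP = link bandwidth x round-trip time. For a 10 Gbit/s path with a 100 ms RTT, BDP = (10 x 10^9 / 8) x 0.1 ≈ 125 MB, which is roughly the 134217728-byte (128 MB) maximum used here; on a low-latency LAN the required buffer is far smaller, and the kernel's auto-tuning stays well below this ceiling.)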

Minimum, initial, and maximum TCP receive buffer size in bytes
# sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"

Minimum, initial, and maximum TCP send buffer size in bytes
# sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

Maximum number of packets queued on the input side
# sysctl -w net.core.netdev_max_backlog=300000

Enable TCP receive buffer auto-tuning
# sysctl -w net.ipv4.tcp_moderate_rcvbuf=1


These settings take effect immediately but do not persist across a reboot. To make them
permanent, add the following to /etc/sysctl.conf:

net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_moderate_rcvbuf = 1
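
After editing the file, the values can be loaded without a reboot:
# sysctl -p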


To see what settings are in effect, type sysctl followed by the parameter name, for example:
# sysctl net.core.rmem_max


Jumbo Frames

The default MTU is 1500 bytes, while most 10GbE ports support considerably larger (jumbo) frames. An MTU value of 9000 was adequate to improve performance and make it more consistent. Jumbo frames have been supported in Oracle VM Server for x86 since Release 3.1.1. Remember that if you change the MTU, the new value must be set on every device (such as switches and routers) in the path between the communicating hosts.

Note that some switch configurations preclude the use of jumbo frames (for example when QoS, trunking, or
VLANs are in use), depending on the switch vendor and model.
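
As an illustration (eth0 is a placeholder for the actual interface name), the MTU can be changed at runtime with ip link, and end-to-end jumbo support can be verified with a non-fragmenting ping whose payload just fits in a 9000-byte MTU (9000 - 20 bytes IP header - 8 bytes ICMP header = 8972):

# ip link set dev eth0 mtu 9000
# ip link show dev eth0 | grep mtu
# ping -M do -s 8972 <remote_host>

The runtime change does not survive a reboot; to make it permanent, set the MTU in the distribution's interface configuration as well.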


NIC Offload Features

Offload features supported by the NIC can reduce CPU consumption and lead to significant
performance improvements. The settings below show useful values. Note that Large
Receive Offload (LRO) must be left in its default (off) state; even if it is turned on, it is automatically
set to off when the port is added as a bridge interface. An example of these settings is shown below.

# ethtool -k ethX
Offload parameters for ethX:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off


Use ethtool to change NIC parameters at runtime

To see the number of receive/transmit ring descriptors configured for a device:

$ ethtool -g <dev>  
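
Typical output looks like this (the values are illustrative and vary by NIC and driver; the "Pre-set maximums" section gives the upper bound that -G will accept):

Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512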

To increase these values (up to the pre-set maximums reported by -g):
$ ethtool -G <dev>  rx 4096  tx 4096
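
Note that ring sizes changed with ethtool -G, like other ethtool settings, do not persist across a reboot or driver reload; re-apply them from the distribution's network scripts (or a udev rule) if they should be permanent.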




