Tuning NIC Buffers to Improve Throughput

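With tens of thousands of concurrent connections on a single machine, sendto started failing fairly often with EAGAIN (error message: Resource temporarily unavailable). My first guess was that the application layer was writing too fast and had filled the NIC buffer.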
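
For context, EAGAIN from sendto on a non-blocking socket means the send could not be queued right now, not that the socket is broken. A minimal sketch of one common way to handle it, assuming a non-blocking socket, is to wait for the socket to become writable and retry. The helper name sendto_wait and the timeout_ms parameter are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send one datagram, waiting for buffer space on EAGAIN.
 * Returns the number of bytes sent, or -1 on a real error or on timeout. */
ssize_t sendto_wait(int fd, const void *buf, size_t len,
                    const struct sockaddr *addr, socklen_t addrlen,
                    int timeout_ms)
{
    assert(fd >= 0);
    for (;;) {
        ssize_t n = sendto(fd, buf, len, 0, addr, addrlen);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;                 /* interrupted by a signal: just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                /* a real error, not a full buffer */

        /* No room in the send buffer: wait until the socket is writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            errno = EAGAIN;           /* still full after timeout_ms */
            return -1;
        }
        if (rc < 0 && errno != EINTR)
            return -1;
    }
}
```

Waiting only hides the symptom when the sender is consistently faster than the buffer drains, which is why the rest of this post looks at the buffer sizes themselves.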

This command shows the current ring buffer sizes:

# ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096
Current hardware settings:
RX:		256
RX Mini:	0
RX Jumbo:	0
TX:		256

This command changes them:

# ethtool -G eth1 rx 2048
# ethtool -G eth1 tx 2048

Check again after the change:

# ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096
Current hardware settings:
RX:		2048
RX Mini:	0
RX Jumbo:	0
TX:		2048

But the change had no effect: sendto still failed at a high rate. Looking further, the socket API has a setsockopt function that can set the send buffer size from code, like this:

int iOriginalSendBufferSize = 0;
socklen_t iIntLen = sizeof(int);
getsockopt(iSocket, SOL_SOCKET, SO_SNDBUF, &iOriginalSendBufferSize, &iIntLen);
LOG_INFO("before set, send buffer is %d", iOriginalSendBufferSize);

int nSendBuf = 32 * 1024;  /* request 32 KB */
setsockopt(iSocket, SOL_SOCKET, SO_SNDBUF, &nSendBuf, sizeof(nSendBuf));

int iFinalSendBufferSize = 0;
getsockopt(iSocket, SOL_SOCKET, SO_SNDBUF, &iFinalSendBufferSize, &iIntLen);
LOG_INFO("after set, send buffer is %d", iFinalSendBufferSize);

Reading the value back right after setting it may look redundant, but look at the output:

before set, send buffer is 8388608
after set, send buffer is 65536

Surprising: I asked for 32 KB and got 64 KB back. Buy one, get one free.

Looking it up, I found this article: http://blog.csdn.net/c35971943…

Author:阿冬哥

Created:2013-4-17

Blog:http://blog.csdn.net/c359719435/

Copyright 2013 阿冬哥 http://blog.csdn.net/c359719435/

Please credit the source when using or reposting.

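1 Confusion about setting the socket TCP buffer size
Question 1: after setting the two default buffer sizes SO_SNDBUF and SO_RCVBUF with setsockopt, reading them back with getsockopt returns twice the value that was set. Why?
Searching online led me to the Linux kernel source /usr/src/linux-2.6.13.2/net/core/sock.c, where the function sock_setsockopt contains this code: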

case SO_SNDBUF:
    /* Don't error on this BSD doesn't and if you think
       about it this is right. Otherwise apps have to
       play 'guess the biggest size' games. RCVBUF/SNDBUF
       are treated in BSD as hints */

    if (val > sysctl_wmem_max)          /* val is the buffer size we asked for */
        val = sysctl_wmem_max;          /* above the maximum: clamp to the maximum */

    sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
    if ((val * 2) < SOCK_MIN_SNDBUF)    /* twice val is below the minimum: use the minimum */
        sk->sk_sndbuf = SOCK_MIN_SNDBUF;
    else
        sk->sk_sndbuf = val * 2;        /* otherwise store twice val */

    /*
     * Wake up sending tasks if we
     * upped the value.
     */
    sk->sk_write_space(sk);
    break;

case SO_RCVBUF:
    /* Don't error on this BSD doesn't and if you think
       about it this is right. Otherwise apps have to
       play 'guess the biggest size' games. RCVBUF/SNDBUF
       are treated in BSD as hints */

    if (val > sysctl_rmem_max)
        val = sysctl_rmem_max;

    sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
    /* FIXME: is this lower bound the right one? */
    if ((val * 2) < SOCK_MIN_RCVBUF)
        sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
    else
        sk->sk_rcvbuf = val * 2;
    break;
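From the code above: (1) if the requested value val > the maximum sysctl_wmem_max, the buffer is set to twice the maximum: 2*sysctl_wmem_max;
(2) if twice the requested value, val*2, is below the minimum, the buffer is set to the minimum: SOCK_MIN_SNDBUF;
(3) if val <= sysctl_wmem_max and val*2 >= SOCK_MIN_SNDBUF, the buffer is set to 2*val.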
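
The three cases can be checked with a small user-space model of that kernel logic. This is only an illustration: effective_sndbuf is a made-up name, and the maximum and minimum are passed in as parameters rather than read from the kernel.

```c
#include <assert.h>

/* User-space model of the SO_SNDBUF branch quoted above (not kernel code).
 * val is the requested size; wmem_max and min_sndbuf stand in for
 * sysctl_wmem_max and SOCK_MIN_SNDBUF. Returns the value the kernel stores. */
long effective_sndbuf(long val, long wmem_max, long min_sndbuf)
{
    assert(val >= 0 && wmem_max >= 0);
    if (val > wmem_max)
        val = wmem_max;          /* case (1): clamp the request to the maximum */
    if (val * 2 < min_sndbuf)
        return min_sndbuf;       /* case (2): never go below the minimum */
    return val * 2;              /* case (3): store double the request */
}
```

With a maximum of 8388608, a 32 KB request gives effective_sndbuf(32 * 1024, 8388608, 2048) == 65536, which matches the "after set" log line earlier in this post. The 2048 here is an assumed minimum chosen for the example.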

From the Linux manual:
SO_RCVBUF:
Sets or gets the maximum socket receive buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead) when it is set using setsockopt(2),
and this doubled value is returned by getsockopt(2).
The default value is set by the /proc/sys/net/core/rmem_default file,
and the maximum allowed value is set by the /proc/sys/net/core/rmem_max file.
The minimum (doubled) value for this option is 256.
On my host (Linux 2.6.6), /proc/sys/net/core/rmem_max contains:
4194304 //4M
and /proc/sys/net/core/wmem_max contains:
8388608 //8M
So the largest receive buffer that can be set is 8M, and the largest send buffer is 16M.
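
The same arithmetic can be done from a program. The sketch below reads the limit from /proc and doubles it; read_proc_long and max_reported_sndbuf are hypothetical helper names, and the doubling assumes the kernel behaviour described in the manual excerpt above.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a /proc file; returns -1 if it cannot be read. */
long read_proc_long(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    long value = -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/* Largest SO_SNDBUF that getsockopt would report after an ordinary setsockopt:
 * twice wmem_max. Returns -1 if the limit cannot be read. */
long max_reported_sndbuf(void)
{
    long wmem_max = read_proc_long("/proc/sys/net/core/wmem_max");
    return wmem_max < 0 ? -1 : wmem_max * 2;
}
```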

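Question 2: why does the kernel double the value? My understanding is that when users pick this value they may think only about the size of the data and not about the byte overhead of packaging it, so the kernel stores twice the value.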

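Note: overhead. In a network frame structure, besides the useful data there is also a lot of control information that keeps the communication working. This control information is called system overhead.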

2 Default TCP buffer sizes
Create a socket and read the buffer sizes with getsockopt:
Send buffer size: SNDBufSize = 16384
Receive buffer size: RCVBufSize = 87380

Question 3: according to the Linux manual, the default receive buffer size is stored in /proc/sys/net/core/rmem_default and the default send buffer size in /proc/sys/net/core/wmem_default.
[root@cfs_netstorage core]# cat /proc/sys/net/core/rmem_default
1048576
[root@cfs_netstorage core]# cat /proc/sys/net/core/wmem_default
512488

So the default receive buffer is 1048576 (1M) and the default send buffer is 512488 (about 500K). Then why does a newly created socket report 87380 and 16384?

Further reading shows that on Linux the default buffer sizes for TCP sockets are configured in the /proc virtual file system, in the following two files:
/proc/sys/net/ipv4/tcp_wmem
[root@cfs_netstorage core]# cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 131072 //the first value is the minimum, the second the default, the third the maximum
/proc/sys/net/ipv4/tcp_rmem
[root@cfs_netstorage core]# cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 174760

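So a new TCP socket takes its defaults from these two files. The values in these files can be changed for tuning, but the most reliable approach is still to call setsockopt in the program. Through setsockopt, the largest receive buffer that can be set is 8M and the largest send buffer is 16M (on Linux 2.6.6).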
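
To read these three-value files from a program rather than with cat, a small parser is enough. This is a sketch under the assumption that the file holds exactly three whitespace-separated integers, as in the output above; struct tcp_mem, parse_tcp_mem and read_tcp_mem are names invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* The three fields of tcp_wmem / tcp_rmem: minimum, default, maximum. */
struct tcp_mem { long min, def, max; };

/* Parse a line such as "4096 16384 131072". Returns 0 on success, -1 otherwise. */
int parse_tcp_mem(const char *line, struct tcp_mem *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%ld %ld %ld", &out->min, &out->def, &out->max) != 3)
        return -1;
    return 0;
}

/* Read and parse a file such as /proc/sys/net/ipv4/tcp_wmem. */
int read_tcp_mem(const char *path, struct tcp_mem *out)
{
    char line[128];
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char *ok = fgets(line, sizeof(line), fp);
    fclose(fp);
    return ok != NULL ? parse_tcp_mem(line, out) : -1;
}
```

After read_tcp_mem("/proc/sys/net/ipv4/tcp_wmem", &m) succeeds, m.def is the value a fresh TCP socket is expected to report for SO_SNDBUF.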

So it really is buy one, get one free.

So at this point I changed the SO_SNDBUF option to 8M or more, and the sendto errors went away.
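
Note that in the log earlier in this post, the buffer was 8388608 before the call and 65536 after it, so asking for 32 KB actually shrank it. A hedged sketch of a safer pattern is to raise the buffer only when it is smaller than what is wanted, then read back what the kernel granted. grow_sndbuf is a hypothetical helper, not part of the socket API.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: raise SO_SNDBUF to at least `want` bytes, never shrink it.
 * Returns the size getsockopt reports afterwards, or -1 on error. */
int grow_sndbuf(int fd, int want)
{
    assert(want > 0);
    int cur = 0;
    socklen_t len = sizeof(cur);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &cur, &len) != 0)
        return -1;
    if (cur >= want)
        return cur;              /* already large enough: leave it alone */

    /* The kernel may clamp the request to wmem_max before doubling it. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want)) != 0)
        return -1;

    len = sizeof(cur);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &cur, &len) != 0)
        return -1;
    return cur;                  /* caller compares this against `want` */
}
```

If the returned value is still below want, the request was clamped, and raising /proc/sys/net/core/wmem_max is the next thing to look at.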

———————

2014-2-28 17:31:44 update: it turns out the default size of this buffer differs between TCP and UDP.

#include <stdio.h>
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <unistd.h>
int tcpSrv(short iListenPort)
{
    int iListenFd = socket(AF_INET, SOCK_STREAM, 0);
    assert(iListenFd >= 0);  /* 0 is a valid descriptor */

    /* Print the send buffer size a fresh TCP socket starts with. */
    int iOriginalSendBufferSize = 0;
    socklen_t iIntLen = sizeof(iOriginalSendBufferSize);
    getsockopt(iListenFd, SOL_SOCKET, SO_SNDBUF, &iOriginalSendBufferSize, &iIntLen);
    printf("default send buffer size %d\n", iOriginalSendBufferSize);

    struct sockaddr_in stServerAddr;
    memset(&stServerAddr, 0, sizeof(stServerAddr));
    stServerAddr.sin_family = AF_INET;
    stServerAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    stServerAddr.sin_port = htons(iListenPort);
    /* Keep side effects out of assert() so they survive -DNDEBUG. */
    int iRet = bind(iListenFd, (struct sockaddr*)&stServerAddr, sizeof(stServerAddr));
    assert(iRet == 0);
    iRet = listen(iListenFd, 1024);
    assert(iRet == 0);
    printf("listening...\n");
    while (1) {
        int iAcceptFd = accept(iListenFd, (struct sockaddr*)NULL, NULL);
        assert(iAcceptFd >= 0);
        const char* pStrResponse = "HTTP/1.1 200 OK\r\nConnection: close\r\n\r\n";
        int iResponseLen = strlen(pStrResponse);
        int iSendByte = send(iAcceptFd, pStrResponse, iResponseLen, 0);
        assert(iSendByte == iResponseLen);
        close(iAcceptFd);
    }
    close(iListenFd);
    return 0;
}


int udpSrv(short iListenPort)
{
    int iListenFd = socket(AF_INET, SOCK_DGRAM, 0);
    assert(iListenFd >= 0);  /* 0 is a valid descriptor */

    /* Print the send buffer size a fresh UDP socket starts with. */
    int iOriginalSendBufferSize = 0;
    socklen_t iIntLen = sizeof(iOriginalSendBufferSize);
    getsockopt(iListenFd, SOL_SOCKET, SO_SNDBUF, &iOriginalSendBufferSize, &iIntLen);
    printf("default send buffer size %d\n", iOriginalSendBufferSize);

    struct sockaddr_in stServerAddr;
    memset(&stServerAddr, 0, sizeof(stServerAddr));
    stServerAddr.sin_family = AF_INET;
    stServerAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    stServerAddr.sin_port = htons(iListenPort);
    /* Keep side effects out of assert() so they survive -DNDEBUG. */
    int iRet = bind(iListenFd, (struct sockaddr*)&stServerAddr, sizeof(stServerAddr));
    assert(iRet == 0);
    printf("listening...\n");
    while (1) {
        char buffer[BUFSIZ] = {0};
        struct sockaddr_in stClientAddr;
        socklen_t stClientAddrLen = sizeof(stClientAddr);
        int iReceivedLen = recvfrom(iListenFd, buffer, sizeof(buffer), 0, (struct sockaddr*)&stClientAddr, &stClientAddrLen);
        if (iReceivedLen < 0) {
            continue;  /* nothing to echo if the receive failed */
        }
        /* Echo the datagram back to its sender. */
        int iSendByte = sendto(iListenFd, buffer, iReceivedLen, 0, (struct sockaddr*)&stClientAddr, stClientAddrLen);
        if (iSendByte < 0) {
            perror("sendto");
        }
    }
    close(iListenFd);
    return 0;
}


int main()
{
    udpSrv(8000);
    return 0;
}

TCP:

default send buffer size 16384
listening...

UDP:

default send buffer size 8388608
listening...
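
To see both defaults in one run without starting either server, a short helper can open a socket of each type and read SO_SNDBUF straight away. This is a sketch; default_sndbuf is a name made up here. On Linux the TCP figure is expected to come from the second field of tcp_wmem and the UDP figure from /proc/sys/net/core/wmem_default, which would explain the gap, but the actual numbers depend on the machine.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the SO_SNDBUF a fresh AF_INET socket of the given type starts with,
 * or -1 on error. */
int default_sndbuf(int type)
{
    assert(type == SOCK_STREAM || type == SOCK_DGRAM);
    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;
    int size = 0;
    socklen_t len = sizeof(size);
    int rc = getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len);
    close(fd);                   /* the socket was only needed for the query */
    return rc == 0 ? size : -1;
}
```

Printing default_sndbuf(SOCK_STREAM) and default_sndbuf(SOCK_DGRAM) side by side gives the same comparison as the two outputs above.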
