The issue affected not only Apache but every TCP service, including SSH, so it was not a misconfiguration of one particular service.
The Chrome console on a MacBook Pro showed that the initial connection took about 10 s, whereas a Windows 7 client on the same LAN had no problem at all.
After some digging on both the client and the server side, I found that the client had to send 8 SYN packets before the server replied with a SYN-ACK, which explains why the connection was so slow to establish. Once the SYN-ACK reached the client, communication ran at normal speed.
After an hour of headaches, it turned out that I had enabled some sysctl TCP tuning values that somehow introduced the issue.
I disabled the net.ipv4.tcp_tw_recycle and net.ipv4.tcp_tw_reuse features, and everything went back to normal.
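I think the problem comes from the net.ipv4.tcp_tw_recycle option, but as the issue impacted a production service (and is really hard to reproduce), I didn't try to re-enable it to confirm. The trace at the end of this post is at least consistent with that suspicion: the first seven SYNs carry the TCP timestamp option, which tcp_tw_recycle relies on, and only the eighth, sent without it, gets an answer.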
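To check whether a server still has these two settings enabled, a small script can read them straight from /proc/sys. This is only a sketch: the helper names `read_sysctl` and `report` are made up for illustration, and the `proc_sys` parameter exists only so the function can be pointed at another directory. On recent kernels tcp_tw_recycle no longer exists (it was removed in Linux 4.12), so a missing entry is reported rather than treated as an error.

```python
from pathlib import Path

# The two settings that were disabled in this post.
KEYS = ("net.ipv4.tcp_tw_recycle", "net.ipv4.tcp_tw_reuse")


def read_sysctl(key, proc_sys="/proc/sys"):
    """Return the value of a sysctl key as a string, or None if the
    kernel does not expose it (or it cannot be read)."""
    # net.ipv4.tcp_tw_reuse -> /proc/sys/net/ipv4/tcp_tw_reuse
    path = Path(proc_sys, *key.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


def report(proc_sys="/proc/sys"):
    """Map each key in KEYS to its current value (None if absent)."""
    return {key: read_sysctl(key, proc_sys) for key in KEYS}


if __name__ == "__main__":
    for key, value in report().items():
        if value is None:
            print(f"{key}: not available on this kernel")
        elif value == "0":
            print(f"{key} = 0 (disabled)")
        else:
            print(f"{key} = {value} (non-zero, can be turned off with: sysctl -w {key}=0)")
```

The script only reads values. Changing them still goes through `sysctl -w` (or /etc/sysctl.conf for a persistent change) as root.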
Some posts advise disabling window scaling. I strongly discourage this, as it would result in poor network performance.
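To put a rough number on that, here is a back-of-the-envelope calculation. It takes the `win 65535` and `wscale 4` values the Mac client advertises in the trace at the end of this post, and assumes a 50 ms round-trip time, which is a made-up figure and not something I measured. The function names are illustrative only.

```python
def max_window_bytes(win, wscale=0):
    """Largest window that can be advertised: the 16-bit window field
    shifted left by the negotiated scale factor."""
    return win << wscale


def throughput_ceiling_mbit(window_bytes, rtt_seconds):
    """Upper bound for a single connection: at most one full window
    can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


RTT = 0.050  # ASSUMPTION: 50 ms round trip, not taken from the capture

no_scaling = max_window_bytes(65535)              # window scaling disabled
with_scaling = max_window_bytes(65535, wscale=4)  # values seen in the trace

print("without scaling:", no_scaling, "bytes,",
      round(throughput_ceiling_mbit(no_scaling, RTT), 1), "Mbit/s max")
print("with wscale 4:  ", with_scaling, "bytes,",
      round(throughput_ceiling_mbit(with_scaling, RTT), 1), "Mbit/s max")
```

Under that assumed RTT, a single connection tops out at roughly 10.5 Mbit/s without window scaling, against roughly 167.8 Mbit/s with a scale factor of 4. The longer the round trip, the more it hurts.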
Hope that helps!
Below is the tcpdump output showing the client's 8 SYN packets before the SYN-ACK is sent back. The test was performed against the SSH service; as the timestamps show, the TCP handshake took about 11.6 seconds.
# SYN 1
15:57:26.303076 IP (tos 0x0, ttl 53, id 9488, offset 0, flags [DF], proto TCP (6), length 64)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xdf5f (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835124724 ecr 0,sackOK,eol], length 0
# SYN 2
15:57:27.306416 IP (tos 0x0, ttl 53, id 37141, offset 0, flags [DF], proto TCP (6), length 64)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xdb71 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835125730 ecr 0,sackOK,eol], length 0
# SYN 3
15:57:28.315804 IP (tos 0x0, ttl 53, id 2415, offset 0, flags [DF], proto TCP (6), length 64)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xd785 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835126734 ecr 0,sackOK,eol], length 0
# SYN 4
15:57:29.330233 IP (tos 0x0, ttl 53, id 62758, offset 0, flags [DF], proto TCP (6), length 64)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xd398 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835127739 ecr 0,sackOK,eol], length 0
# SYN 5
15:57:30.335779 IP (tos 0x0, ttl 53, id 29003, offset 0, flags [DF], proto TCP (6), length 64)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xcfa9 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835128746 ecr 0,sackOK,eol], length 0
# SYN 6
15:57:31.345254 IP (tos 0x0, ttl 53, id 5246, offset 0, flags [DF], proto TCP (6), length 64)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xcbba (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835129753 ecr 0,sackOK,eol], length 0
# SYN 7
15:57:33.382242 IP (tos 0x0, ttl 53, id 5958, offset 0, flags [DF], proto TCP (6), length 64)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xc3dc (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835131767 ecr 0,sackOK,eol], length 0
# SYN 8
15:57:37.881881 IP (tos 0x0, ttl 53, id 21274, offset 0, flags [DF], proto TCP (6), length 48)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0x5c3d (correct), seq 2356956535, win 65535, options [mss 1460,sackOK,eol], length 0
# SYNACK (at last !!!)
15:57:37.881907 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 48)
server_ip.ssh > client_ip.49316: Flags [S.], cksum 0x7a12 (correct), seq 3228952474, ack 2356956536, win 14600, options [mss 1460,nop,nop,sackOK], length 0
# ACK
15:57:37.885362 IP (tos 0x0, ttl 53, id 62772, offset 0, flags [DF], proto TCP (6), length 40)
client_ip.49316 > server_ip.ssh: Flags [.], cksum 0xdfde (correct), seq 1, ack 1, win 65535, length 0
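If you want to measure the same thing on your own capture without counting lines by hand, a short script can do it. The sketch below assumes `tcpdump -v` style output like the trace above (a timestamped header line followed by a line containing `Flags [...]`), a single connection in the capture, and no midnight rollover. The function names are my own, and `SAMPLE` is just an excerpt of the trace (first SYN, eighth SYN and the SYN-ACK).

```python
import re
from datetime import datetime

# Header line: starts with a timestamp; the TCP flags are on the next packet line.
TIME_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) IP ")
FLAGS_RE = re.compile(r"Flags \[([^\]]+)\]")


def parse_packets(text):
    """Return (timestamp, flags) pairs, pairing each flags line with the
    most recent header line. Other lines (e.g. '# SYN 1') are ignored."""
    packets, current = [], None
    for line in text.splitlines():
        line = line.strip()
        m = TIME_RE.match(line)
        if m:
            current = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            continue
        m = FLAGS_RE.search(line)
        if m and current is not None:
            packets.append((current, m.group(1)))
            current = None
    return packets


def handshake_summary(text):
    """Return (number of SYNs before the first SYN-ACK, seconds between the
    first SYN and that SYN-ACK). The delay is None if no SYN-ACK is seen."""
    syn_times = []
    for when, flags in parse_packets(text):
        if flags == "S":
            syn_times.append(when)
        elif flags == "S." and syn_times:
            return len(syn_times), (when - syn_times[0]).total_seconds()
    return len(syn_times), None


# Excerpt of the trace above, not the full capture.
SAMPLE = """\
15:57:26.303076 IP (tos 0x0, ttl 53, id 9488, offset 0, flags [DF], proto TCP (6), length 64)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xdf5f (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835124724 ecr 0,sackOK,eol], length 0
15:57:37.881881 IP (tos 0x0, ttl 53, id 21274, offset 0, flags [DF], proto TCP (6), length 48)
client_ip.49316 > server_ip.ssh: Flags [S], cksum 0x5c3d (correct), seq 2356956535, win 65535, options [mss 1460,sackOK,eol], length 0
15:57:37.881907 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 48)
server_ip.ssh > client_ip.49316: Flags [S.], cksum 0x7a12 (correct), seq 3228952474, ack 2356956536, win 14600, options [mss 1460,nop,nop,sackOK], length 0
"""

if __name__ == "__main__":
    syn_count, delay = handshake_summary(SAMPLE)
    print(f"{syn_count} SYN(s) before the SYN-ACK, delay {delay:.3f} s")
```

Fed the complete trace instead of the excerpt, it should count 8 SYNs and the same delay of about 11.58 seconds between the first SYN and the SYN-ACK.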