
Wednesday, January 29, 2014

Postfix : 451 Error in processing Number of messages exceeds maximum per connection

If you have deferred mails with the following reason :
 451 Error in processing Number of messages exceeds maximum per connection  
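
You can see which messages are deferred, and why, straight from the queue; a quick sketch using standard Postfix tools (replace QUEUE_ID with a real queue ID from the listing):
 # List the queue; deferred entries show the last delivery error  
 postqueue -p  
 # Inspect a single message by its queue ID  
 postcat -q QUEUE_ID  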

You might have an issue with the default_destination_recipient_limit option, which is set to 50 by default.

In my case it was a bit more complicated: I had a custom transport for this destination (@hsbc.fr) with destination_recipient_limit set to 5, and even with a setting of 2 I still got the same error from their servers...

The solution was to turn off on-demand SMTP connection caching, which is enabled by default.

The Postfix documentation says :
"Temporarily enable SMTP connection caching while a destination has a high volume of mail in the active queue. With SMTP connection caching, a connection is not closed immediately after completion of a mail transaction. Instead, the connection is kept open for up to $smtp_connection_cache_time_limit seconds. This allows connections to be reused for other deliveries, and can improve mail delivery performance."
http://www.postfix.org/postconf.5.html#smtp_connection_cache_on_demand

 "high volume of mail" is unfortunately not specified. In my case, I had to send about 6K mails, the  destination_recipient_limit was effective for this destination but Postfix reused the same SMTP connection which triggered HDBC mailserver limit.

Conclusion :

You can either disable SMTP connection caching globally in main.cf :
 smtp_connection_cache_on_demand = no  
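
Whichever way you go, reload Postfix and flush the queue so the deferred mails are retried right away (standard commands):
 postfix reload  
 postqueue -f  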

Or per SMTP transport in master.cf (NOT TESTED) :
 smtpslow unix  -    -    n    -    -    smtp  
 ...  
 ...  
  -o destination_recipient_limit=5  
  -o smtp_connection_cache_on_demand=off  
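
For completeness: the recipient limit is read by the Postfix queue manager, so the documented place for a per-transport value is main.cf (as smtpslow_destination_recipient_limit), and the dedicated transport only takes effect once the destination is routed to it through a transport map. A minimal sketch, assuming the map lives at /etc/postfix/transport :
 # main.cf  
 transport_maps = hash:/etc/postfix/transport  
 smtpslow_destination_recipient_limit = 5  
   
 # /etc/postfix/transport  
 hsbc.fr    smtpslow:  
   
 # Rebuild the map and reload Postfix  
 postmap /etc/postfix/transport  
 postfix reload  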

Hope that helps !

Friday, October 11, 2013

Linux server sends SYNACK packet only after receiving 8 SYN

I ran into a really weird issue recently: in some rare cases (mostly Mac and mobile phone clients), connecting to a Linux server was really, really slow (about 12 seconds).

The issue impacted not only Apache but all TCP services, SSH included, so it was not a misconfiguration of one particular service.

The Chrome console on a MacBook Pro showed that the initial connection took about 10s; on the other hand, a Win7 client on the same LAN had no problem at all.

After some digging on the client and server side, I found out that the client needed to send 8 SYN packets before the server replied with a SYNACK, which explains why the connection was so slow. Once the SYNACK is sent back to the client, the communication speed is back to normal.

An hour of headache later, it turned out that I had enabled some sysctl TCP tuning values that somehow introduced the issue.

I disabled the net.ipv4.tcp_tw_recycle and net.ipv4.tcp_tw_reuse features and everything went back to normal.
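
For reference, here is how I turned them off, both at runtime and persistently (a sketch; adapt to your distribution's sysctl layout):
 # Disable at runtime  
 sysctl -w net.ipv4.tcp_tw_recycle=0  
 sysctl -w net.ipv4.tcp_tw_reuse=0  
   
 # Make it persistent in /etc/sysctl.conf  
 net.ipv4.tcp_tw_recycle = 0  
 net.ipv4.tcp_tw_reuse = 0  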

In hindsight the culprit is almost certainly net.ipv4.tcp_tw_recycle: it makes the kernel validate TCP timestamps per source host, so clients behind NAT (which is typical for Mac and mobile clients on Wi-Fi or 3G) get their SYNs silently dropped when their timestamps don't line up. The capture below supports this: the first seven SYNs carry the TCP timestamp option and are ignored, while the eighth, sent without timestamps, is answered immediately. As the issue impacted a production service (and is really hard to reproduce), I didn't try to re-enable either option to confirm which one was at fault.

Some posts advise disabling TCP window scaling; I strongly discourage this, as it would result in poor network performance.

Hope that helps !

Below is the tcpdump output showing the client's 8 SYN packets before the SYNACK is finally sent back. The test was performed against the SSH service; as you can see, the TCP handshake took about 12 seconds.
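
For the record, a verbose capture along these lines produces the output below (interface name assumed):
 tcpdump -n -v -i eth0 'tcp port 22'  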

 # SYN 1  
 15:57:26.303076 IP (tos 0x0, ttl 53, id 9488, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xdf5f (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835124724 ecr 0,sackOK,eol], length 0  
 # SYN 2  
 15:57:27.306416 IP (tos 0x0, ttl 53, id 37141, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xdb71 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835125730 ecr 0,sackOK,eol], length 0  
 # SYN 3  
 15:57:28.315804 IP (tos 0x0, ttl 53, id 2415, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xd785 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835126734 ecr 0,sackOK,eol], length 0  
 # SYN 4  
 15:57:29.330233 IP (tos 0x0, ttl 53, id 62758, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xd398 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835127739 ecr 0,sackOK,eol], length 0  
 # SYN 5  
 15:57:30.335779 IP (tos 0x0, ttl 53, id 29003, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xcfa9 (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835128746 ecr 0,sackOK,eol], length 0  
 # SYN 6  
 15:57:31.345254 IP (tos 0x0, ttl 53, id 5246, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xcbba (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835129753 ecr 0,sackOK,eol], length 0  
 # SYN 7  
 15:57:33.382242 IP (tos 0x0, ttl 53, id 5958, offset 0, flags [DF], proto TCP (6), length 64)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0xc3dc (correct), seq 2356956535, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 835131767 ecr 0,sackOK,eol], length 0  
 # SYN 8  
 15:57:37.881881 IP (tos 0x0, ttl 53, id 21274, offset 0, flags [DF], proto TCP (6), length 48)  
   client_ip.49316 > server_ip.ssh: Flags [S], cksum 0x5c3d (correct), seq 2356956535, win 65535, options [mss 1460,sackOK,eol], length 0  
 # SYNACK (at last !!!)  
 15:57:37.881907 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 48)  
   server_ip.ssh > client_ip.49316: Flags [S.], cksum 0x7a12 (correct), seq 3228952474, ack 2356956536, win 14600, options [mss 1460,nop,nop,sackOK], length 0  
 # ACK  
 15:57:37.885362 IP (tos 0x0, ttl 53, id 62772, offset 0, flags [DF], proto TCP (6), length 40)  
   client_ip.49316 > server_ip.ssh: Flags [.], cksum 0xdfde (correct), seq 1, ack 1, win 65535, length 0  

Monday, July 15, 2013

Emulate bad or WAN network performance for a particular IP on a Gigabit LAN network

If you're developing Web or mobile applications, you'll certainly be confronted with poor network conditions.

The problem then is: "how can I test my application under bad network conditions?". Well, you could rent a foreign internet connection, or use tools that report performance from various remote countries; however, neither makes a good debugging environment.

The solution is to use tc and NetEm on your front development server (typically the Web or reverse proxy server), then use filters so that only one client station (the debugging station) is impacted.
Don't forget the filters, otherwise all your clients will be impacted.

Below is an example of how to emulate a network with :
  • 1Mbps bandwidth
  • 400ms delay
  • 5% packet loss
  • 1% corrupted packets
  • 1% duplicated packets
The debugging client IP is 192.168.0.42 (i.e. the IP impacted by the bad network performance).
The following commands need to be executed on the front development server; please set the appropriate NIC for your environment (eth0 is used below) :
 # Clean up rules  
 tc qdisc del dev eth0 root  
   
 # root htb init 1:  
 tc qdisc add dev eth0 handle 1: root htb  
   
 # Create class 1:42 with 1Mbit/s bandwidth  
 # (note: tc reads "mbit" as megabits/s, while "mbps" means megaBYTES/s)  
 tc class add dev eth0 parent 1: classid 1:42 htb rate 1mbit  
   
 # Set network degradations on class 1:42  
 tc qdisc add dev eth0 parent 1:42 handle 30: netem loss 5% delay 400ms duplicate 1% corrupt 1%  
   
 # Filter class 1:42 to 192.168.0.42 only (match destination IP)  
 tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dst 192.168.0.42 flowid 1:42  
   
 # Filter class 1:42 to 192.168.0.42 only (match source IP)  
 tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip src 192.168.0.42 flowid 1:42  

To check that the rules are properly set, use the following commands :
 tc qdisc show dev eth0  
 tc class show dev eth0  
 tc filter show dev eth0  

Once you're done with the testing, clean up the rules with the command :
 tc qdisc del dev eth0 root   


There are many other options you can use (correlation, distribution, packet reordering, etc.); please check the documentation available at :

http://www.linuxfoundation.org/collaborate/workgroups/networking/netem

If this setup fits your requirements, I advise you to create a shell script so you can start/stop the rules with custom values, as sketched below. Be aware that you can also build filters based on source/destination ports, etc.
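
A minimal sketch of such a wrapper (interface, client IP and degradation values are just the ones from this post; adjust to taste):
 #!/bin/sh  
 # netem-debug.sh -- start/stop degraded networking for one client IP  
 DEV=eth0  
 CLIENT=192.168.0.42  
   
 case "$1" in  
   start)  
     tc qdisc add dev $DEV handle 1: root htb  
     tc class add dev $DEV parent 1: classid 1:42 htb rate 1mbit  
     tc qdisc add dev $DEV parent 1:42 handle 30: netem loss 5% delay 400ms duplicate 1% corrupt 1%  
     tc filter add dev $DEV parent 1: protocol ip prio 1 u32 match ip dst $CLIENT flowid 1:42  
     tc filter add dev $DEV parent 1: protocol ip prio 1 u32 match ip src $CLIENT flowid 1:42  
     ;;  
   stop)  
     tc qdisc del dev $DEV root  
     ;;  
   *)  
     echo "Usage: $0 {start|stop}" >&2  
     exit 1  
     ;;  
 esac  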

If you have more complex requirements, you can try WANem, which is a live Linux distribution with a graphical interface on top of NetEm. Please be aware that this requires route modifications on your client and server (or other routing tricks).

http://wanem.sourceforge.net/
http://sourceforge.net/projects/wanem/files/Documents/WANemv11-Setup-Guide.pdf

I haven't had the opportunity to try it; please let me know if you have any feedback.

Thursday, April 4, 2013

Choosing RAID Level / Stripe Size

Below are interesting articles on how to choose your RAID level and stripe size.

Good literature on RAID :
http://www.fccps.cz/download/adv/frr/hdd/hdd.html

RAID Level Explained :
http://www.techrepublic.com/blog/datacenter/choose-a-raid-level-that-works-for-you/3237

RAID Stripe Explained :
http://www.anandtech.com/show/788/5

RAID Benchmarks :
https://raid.wiki.kernel.org/index.php/Performance

RAID Calculator :
http://www.z-a-recovery.com/art-raid-estimator.htm

In any case, always plan for your workload type (reads vs. writes, sequential vs. random, large vs. small files, number of concurrent accesses).
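
As a quick complement to the calculator linked above, the usable-capacity formulas for the common levels are simple enough to check in a shell (a sketch for 6 disks of 2 TB each; change n and s for your array):
 n=6; s=2  
 echo "RAID 0 : $(( n * s )) TB usable (no redundancy)"  
 echo "RAID 5 : $(( (n - 1) * s )) TB usable (survives 1 disk failure)"  
 echo "RAID 6 : $(( (n - 2) * s )) TB usable (survives 2 disk failures)"  
 echo "RAID 10: $(( n / 2 * s )) TB usable (survives 1 failure per mirror pair)"  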