Unexpected connection drop between SAP application and NetWeaver

This document (000020518) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server for SAP Applications 15
SUSE Linux Enterprise Server for SAP Applications 12
 

Situation

SAP application servers are facing unexpected network connection drops with the SAP NetWeaver server. The SAP work process traces and the enqueue server traces report the connection drop at the same time. A tcpdump has been captured in both directions of the TCP traffic on port 3201, the port the SAP enqueue server listens on. The connection from the application server to NW appears to be active until 01:17:59:
 64280 01:17:59.358124 192.168.4.153 → 192.168.4.94 TCP 408 26674 → 3201 [PSH, ACK] Seq=4006 Ack=3221 Win=35840 Len=340 TSval=2330272576 TSecr=554184432
 64281 01:17:59.358421 192.168.4.94 → 192.168.4.153 TCP 492 3201 → 26674 [PSH, ACK] Seq=3221 Ack=4346 Win=24192 Len=424 TSval=554184432 TSecr=2330272576
 64282 01:17:59.358450 192.168.4.153 → 192.168.4.94 TCP 408 26674 → 3201 [PSH, ACK] Seq=4346 Ack=3645 Win=36992 Len=340 TSval=2330272576 TSecr=554184432
 64283 01:17:59.358711 192.168.4.94 → 192.168.4.153 TCP 492 3201 → 26674 [PSH, ACK] Seq=3645 Ack=4686 Win=25216 Len=424 TSval=554184432 TSecr=2330272576
 64289 01:17:59.395805 192.168.4.153 → 192.168.4.94 TCP 68 26674 → 3201 [ACK] Seq=4686 Ack=4069 Win=
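A capture like the one above can be taken with tcpdump on each host and later filtered for connection teardown, retransmissions and keep-alive probes. The sketch below is an assumption about how such an analysis might be scripted, not the procedure used in this case: the function names, interface and file names are hypothetical, and tshark is used for reading because the trace excerpts in this document look like tshark output.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for capturing and inspecting enqueue server traffic
# on TCP port 3201. Capturing requires root; reading a file does not.

# Write full packets for port 3201 on interface $1 to the pcap file $2.
capture_enqueue_traffic() {
    tcpdump -i "$1" -s 0 -w "$2" 'tcp port 3201'
}

# List any RST or FIN segments in capture file $1. Empty output means this
# host never saw the connection being closed or reset on the wire.
show_teardown() {
    tcpdump -nn -r "$1" 'tcp port 3201 and tcp[tcpflags] & (tcp-rst|tcp-fin) != 0'
}

# List retransmissions and keep-alive probes in capture file $1, using
# Wireshark's TCP analysis flags.
show_retrans_and_keepalive() {
    tshark -r "$1" -Y 'tcp.analysis.retransmission || tcp.analysis.keep_alive'
}
```

Running the capture on both the application server and the enqueue server, as was done here, is what makes it possible to compare what each side actually received.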

Afterwards the connection is idle (inactive), because there is no further data to be exchanged between the SAP application server and NetWeaver. However, around 35 minutes later, both the SAP work process and NW report a connection drop at the same time:

Work process trace on application server:
M ***LOG Q0I=> NiIRead: P=192.168.4.94:3201; L=192.168.4.153:12198: recv (110: Connection timed out) [/bas/749_REL/src/base/ni/nixxi.cpp 5420]
M *** ERROR => NiIRead: SiRecv failed for hdl 20/sock 15
(SI_ECONN_BROKEN/110; I4; ST; P=192.168.4.94:3201; L=192.168.4.153:12198) [nixxi.cpp 5420]
M {root-id=0050569E39B51EDBAD8693D6DE4A26BF}_{conn-id=00000000000000000000000000000000}_0
M *** WARNING => EncCliReq: close connection (broken). See SAP note 1943531 [enccli.c 679]
E *** ERROR => EnsaCliDoRequest: DoRequest failed (rc=-1): 488 bytes on con. 0 in layer 0 (EncCli) (1/1) [ensacli.c 408]

Enqueue server traces:
[Thr 139732915848256] ***LOG Q0I=> NiIRead: P=192.168.4.153:12198; L=192.168.4.94:3201: recv (104: Connection reset by peer) [/bas/749_REL/src/base/ni/nixxi.cpp 5420]
[Thr 139732915848256] *** ERROR => NiIRead: SiRecv failed for hdl 6289/sock 816
(SI_ECONN_BROKEN/104; I4; ST; P=192.168.4.153:12198; L=192.168.4.94:3201) [nixxi.cpp 5420]

The tcpdump, however, shows no indication of a connection drop (no RST or FIN packets). Furthermore, at a later point in time the application server tries to send some data:
180569 02:48:43.983725 192.168.4.153 → 192.168.4.94 TCP 576 26674 → 3201 [PSH, ACK] Seq=4686 Ack=4069 Win=38016 Len=508 TSval=2331633733 TSecr=554184432

This is followed by TCP retransmissions, as the enqueue server never sends an ACK despite many attempts:
180575 02:48:44.187686 192.168.4.153 → 192.168.4.94 TCP 576 [TCP Retransmission] 26674 → 3201 [PSH, ACK] Seq=4686 Ack=4069 Win=38016 Len=508 TSval=2331633784 TSecr=554184432
180576 02:48:44.391685 192.168.4.153 → 192.168.4.94 TCP 576 [TCP Retransmission] 26674 → 3201 [PSH, ACK] Seq=4686 Ack=4069 Win=38016 Len=508 TSval=2331633835 TSecr=554184432
180577 02:48:44.799689 192.168.4.153 → 192.168.4.94 TCP 576 [TCP Retransmission] 26674 → 3201 [PSH, ACK] Seq=4686 Ack=4069 Win=38016 Len=508 TSval=2331633937 TSecr=554184432
180578 02:48:45.615691 192.168.4.153 → 192.168.4.94 TCP 576 [TCP Retransmission] 26674 → 3201 [PSH, ACK] Seq=4686 Ack=4069 Win=38016 Len=508 TSval=2331634141 TSecr=554184432 
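The spacing of these retransmissions reflects TCP's exponential backoff: after the first two, the gaps double (0.408 s, then 0.816 s). If the peer never acknowledges, Linux keeps retransmitting until net.ipv4.tcp_retries2 (default 15) is exhausted and then fails the socket with ETIMEDOUT, errno 110 ("Connection timed out"), the same error that appears in the work process trace above. The following is a minimal sketch, assuming the kernel constants TCP_RTO_MIN = 200 ms and TCP_RTO_MAX = 120 s and an RTO that starts at the minimum; the function name is illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical timeout (ms) before Linux gives up retransmitting, for a
# given tcp_retries2 value. Assumes the RTO starts at TCP_RTO_MIN (200 ms),
# doubles after each retransmission and is capped at TCP_RTO_MAX (120 s).
retrans_timeout_ms() {
    local retries=$1 rto_min=200 rto_max=120000
    local total=0 rto=$rto_min k
    # One RTO is waited after the original transmission and after each of
    # the $retries retransmissions.
    for (( k = 0; k <= retries; k++ )); do
        total=$(( total + rto ))
        rto=$(( rto * 2 > rto_max ? rto_max : rto * 2 ))
    done
    echo "$total"
}

# Use the running kernel's value if readable, else the default of 15.
retries2=$(cat /proc/sys/net/ipv4/tcp_retries2 2>/dev/null || echo 15)
echo "tcp_retries2=${retries2}: about $(retrans_timeout_ms "$retries2") ms"
```

With tcp_retries2 = 15 the function returns 924600 ms, i.e. the roughly 924.6 seconds (about 15.4 minutes) that the kernel documentation quotes for the default.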

On the enqueue server side, however, the last packet received dates from the time when the connection was still active:
01:08:02.468032 192.168.4.153 → 192.168.4.94 TCP 68 12296 → 3201 [ACK] Seq=2097 Ack=1745 Win=32512 Len=0 TSval=170547114 TSecr=553587501

This indicates the presence of a stateful firewall that drops the connection after it becomes idle/inactive, while the TCP stacks on both the server and the client side still consider the connection established. As expected with "net.ipv4.tcp_keepalive_time = 7200", 7200 seconds later the NetWeaver TCP server starts sending Keep-Alive packets, and finally resets the connection because none of the Keep-Alive packets is answered:
185978  03:17:59.396252 192.168.4.94 → 192.168.4.153 TCP 68 [TCP Keep-Alive] 3201 → 26674 [ACK] Seq=4068 Ack=4686 Win=25216 Len=0 TSval=561384470 TSecr=2330272586
186102  03:19:14.396454 192.168.4.94 → 192.168.4.153 TCP 68 [TCP Keep-Alive] 3201 → 26674 [ACK] Seq=4068 Ack=4686 Win=25216 Len=0 TSval=561459470 TSecr=2330272586

197134  03:29:14.396256 192.168.4.94 → 192.168.4.153 TCP 68 3201 → 26674 [RST, ACK] Seq=4069 Ack=4686 Win=25216 Len=0 TSval=562059470 TSecr=2330272586 
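The timestamps match the Linux keep-alive timer: the first probe goes out exactly 7200 s after the last segment at 01:17:59, the probes are 75 s apart, and the RST follows 675 s (9 × 75 s) after the first probe. That is consistent with the kernel defaults net.ipv4.tcp_keepalive_intvl = 75 and net.ipv4.tcp_keepalive_probes = 9, although those two values are inferred from the trace rather than shown in it. A minimal sketch of the arithmetic, with illustrative function names:

```bash
#!/usr/bin/env bash
# Seconds from the last segment until the kernel resets an idle connection
# whose keep-alive probes go unanswered: idle time + probes * interval.
keepalive_deadline_s() {
    local idle=$1 intvl=$2 probes=$3
    echo $(( idle + probes * intvl ))
}

# Add $2 seconds to a wall-clock time "HH:MM:SS" (wrapping at midnight).
add_seconds_to_clock() {
    local h m s t
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that values such as "08" are not read as octal.
    t=$(( (10#$h * 3600 + 10#$m * 60 + 10#$s + $2) % 86400 ))
    printf '%02d:%02d:%02d\n' $(( t / 3600 )) $(( t % 3600 / 60 )) $(( t % 60 ))
}

# Values from this case (interval and probe count are inferred defaults).
echo "first probe: $(add_seconds_to_clock 01:17:59 7200)"
echo "reset:       $(add_seconds_to_clock 01:17:59 "$(keepalive_deadline_s 7200 75 9)")"
```

This prints 03:17:59 and 03:29:14, matching the first Keep-Alive and the RST above. With tcp_keepalive_time = 300 and the same interval and probe count, keepalive_deadline_s 300 75 9 gives 975 s.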

Resolution

The systems are on separate /25 subnets, connected via a stateful firewall appliance, which keeps track of all connections passing through it and drops idle/inactive connections after a certain time. This issue can be addressed by changing the firewall appliance configuration.

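Another option is to decrease the TCP keep-alive time from 7200 s to 300 s on both sides of the TCP connection, so that keep-alive probes are sent before the firewall considers the connection idle and drops it: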
# /etc/sysctl.conf
--------------------------
net.ipv4.tcp_keepalive_time = 300 
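The new value can be loaded without a reboot by re-reading /etc/sysctl.conf, and checked through /proc. Note that the kernel only sends keep-alive probes on sockets that enable SO_KEEPALIVE (the NetWeaver side evidently does, since it sent the probes in the trace), and a socket that sets its own TCP_KEEPIDLE ignores this system-wide default. The sketch assumes root for the apply step; FW_IDLE_TIMEOUT is a hypothetical placeholder for the idle timeout configured on the firewall appliance, and the function names are illustrative.

```bash
#!/usr/bin/env bash
# FW_IDLE_TIMEOUT: hypothetical placeholder for the firewall's idle timeout
# (seconds); replace it with the value configured on the actual appliance.
FW_IDLE_TIMEOUT=${FW_IDLE_TIMEOUT:-1800}

# Load the settings from /etc/sysctl.conf into the running kernel (root only).
apply_sysctl_conf() {
    sysctl -p /etc/sysctl.conf
}

# Print the keep-alive idle time (seconds) the running kernel currently uses.
current_keepalive_time() {
    cat /proc/sys/net/ipv4/tcp_keepalive_time
}

# Succeed only if the first keep-alive probe ($1 s of idle time) is sent
# before the firewall forgets the idle connection ($2 s).
probe_before_fw_timeout() {
    [ "$1" -lt "$2" ]
}

# Example (as root, on both hosts):
#   apply_sysctl_conf && probe_before_fw_timeout "$(current_keepalive_time)" "$FW_IDLE_TIMEOUT"
```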

Cause

The systems are connected via a stateful firewall appliance, which keeps track of all connections passing through it and drops idle/inactive connections after a certain time.

Additional Information

The existence of a stateful firewall can be probed with a TCP ACK scan using the nmap tool, which sends packets with only the ACK bit set (indicating an already established connection):
# nmap -sA <destination>
..All 1000 scanned ports on ... are filtered  

A stateful firewall keeps track of every connection. As it has no record of the connection probed by "nmap -sA", it immediately drops the packet. Ports that do not respond are marked as "filtered", which indicates the existence of a stateful firewall. If there is no stateful firewall appliance, both open and closed ports respond with an RST packet and are marked as "unfiltered".

From the nmap man page:
 -sA (TCP ACK scan) 
     This scan is different than the others discussed so far in that it
     never determines open (or even open|filtered) ports. It is used to
     map out firewall rulesets, determining whether they are stateful or
     not and which ports are filtered.

     The ACK scan probe packet has only the ACK flag set (unless you use
     --scanflags). When scanning unfiltered systems, open and closed
     ports will both return a RST packet. Nmap then labels them as
     unfiltered, meaning that they are reachable by the ACK packet, but
     whether they are open or closed is undetermined. Ports that don't 
     respond, or send certain ICMP error messages back (type 3, code 0,
     1, 2, 3, 9, 10, or 13), are labeled filtered.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020518
  • Creation Date: 15-Dec-2021
  • Modified Date: 21-Dec-2021
    • SUSE Linux Enterprise Server for SAP Applications
