System hangs with a large number of tasks in uninterruptible sleep waiting for fanotify events

This document (000019761) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12 SP5
SUSE Linux Enterprise Server 12 SP4
SUSE Linux Enterprise Server 12 SP3

Situation

The system hangs under high load because a large number of tasks are blocked in uninterruptible sleep, waiting for
fanotify events/responses that are polled by McAfee-related processes:
crash> sys|grep LOAD
LOAD AVERAGE: 52.38, 39.16, 18.12

crash> ps -S
RU: 3
IN: 844
UN: 52
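
The per-state counts above come from a crash dump. On a system that is still responsive, a similar summary could be collected from /proc. The following is a minimal sketch, not part of the original analysis: it assumes a Linux /proc layout, the helper names parse_state and count_states are hypothetical, and the state letters R, S and D correspond roughly to RU, IN and UN in the crash output.

```python
import glob
from collections import Counter


def parse_state(stat_line):
    """Return (comm, state) from one /proc/<pid>/task/<tid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return comm, state


def count_states(proc_root="/proc"):
    """Count tasks (threads) per scheduler state letter, e.g. 'D'."""
    counts = Counter()
    for path in glob.glob(f"{proc_root}/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as fh:
                _, state = parse_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited while it was being read
        counts[state] += 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(count_states().items()):
        print(f"{state}: {n}")
```

A steadily growing count for state D, together with a rising load average, would be consistent with the situation described here, but it does not by itself identify fanotify as the cause.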

crash> foreach UN bt | grep "#2\|#3 " | awk '{print $3 }' | sort | uniq -c|sort -rn
50 fsnotify
50 fanotify_handle_event
1 wait_for_completion_killable
1 schedule_timeout
1 rwsem_down_read_failed
1 call_rwsem_down_read_failed
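
The same histogram can be built offline from a saved copy of the "foreach UN bt" output. The sketch below is illustrative only: the function name frame_histogram is hypothetical, and the sample text is shortened from the backtraces shown in this document. Unlike the grep pattern above, it matches frames #2 and #3 exactly, so a frame such as #20 is not counted.

```python
import re
from collections import Counter

# Matches crash backtrace frames such as:
#   #2 [ffffacec476d39f0] fanotify_handle_event at ffffffffae2ac0e7
FRAME_RE = re.compile(r"^\s*#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s")


def frame_histogram(bt_text, frames=(2, 3)):
    """Count function names seen at the given frame numbers."""
    counts = Counter()
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line)
        if m and int(m.group(1)) in frames:
            counts[m.group(2)] += 1
    # Most frequent first, like "sort | uniq -c | sort -rn".
    return counts.most_common()


SAMPLE = """
PID: 5393 TASK: ffff933e2c669340 CPU: 0 COMMAND: "sapstartsrv"
 #0 [ffffacec476d3958] __schedule at ffffffffae716042
 #1 [ffffacec476d39e0] schedule at ffffffffae716662
 #2 [ffffacec476d39f0] fanotify_handle_event at ffffffffae2ac0e7
 #3 [ffffacec476d3a60] fsnotify at ffffffffae2a8a56
PID: 13567 TASK: ffff933de84f8500 CPU: 1 COMMAND: "Collect FA Evnt"
 #2 [ffffacec510a7b80] rwsem_down_read_failed at ffffffffae7194ef
 #3 [ffffacec510a7be0] call_rwsem_down_read_failed at ffffffffae3c2704
 #10 [ffffacec510a7e98] SYSC_newstat at ffffffffae269af6
"""

if __name__ == "__main__":
    for name, n in frame_histogram(SAMPLE):
        print(n, name)
```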
Almost all the tasks have the same stack trace, for example:
PID: 5393 TASK: ffff933e2c669340 CPU: 0 COMMAND: "sapstartsrv"
#0 [ffffacec476d3958] __schedule at ffffffffae716042
#1 [ffffacec476d39e0] schedule at ffffffffae716662
#2 [ffffacec476d39f0] fanotify_handle_event at ffffffffae2ac0e7
#3 [ffffacec476d3a60] fsnotify at ffffffffae2a8a56
#4 [ffffacec476d3b40] do_dentry_open at ffffffffae260b1b
#5 [ffffacec476d3b80] path_openat at ffffffffae2722ed
#6 [ffffacec476d3c58] do_filp_open at ffffffffae274d1e
#7 [ffffacec476d3d60] do_sys_open at ffffffffae262156
#8 [ffffacec476d3db0] mfe_aac_sys_openat at ffffffffc08d1169 [mfe_aac_100606122]
A McAfee kernel module is loaded and taints the kernel:
crash> mod -t
NAME TAINTS
mfe_aac_100606122 OE
All tasks blocked in uninterruptible sleep are waiting in fanotify_handle_event(), except for "Collect FA Evnt" and "nfsd", which have also spent the longest time in the UN state. Notably, the "Collect FA Evnt" task, which is responsible for collecting and validating the fanotify events, is itself blocked while waiting for a rw_semaphore lock:
crash> bt ffff933de84f8500
PID: 13567 TASK: ffff933de84f8500 CPU: 1 COMMAND: "Collect FA Evnt"
#0 [ffffacec510a7ae8] __schedule at ffffffffae716042
#1 [ffffacec510a7b70] schedule at ffffffffae716662
#2 [ffffacec510a7b80] rwsem_down_read_failed at ffffffffae7194ef
#3 [ffffacec510a7be0] call_rwsem_down_read_failed at ffffffffae3c2704
#4 [ffffacec510a7c28] down_read at ffffffffae718a63
#5 [ffffacec510a7c30] lookup_slow at ffffffffae26f966
#6 [ffffacec510a7c88] walk_component at ffffffffae27127f
#7 [ffffacec510a7ce0] path_lookupat at ffffffffae2718d9
#8 [ffffacec510a7d38] filename_lookup at ffffffffae2741bc
#9 [ffffacec510a7e48] vfs_statx at ffffffffae269204
#10 [ffffacec510a7e98] SYSC_newstat at ffffffffae269af6
#11 [ffffacec510a7f30] do_syscall_64 at ffffffffae003954
#12 [ffffacec510a7f50] entry_SYSCALL_64_after_hwframe at ffffffffae80009a
RIP: 00007f5b31955525 RSP: 00007f5aeaff3778 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f5aeaff3780 RCX: 00007f5b31955525
RDX: 00007f5aeaff4b10 RSI: 00007f5aeaff4b10 RDI: 00007f5aeaff3780
RBP: 00007f5aeaff4b10 R8: 000000000000000f R9: 00007f5aeaff46b8
R10: 0000000000000007 R11: 0000000000000246 R12: 00007f5aeaff4830
R13: 00007f5aeaff4830 R14: 000055eaa137d690 R15: 00007f5aeaff8c10
ORIG_RAX: 0000000000000004 CS: 0033 SS: 002b
The semaphore is held (owned) by the "nfsd" task:
crash> struct rw_semaphore.owner ffff933a7c4741d0
owner = 0xffff9339a9f31040
The "nfsd" task is in turn blocked in fanotify_handle_event() while trying to open an NFS-exported file on /usr/sap/trans:
crash> bt ffff9339a9f31040
PID: 48565 TASK: ffff9339a9f31040 CPU: 0 COMMAND: "nfsd"
#0 [ffffacec54ca3990] __schedule at ffffffffae716042
#1 [ffffacec54ca3a18] schedule at ffffffffae716662
#2 [ffffacec54ca3a28] fanotify_handle_event at ffffffffae2ac0e7
#3 [ffffacec54ca3a98] fsnotify at ffffffffae2a8a56
#4 [ffffacec54ca3b78] do_dentry_open at ffffffffae260b1b
#5 [ffffacec54ca3bb8] dentry_open at ffffffffae261e24
#6 [ffffacec54ca3be8] nfsd_open at ffffffffc078d8fe [nfsd]
#7 [ffffacec54ca3c20] nfs4_get_vfs_file at ffffffffc07a79fa [nfsd]
#8 [ffffacec54ca3cd0] nfsd4_process_open2 at ffffffffc07ac296 [nfsd]
#9 [ffffacec54ca3da8] nfsd4_open at ffffffffc079b707 [nfsd]
#10 [ffffacec54ca3e00] nfsd4_proc_compound at ffffffffc079bd78 [nfsd]
#11 [ffffacec54ca3e48] nfsd_dispatch at ffffffffc07891dc [nfsd]
#12 [ffffacec54ca3e78] svc_process_common at ffffffffc0705447 [sunrpc]
#13 [ffffacec54ca3ed0] svc_process at ffffffffc07064e4 [sunrpc]
#14 [ffffacec54ca3ef0] nfsd at ffffffffc0788c83 [nfsd]
#15 [ffffacec54ca3f10] kthread at ffffffffae0b0186
#16 [ffffacec54ca3f50] ret_from_fork at ffffffffae800235

crash> struct path.mnt ffffacec54ca3bf8
mnt = 0xffff933e79f3e560

crash> struct vfsmount.mnt_sb 0xffff933e79f3e560
mnt_sb = 0xffff933e73076800

crash> mount|grep ffff933e73076800
ffff933e79f3e540 ffff933e73076800 ext4 /dev/mapper/datavg00-usr_sap_trans_lv /usr/sap/trans

crash> struct dentry.d_name.name 0xffff933998ad80c0
d_name.name = 0xffff933998ad80f8 "KFS.LOB"

crash> struct dentry.d_name.name 0xffff933de0976cc0
d_name.name = 0xffff933de0976cf8 "tmp"

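This appears to be a deadlock caused by fanotify events that are polled by McAfee-related processes: "Collect FA Evnt", which has to handle the fanotify events, is waiting for a semaphore owned by "nfsd", while "nfsd" is itself blocked in fanotify_handle_event().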
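
The suspected circular dependency can be made explicit as a wait-for graph. The sketch below is a simplified model, not output from the dump: the find_cycle helper is hypothetical, and the edges encode the assumption that tasks blocked in fanotify_handle_event() are waiting for a response from "Collect FA Evnt", while "Collect FA Evnt" waits for the rw_semaphore owned by "nfsd".

```python
def find_cycle(waits_for):
    """Return the tasks forming a cycle in a wait-for graph, or None.

    waits_for maps each blocked task to the single task it is waiting on.
    """
    for start in waits_for:
        path = []
        seen = set()
        node = start
        # Follow the chain until it ends or revisits a task.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            return path[path.index(node):]
    return None


# Assumed edges, derived from the backtraces in this document.
WAITS_FOR = {
    "sapstartsrv": "Collect FA Evnt",  # blocked in fanotify_handle_event()
    "nfsd": "Collect FA Evnt",         # blocked in fanotify_handle_event()
    "Collect FA Evnt": "nfsd",         # waits for the rwsem owned by nfsd
}

if __name__ == "__main__":
    print(find_cycle(WAITS_FOR))
```

In this model, "sapstartsrv" and the other blocked tasks are not part of the cycle; they are stuck behind it, which would explain why so many tasks accumulate in the UN state.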
 

Resolution

This is a known issue that has been addressed in newer versions of McAfee ENSL. For details about which version contains the fix, contact McAfee Support.

Cause

The system hung under high load because a large number of tasks were blocked in uninterruptible sleep, waiting for fanotify events/responses that are polled by McAfee-related processes. This seems to be caused by the way some McAfee ENSL versions handle fanotify events.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019761
  • Creation Date: 27-Oct-2020
  • Modified Date: 27-Oct-2020
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Server for SAP Applications

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback@suse.com
