System freezes with a large number of tasks waiting for gsch_scan() to return
This document (000019767) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 12 SP4
Situation
PID: 16901  TASK: ffff897e6f7f9140  CPU: 7  COMMAND: "sshd"
 #0 [ffffa6a8c6063bd8] __schedule at ffffffffa07209e2
 #1 [ffffa6a8c6063c60] schedule at ffffffffa0721002
 #2 [ffffa6a8c6063c70] gsch_scan at ffffffffc0887e53 [gsch]
 #3 [ffffa6a8c6063d30] gsch_policy_handle_close at ffffffffc0884cbf [gsch]
 #4 [ffffa6a8c6063d80] gsch_redirfs_release at ffffffffc0883cc0 [gsch]
 #5 [ffffa6a8c6063dc8] rfs_postcall_flts at ffffffffc081903e [redirfs]
 #6 [ffffa6a8c6063de8] rfs_release at ffffffffc08121ca [redirfs]
 #7 [ffffa6a8c6063e88] __fput at ffffffffa0268902
 #8 [ffffa6a8c6063ec8] task_work_run at ffffffffa00b09d8
 #9 [ffffa6a8c6063f00] exit_to_usermode_loop at ffffffffa008ce79
#10 [ffffa6a8c6063f30] do_syscall_64 at ffffffffa0004a25
#11 [ffffa6a8c6063f50] entry_SYSCALL_64_after_hwframe at ffffffffa08000b6
    RIP: 00007fb582a1d1d2  RSP: 00007fff0873aa68  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: 0000000000000001  RCX: 00007fb582a1d1d2
    RDX: 00007fff0873ac20  RSI: 0000000000000001  RDI: 0000000000000008
    RBP: 00007fff0873adc0   R8: 0000000000000000   R9: 00007fff0873a6b0
    R10: 0000000000000008  R11: 0000000000000202  R12: 00007fff0873af50
    R13: 0000000000000000  R14: 000055ecb8a56250  R15: 000055ecb6cbb1a0
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
In more detail, 1398 tasks are in a sleep state waiting for gsch_scan() to return:
crash> foreach IN bt | grep "#2\|#3" | awk '{print $3 }' | sort | uniq -c|sort -rn
   2713 do_wait
   2707 sys_wait4
   1398 gsch_scan
   1376 gsch_policy_handle_pre_open
    685 pipe_wait
    685 pipe_read
    254 kthread
    196 futex_wait_queue_me
    196 futex_wait
    103 rescuer_thread
     57 schedule_hrtimeout_range_clock
gsch_scan() and gsch_policy_handle_pre_open() belong to the third-party kernel module gsch (Trend Micro Deep Security Agent), which also taints the kernel:
crash> mod -t
NAME     TAINTS
tmhook   OE
redirfs  OE
gsch     OE
bmhook   OE
Just before the system hung, the Trend Micro processes were failing to fork in the system slice because the cgroup pids controller rejected the forks:
kernel: [59696.369225] cgroup: fork rejected by pids controller in /system.slice/ds_agent.service
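To check whether a system shows the same symptom, the kernel log can be searched for this message. The following is a minimal sketch, not part of the original analysis: the helper name rejected_cgroups is hypothetical, and it assumes the log lines keep the exact wording shown above.

```shell
# Hypothetical helper: read a kernel log on stdin and count, per cgroup,
# how often the pids controller rejected a fork.
rejected_cgroups() {
    # Keep only the cgroup path that follows the "fork rejected" message,
    # then count identical paths, most frequent first.
    sed -n 's/.*cgroup: fork rejected by pids controller in \(.*\)$/\1/p' \
        | sort | uniq -c | sort -rn
}
```

It could be fed with the kernel ring buffer or the journal, for example `dmesg | rejected_cgroups` or `journalctl -k | rejected_cgroups`. A cgroup that appears in the output has hit its task limit at least once.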
Resolution
Remove the systemd task limit by setting DefaultTasksMax to infinity in /etc/systemd/system.conf:

#/etc/systemd/system.conf
------------------------
[Manager]
DefaultTasksMax=infinity
And reload the new settings by running:
# systemctl daemon-reload
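To confirm which value is configured in the file, a small check along these lines may help. This is only a sketch: the helper name default_tasks_max is hypothetical, and it reads a single file, so it ignores any drop-in directories that might override it.

```shell
# Hypothetical helper: print the last uncommented DefaultTasksMax value
# found in a systemd manager configuration file.
default_tasks_max() {
    conf="${1:-/etc/systemd/system.conf}"
    # Commented-out lines start with '#' and therefore do not match.
    sed -n 's/^[[:space:]]*DefaultTasksMax[[:space:]]*=[[:space:]]*//p' "$conf" \
        | tail -n 1
}
```

Running `default_tasks_max` without arguments reads /etc/systemd/system.conf. The result can be compared with what the running manager reports, which on systemd versions exposing this property should be available through `systemctl show --property=DefaultTasksMax`.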
If the same hang pattern occurs without the "fork rejected" error, we recommend involving Trend Micro support, who may provide a workaround or an updated version if the issue has been fixed in a newer Trend Micro Deep Security Agent release.
Cause
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019767
- Creation Date: 30-Oct-2020
- Modified Date: 13-Dec-2021
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com