Winbind and SAP application coredumps in __nscd_get_nl_timestamp()

This document (000019920) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server for SAP Applications 12
 

Situation

Initially, the nscd process is terminated by SIGKILL:
 2021-02-10T09:47:39.720880+00:00 spps4hlapew03 systemd[1]: nscd.service: Main process exited, code=killed, status=9/KILL
 2021-02-10T09:47:39.730474+00:00 spps4hlapew03 systemd[1]: nscd.service: Failed with result 'exit-code'.

The abrupt termination of nscd is followed by core dumps of winbindd and SAP processes. In the winbindd core dump, the crash occurs in frame #8 while accessing the header of the nscd persistent database mapping ( map->head->nscd_certainly_running ). The mapped address appears to have been unmapped when nscd stopped abruptly:
#glibc-2.22/nscd/nscd_gethst_r.c
-------------------------------------
101 __nscd_get_nl_timestamp (void)
102 {
103   uint32_t retval;
104   if (__nss_not_use_nscd_hosts != 0)
105     return 0;
106
107   /* __nscd_get_mapping can change hst_map_handle.mapped to NO_MAPPING.
108    However, __nscd_get_mapping assumes the prior value was not NO_MAPPING.
109    Thus we have to acquire the lock to prevent this thread from changing
110    hst_map_handle.mapped to NO_MAPPING while another thread is inside
111     __nscd_get_mapping.  */
112   if (!__nscd_acquire_maplock (&__hst_map_handle))
113     return 0;
114
115   struct mapped_database *map = __hst_map_handle.mapped;
116
117   if (map == NULL
118       || (map != NO_MAPPING
119***>       && map->head->nscd_certainly_running == 0
120           && map->head->timestamp + MAPPING_TIMEOUT < time (NULL)))
121     map = __nscd_get_mapping (GETFDHST, "hosts", &__hst_map_handle.mapped);
122

 (gdb) bt
 #0  0x00007effbe9962a7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
 #1  0x00007effbe99767a in __GI_abort () at abort.c:78
 #2  0x00007effc21dcf0e in dump_core () at ../source3/lib/dumpcore.c:338
 #3  0x00007effc21ce247 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
 #4  0x00007effc4ec9ddf in smb_panic (why=why@entry=0x7effc4f11c84 "internal error") at ../lib/util/fault.c:166
 #5  0x00007effc4ec9ff6 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
 #6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
 #7  <signal handler called>
 #8  0x00007effbea7e419 in __nscd_get_nl_timestamp () at nscd_gethst_r.c:119
        retval = <optimized out>
        map = 0x56344e6c7470
 #9  0x00007effbea690ec in get_nl_timestamp () at ../sysdeps/unix/sysv/linux/check_pf.c:87
 No locals.
 #10 cache_valid_p () at ../sysdeps/unix/sysv/linux/check_pf.c:98
        timestamp = <optimized out>
        timestamp = <optimized out>
 #11 __check_pf (seen_ipv4=seen_ipv4@entry=0x7fff44fe0eb2, seen_ipv6=seen_ipv6@entry=0x7fff44fe0eb3, in6ai=in6ai@entry=0x7fff44fe0ec0, in6ailen=in6ailen@entry=0x7fff44fe0ec8) at ../sysdeps/unix/sysv/linux/check_pf.c:304
        olddata = 0x0
        data = 0x0
 #12 0x00007effbea3b911 in __GI_getaddrinfo (name=0x56344e6d5f20 "172.16.1.5", service=0x7fff44fe10e0 "88", service@entry=0x0,
     hints=0x7fff44fe10b0, hints@entry=0x1, pai=0x7fff44fe10a8, pai@entry=0x56344e6e0ba0) at ../sysdeps/posix/getaddrinfo.c:2374
 #13 0x00007effb9d24475 in system_getaddrinfo (res=res@entry=0x56344e6e0ba0, hint=hint@entry=0x1, serv=serv@entry=0x0,
     name=<optimized out>) at fake-addrinfo.c:1360
 #14 my_fake_getaddrinfo (result=result@entry=0x56344e6e0ba0, hint=hint@entry=0x1, serv=serv@entry=0x0, name=<optimized out>)
     at fake-addrinfo.c:1161
 #15 krb5int_getaddrinfo (node=<optimized out>, service=service@entry=0x7fff44fe10e0 "88", hints=hints@entry=0x7fff44fe10b0,
     aip=aip@entry=0x7fff44fe10a8) at fake-addrinfo.c:1361
 #16 0x00007effbf5e0c25 in resolve_server (servers=0x0, conns=0x7fff44fe1090, udpbufp=0x7fff44fe10a0, message=0x7fff44fe1270, socktype2=1,
     socktype1=2, ind=0, context=0x56344e6dc6c0) at sendto_kdc.c:579
 #17 k5_sendto (context=context@entry=0x56344e6dc6c0, message=message@entry=0x7fff44fe1270, servers=servers@entry=0x7fff44fe11e0,
     socktype1=socktype1@entry=2, socktype2=socktype2@entry=1, callback_info=callback_info@entry=0x0, reply=reply@entry=0x7fff44fe1280,
     remoteaddr=remoteaddr@entry=0x0, remoteaddrlen=remoteaddrlen@entry=0x0, server_used=server_used@entry=0x7fff44fe11dc,
     msg_handler=msg_handler@entry=0x7effbf5dfbb0 <check_for_svc_unavailable>, msg_handler_data=msg_handler_data@entry=0x7fff44fe11d8)
     at sendto_kdc.c:1037
 #18 0x00007effbf5e133a in krb5_sendto_kdc (context=context@entry=0x56344e6dc6c0, message=message@entry=0x7fff44fe1270,
     realm=realm@entry=0x7fff44fe1290, reply=reply@entry=0x7fff44fe1280, use_master=use_master@entry=0x7fff44fe126c,
     tcp_only=tcp_only@entry=0) at sendto_kdc.c:218
 #19 0x00007effbf5b712c in k5_init_creds_get (context=context@entry=0x56344e6dc6c0, ctx=0x56344e6d5840,
     use_master=use_master@entry=0x7fff44fe1408) at get_in_tkt.c:544
 #20 0x00007effbf5b725d in k5_get_init_creds (context=context@entry=0x56344e6dc6c0, creds=creds@entry=0x7fff44fe2700,
     client=client@entry=0x56344e6dc630, prompter=prompter@entry=0x7effc0afdae0 <kerb_prompter>,
     prompter_data=prompter_data@entry=0x7fff44fe3538, start_time=start_time@entry=0, in_tkt_service=in_tkt_service@entry=0x0,
     options=options@entry=0x56344e6dd270, gak_fct=gak_fct@entry=0x7effbf5b85d0 <krb5_get_as_key_password>,
     gak_data=gak_data@entry=0x7fff44fe1480, use_master=use_master@entry=0x7fff44fe1408, as_reply=as_reply@entry=0x7fff44fe1420)
     at get_in_tkt.c:1782

(gdb) frame 8
#8  0x00007effbea7e419 in __nscd_get_nl_timestamp () at nscd_gethst_r.c:119
119               && map->head->nscd_certainly_running == 0

The counter of the mapped_database structure is greater than 0, yet the persistent database header it points to cannot be read, which suggests that the mapping no longer exists or was unmapped earlier:
(gdb) p ((struct mapped_database *)0x56344e6c7470)->counter
$4 = 1

(gdb) p *(((struct mapped_database *)0x56344e6c7470)->head)
Cannot access memory at address 0x7effb729f000

(gdb) x /8xw 0x7effb729f000
0x7effb729f000: Cannot access memory at address 0x7effb729f000

The process mappings show that the shared mapping of the persistent database file is marked as deleted:
(gdb) info proc mappings
mapped address spaces:

  Start Addr         End Addr       Size    Offset   objfile
0x7effb729f000   0x7effb72d4000    0x35000    0x0 /run/nscd/dbYFJari (deleted)
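
On a running system, a similar check can be made without gdb by reading /proc/<pid>/maps. The following is a minimal shell sketch, not a SUSE-provided tool. The function name stale_nscd_maps is made up for illustration, and the pattern assumes that the database files are named /run/nscd/db... as in the gdb output above.

```shell
#!/bin/sh
# Print mappings of deleted nscd database files found in a maps file
# (same format as /proc/<pid>/maps). Returns 0 if at least one line matches.
# Assumption: the database files live under .../nscd/ and start with "db".
stale_nscd_maps() {
    grep -E '/nscd/db[^ ]* \(deleted\)$' "$1" 2>/dev/null
}

# Check every PID passed on the command line.
for pid in "$@"; do
    if stale_nscd_maps "/proc/$pid/maps"; then
        echo "PID $pid still maps a deleted nscd database"
    fi
done
```

The script takes the PIDs to inspect as arguments, for example the PIDs of the winbindd processes. A match only shows that the process still holds a mapping of a deleted database file. It does not by itself prove that the process will crash.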


The SAP processes also crash while accessing the persistent database file, in this case in __nscd_get_map_ref():
Message: Process 14763 (gwrd) of user 1001 dumped core.

 Stack trace of thread 13071:
 #0  0x00007f53b6a4e6c6 __nscd_get_map_ref (libc.so.6)
 #1  0x00007f53b6a4ba66 nscd_gethst_r (libc.so.6)
 #2  0x00007f53b6a2d08f gethostbyaddr_r@@GLIBC_2.2.5 (libc.so.6)
 #3  0x00007f53b6a347c2 getnameinfo (libc.so.6)
 #4  0x000055764eaa606c _Z16NiPGetHostByAddrPK11NI_NODEADDRhPDsjPP8_IO_FILE (gwrd)
 #5  0x000055764ea0cc11 _ZN14NIHIMPL_LINEAR11getHostNameEPK11NI_NODEADDRPDsjhjPP8_IO_FILE (gwrd)
 #6  0x000055764e9f90ba _Z14NiIGetHostNamePK11NI_NODEADDRPDsjhjPP8_IO_FILE (gwrd)
 #7  0x000055764ea992f4 _Z12GwAddrToHostP11NI_NODEADDRPDsj (gwrd)
 #8  0x000055764e960951 _Z12GwRqDpSendToP11REQUEST_BUFiiihP15DP_SESSION_INFOPi (gwrd)
 #9  0x000055764ea21d17 _Z13GwRemGwHandlei (gwrd)
 #10 0x000055764e98d0dd _ZL6GwLoopv (gwrd)
 #11 0x000055764ea4780f nlsui_main (gwrd)
 #12 0x000055764e959f1a main (gwrd)
 #13 0x00007f53b694fa35 __libc_start_main (libc.so.6)
 #14 0x000055764ea20d4d _start (gwrd)

Resolution

As a workaround, and as a general best practice, stop and disable nscd on systems that run winbind.
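
The workaround can be sketched with standard systemctl commands as shown below. This is an illustration rather than an official procedure: the helper names run and disable_nscd and the DRY_RUN variable are hypothetical, and the unit name nscd.service is taken from the log lines in the Situation section.

```shell
#!/bin/sh
# Stop nscd and prevent it from starting at boot.
# DRY_RUN defaults to 1, which only prints the commands.
# Set DRY_RUN=0 and run as root to apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_nscd() {
    run systemctl stop nscd.service
    run systemctl disable nscd.service
}

disable_nscd
```

Run without any setting, the script only prints the two commands so they can be reviewed first. The state of the unit can be checked afterwards with systemctl status nscd.service.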
 

Status

Reported to Engineering

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019920
  • Creation Date: 18-Mar-2021
  • Modified Date: 19-Mar-2021
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Server for SAP Applications
