Effects of sss_cache on the memory cache

This document (000020646) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server for SAP Applications 15 All Service Packs
SUSE Linux Enterprise Server for SAP Applications 12 SP5

SUSE Linux Enterprise Server 15 All Service Packs
SUSE Linux Enterprise Server 12 SP5

 

Situation

The problem described here occurs rarely in most environments and shows several signs. The first one, a failing user lookup, also appears with a variety of similar problems, so check whether all of the other signs match as well:
# id xxx
id: 'xxx': no such user
In the system message log, sssd repeatedly tries to restart the nss service:
2022-03-28T17:32:09.780060+02:00 dcplnx25719280 nss: Starting up
2022-03-28T17:32:11.792566+02:00 dcplnx25719280 nss: Starting up
2022-03-28T17:32:15.805214+02:00 dcplnx25719280 nss: Starting up
2022-03-28T17:32:15.808333+02:00 dcplnx25719280 sssd: Exiting the SSSD. Could not restart critical service [nss].
A full restart of the sssd service also fails:
2022-03-28T19:21:46.296848+02:00 dcplnx25719280 systemd[1]: Starting System Security Services Daemon...
2022-03-28T19:21:46.313220+02:00 dcplnx25719280 sssd: Starting up
2022-03-28T19:21:46.323196+02:00 dcplnx25719280 be[LDAPS]: Starting up
2022-03-28T19:21:46.342295+02:00 dcplnx25719280 pam: Starting up
2022-03-28T19:21:46.342689+02:00 dcplnx25719280 sudo: Starting up
2022-03-28T19:21:46.343671+02:00 dcplnx25719280 nss: Starting up
2022-03-28T19:21:46.344005+02:00 dcplnx25719280 ssh: Starting up
2022-03-28T19:21:46.374240+02:00 dcplnx25719280 nss: Starting up
2022-03-28T19:21:52.452237+02:00 dcplnx25719280 sssd: Exiting the SSSD. Could not restart critical service [nss].
2022-03-28T19:21:52.452388+02:00 dcplnx25719280 sudo: Shutting down
2022-03-28T19:21:52.454401+02:00 dcplnx25719280 ssh: Shutting down
2022-03-28T19:21:52.456898+02:00 dcplnx25719280 pam: Shutting down
2022-03-28T19:21:52.459478+02:00 dcplnx25719280 be[LDAPS]: Shutting down
2022-03-28T19:21:52.464080+02:00 dcplnx25719280 systemd[1]: sssd.service: Main process exited, code=exited, status=1/FAILURE
2022-03-28T19:21:52.464473+02:00 dcplnx25719280 systemd[1]: Failed to start System Security Services Daemon.
To confirm the diagnosis, increase the SSSD debug level:
# sss_debuglevel 0x00ff
The file /var/log/sssd/sssd_LDAPS.log (its name contains the SSSD domain, here LDAPS) now contains messages such as:
(2022-03-28 19:21:46): [be[LDAPS]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/cache_LDAPS.ldb): tdb_transaction_prepare_commit: expansion failed
(2022-03-28 19:21:46): [be[LDAPS]] [ldb] (0x0010): Failure during prepare_write): IO Error -> Protocol error
(2022-03-28 19:21:46): [be[LDAPS]] [ldb] (0x0010): cancel called but no ldb transactions are active!
(2022-03-28 19:21:52): [be[LDAPS]] [orderly_shutdown] (0x0010): SIGTERM: killing children
The file /var/log/sssd/sssd_nss.log contains one or more of the following errors:
(2022-03-28 19:17:51): [nss] [sss_mc_create_file] (0x0010): Failed to mark mmap file /var/lib/sss/mc/passwd as recycled: 28(No space left on device)
(2022-03-28 19:17:51): [nss] [sss_mc_create_file] (0x0010): Failed to mark mmap file /var/lib/sss/mc/group as recycled: 28(No space left on device)
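To check a system for all of these signs at once, the log files can be searched for the messages quoted above. The following Python sketch assumes the log locations used in this article (/var/log/sssd/sssd_*.log and /var/log/messages); the helper names and signature labels are hypothetical, and the patterns only cover the messages shown here, not everything SSSD may log. On systems that log only to the journal, the output of `journalctl -u sssd` can be passed to find_signatures instead.

```python
import glob
import re

# Message fragments copied from the log excerpts in this article. They
# indicate the problem but are not an exhaustive list of SSSD messages.
SIGNATURES = {
    "ldb_expansion_failed": re.compile(
        r"tdb_transaction_prepare_commit: expansion failed"),
    "mc_no_space": re.compile(
        r"Failed to mark mmap file /var/lib/sss/mc/\S+ as recycled: "
        r"28\(No space left on device\)"),
    "nss_restart_failed": re.compile(
        r"Could not restart critical service \[nss\]"),
}


def find_signatures(lines):
    """Return {signature name: [matching lines]} for an iterable of lines."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_logs(paths):
    """Scan the given log files; missing or unreadable files are skipped."""
    merged = {name: [] for name in SIGNATURES}
    for path in paths:
        try:
            with open(path, errors="replace") as handle:
                hits = find_signatures(handle)
        except OSError:
            continue
        for name, lines in hits.items():
            merged[name].extend(f"{path}: {line}" for line in lines)
    return merged


if __name__ == "__main__":
    # Assumed locations: SSSD's own logs plus the classic system log file.
    candidates = sorted(glob.glob("/var/log/sssd/sssd_*.log"))
    candidates.append("/var/log/messages")
    for name, lines in scan_logs(candidates).items():
        print(f"{name}: {len(lines)} match(es)")
```

Only when all three signatures show up is this problem the likely cause; a failing `id` lookup alone is not conclusive.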
 

Resolution

Avoid calling sss_cache frequently, and check regularly that the file system holding the cache files (/var/lib/sss) has enough free space for new cache files.
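A regular free-space check can be sketched in Python as below. It uses shutil.disk_usage, which reports file-system level figures and therefore also counts space still held by deleted but mapped cache files, unlike adding up the sizes of the visible files in /var/lib/sss/mc. The 20 % threshold and the function names are arbitrary assumptions for illustration, not SSSD settings.

```python
import shutil

# Directory holding the SSSD memory cache files (from this article).
MC_DIR = "/var/lib/sss/mc"


def low_on_space(free_bytes, total_bytes, min_free_ratio=0.2):
    """Return True if less than min_free_ratio of the file system is free.

    The 0.2 default is an example threshold only, not an SSSD value.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_ratio


def check_mc_space(path=MC_DIR, min_free_ratio=0.2):
    """Return (is_low, usage) for the file system containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return low_on_space(usage.free, usage.total, min_free_ratio), usage


if __name__ == "__main__":
    try:
        low, usage = check_mc_space()
    except OSError as err:
        print(f"cannot check {MC_DIR}: {err}")
    else:
        state = "LOW" if low else "ok"
        print(f"{MC_DIR}: {usage.free} of {usage.total} bytes free ({state})")
```

Run from cron or a monitoring agent, such a check warns before the cache file system fills up, which is when the "No space left on device" errors above start to appear.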

As a workaround, you can set the environment variable SSS_NSS_USE_MEMCACHE to "NO". The memory cache is then not used at all, which is likely to cause performance issues.

Cause

The sss_cache command invalidates records in the SSSD cache. If other processes still have the old memory cache files mapped, the deleted /var/lib/sss/mc/* files remain open, so significantly more disk space is used than the visible cache files suggest. This matters particularly when the cache files are not stored on a hard disk as usual but, for example in a setup with the root file system on NFS, in a RAM disk for fast access.
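To find the processes that keep deleted cache files alive, one approach is to read /proc/&lt;pid&gt;/maps on Linux: the kernel appends " (deleted)" to the path of a mapped file that has been unlinked. The following Python sketch relies only on that standard /proc behaviour; the function names are hypothetical, and it should be run as root, because the maps of other users' processes are otherwise unreadable and silently skipped.

```python
import os
import re

# A /proc/<pid>/maps line ends with the mapped path; the kernel appends
# " (deleted)" when that file has been unlinked in the meantime.
MC_DELETED = re.compile(r"\s(/var/lib/sss/mc/\S+) \(deleted\)$")


def deleted_mc_mappings(maps_lines):
    """Return the set of deleted SSSD memory cache files in maps_lines."""
    found = set()
    for line in maps_lines:
        match = MC_DELETED.search(line.rstrip("\n"))
        if match:
            found.add(match.group(1))
    return found


def scan_processes(proc="/proc"):
    """Map PID -> deleted /var/lib/sss/mc files that PID still has mapped."""
    result = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "maps")) as handle:
                files = deleted_mc_mappings(handle)
        except OSError:
            # The process exited meanwhile, or access was denied.
            continue
        if files:
            result[int(entry)] = files
    return result


if __name__ == "__main__":
    for pid, files in sorted(scan_processes().items()):
        print(pid, " ".join(sorted(files)))
```

The PIDs reported are the long-running processes that are candidates for a restart or for the SSS_NSS_USE_MEMCACHE workaround described below.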

EFFECTS ON THE FAST MEMORY CACHE

sss_cache also invalidates the memory cache. Since the memory cache is a file that is mapped into the memory of every process that has called SSSD to resolve users or groups, the file cannot be truncated. Instead, SSSD's NSS responder sets a special flag in the file header to indicate that the content is invalid, unlinks the file, and creates a new cache file. Whenever a process then does a new user or group lookup, it sees the flag, closes the old memory cache file, and maps the new one into its memory. Once all processes that had opened the old memory cache file have closed it during such a lookup, the kernel can release the occupied disk space, and the old memory cache file is finally removed completely.

A special case is long-running processes that look up users or groups only at startup, e.g. to determine the name of the user the process runs as. For those lookups, the memory cache file is mapped into the memory of the process. But since there are no further lookups, the process never detects that the memory cache file was invalidated, so the file stays mapped and occupies disk space until the process stops. As a result, calling sss_cache can increase disk usage, because old memory cache files cannot be removed from the disk while long-running processes still map them.
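The behaviour described here is ordinary POSIX file semantics: unlinking removes a file's name, but the inode and its data blocks survive while a mapping or descriptor still refers to them. The Python sketch below demonstrates only that general behaviour with a temporary file; it does not reproduce SSSD's cache format or its invalid flag, and the function name is hypothetical.

```python
import mmap
import os
import tempfile


def unlinked_but_mapped(size=4096):
    """Map a file, unlink it, and report what survives the unlink."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * size)
        mapping = mmap.mmap(fd, size, prot=mmap.PROT_READ)
        os.unlink(path)                    # like SSSD unlinking the old cache file
        name_exists = os.path.exists(path)  # the directory entry is gone...
        links = os.fstat(fd).st_nlink       # ...no names point to the inode...
        first_byte = mapping[:1]            # ...but the data is still readable
        mapping.close()                     # only now can the blocks be freed
    finally:
        os.close(fd)
        if os.path.exists(path):
            os.unlink(path)
    return name_exists, links, first_byte
```

On Linux, `unlinked_but_mapped()` returns `(False, 0, b'A')`: the file has no name and a link count of zero, yet its contents (and therefore its disk or RAM disk space) stay in use until the last mapping and descriptor are closed, which is exactly why a long-running process can pin an old memory cache file.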

A possible workaround for long-running processes that look up users and groups only at startup or very rarely is to run them with the environment variable SSS_NSS_USE_MEMCACHE set to "NO", so that they do not use the memory cache and never map the memory cache file into their memory. In general, a better solution is to tune the cache timeout parameters to meet local expectations, so that calling sss_cache is not needed.
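The variable has to be present in the environment of the process that performs the lookups. A minimal Python sketch of a launcher that starts such a process with the memory cache disabled is shown below; the variable name comes from this article, while the helper names and the command being started are hypothetical.

```python
import os
import subprocess


def env_without_memcache(base=None):
    """Return a copy of the environment with the SSSD memory cache disabled."""
    env = dict(os.environ if base is None else base)
    env["SSS_NSS_USE_MEMCACHE"] = "NO"
    return env


def start_without_memcache(argv):
    """Start argv (e.g. a long-running daemon) without the memory cache."""
    return subprocess.Popen(argv, env=env_without_memcache())
```

For a service managed by systemd, the same effect can be achieved with an `Environment=SSS_NSS_USE_MEMCACHE=NO` line in the `[Service]` section of a drop-in file for that service. Applying the variable only to the affected long-running processes keeps the performance benefit of the memory cache for everything else.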

Source:
https://github.com/SSSD/sssd/commit/b9e60ae067696782e3a52f58172f13077b5ea0f2

Background:
https://docs.pagure.org/sssd.sssd/design_pages/fast_nss_cache.html

 

Additional Information


 

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020646
  • Creation Date: 28-Apr-2022
  • Modified Date: 06-May-2022
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Server for SAP Applications

