DeepSea node runs out of memory and/or root filesystem disk space

This document (7022654) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 4
SUSE Enterprise Storage 5

Situation

The DeepSea Salt master node runs out of memory and/or reports that the root file system is full. Messages similar to the following flood the "/var/log/salt/master" log file:

2018-01-15 15:20:51,512 [salt.utils.process][ERROR   ][1149] An un-handled exception from the multiprocessing process 'Maintenance-18' was caught:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/salt/utils/process.py", line 647, in _run
    return self._original_run()
  File "/usr/lib/python2.7/site-packages/salt/master.py", line 240, in run
    salt.daemons.masterapi.clean_old_jobs(self.opts)
  File "/usr/lib/python2.7/site-packages/salt/daemons/masterapi.py", line 193, in clean_old_jobs
    mminion.returners[fstr]()
  File "/usr/lib/python2.7/site-packages/salt/returners/local_cache.py", line 430, in clean_old_jobs
    shutil.rmtree(t_path)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 250, in rmtree
    os.remove(fullname)
OSError: [Errno 13] Permission denied: '/var/cache/salt/master/jobs/6d/431cfbd8dccac414c625deb28178aed3c7625fdc5765849dda1897e039c114/jid'
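Before applying a workaround, it can help to confirm that the job cache is what is filling the disk. The sketch below wraps three quick checks in a shell function: the size of the job cache, the free space on the file system that holds it, and the number of log lines mentioning "Permission denied". The function name check_salt_job_cache is hypothetical. Its default paths are the ones named in this article and can be overridden for testing.

```shell
#!/bin/sh
# Hypothetical diagnostic helper; the name is illustrative only.
# Usage: check_salt_job_cache [cache_dir] [master_log]
check_salt_job_cache() {
    cache_dir="${1:-/var/cache/salt/master/jobs}"
    master_log="${2:-/var/log/salt/master}"
    # Disk space used by the job cache (human-readable)
    du -sh "$cache_dir"
    # Size and free space of the file system holding the cache
    df -h "$cache_dir"
    # Number of log lines reporting a permission error (printed last)
    grep -c 'Permission denied' "$master_log"
}
```

Run as root on the Salt master, a large cache size together with a non-zero, growing error count points to the problem described here.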

Resolution

This will eventually be addressed by a DeepSea update. In the meantime, work around the issue by creating a cron job that runs once per day (or several times per day if necessary) and removes all files older than one day from the Salt master's job cache, using a command such as one of the following:

/usr/bin/find /var/cache/salt/master/jobs/ -type f -mtime +1 -delete 2>&1 | logger -t salt-jobs

OR

/usr/bin/find /var/cache/salt/master/jobs/ -type f -mtime +1 -print0 | xargs -0 -r /usr/bin/rm 2>&1 | logger -t salt-jobs

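Either command finds the files in the job cache that are older than one day and removes them. Any error output is passed to logger, which records it in the system log (normally "/var/log/messages") with the tag "salt-jobs". Note that find ignores fractional days, so "-mtime +1" only matches files that are at least two days old; for a cutoff of roughly 24 hours, use "-mmin +1440" instead.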
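As a minimal sketch of the daily cron job, the first command could be installed as a system crontab entry. This assumes the cron daemon reads "/etc/cron.d", where each line carries a user field before the command. The file name salt-jobs-cleanup and the 03:15 run time are arbitrary, illustrative choices. The entry must run as root, because only root can remove the root-owned files.

```
# /etc/cron.d/salt-jobs-cleanup (hypothetical file name)
# minute hour day-of-month month day-of-week user command
15 3 * * * root /usr/bin/find /var/cache/salt/master/jobs/ -type f -mtime +1 -delete 2>&1 | logger -t salt-jobs
```

To run the cleanup more often, change the hour field. For example, "15 */6 * * *" runs it every six hours.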

Cause

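Runners invoked from Jinja templates are executed as root, so the job files they create in the Salt master job cache are owned by root. The salt-master service, however, runs as salt:salt and therefore cannot remove these files when it cleans up old jobs. The cleanup fails with "Permission denied" errors like the one shown above, and the job cache keeps growing.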
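Deleting a file requires write permission on the directory that contains it. Root-owned directories in the cache therefore block the salt user just as root-owned files do. The hypothetical helper below counts every entry below the job cache, whether file or directory, that is not owned by a given user. By default that user is salt, per this article. The function name is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: count job-cache entries (files and directories)
# not owned by the given user. Usage: count_foreign_owned [cache_dir] [user]
count_foreign_owned() {
    cache_dir="${1:-/var/cache/salt/master/jobs}"
    owner="${2:-salt}"
    # -mindepth 1 skips the cache directory itself
    find "$cache_dir" -mindepth 1 ! -user "$owner" | wc -l
}
```

A non-zero result on the Salt master (for example from count_foreign_owned with no arguments, run as root) indicates entries that the salt-master service may be unable to clean up.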

Additional Information

Another temporary workaround is to periodically change the ownership of everything in the Salt master job cache to salt:salt, recursively, for example:

chown -R salt:salt /var/cache/salt/master/jobs/*
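A variant of the chown workaround re-owns only the entries that are not already owned by the target user, rather than every file on each run. The sketch below assumes GNU find and must run as root, because only root can give files to another user. The function name fix_job_cache_ownership is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: chown only the job-cache entries not already owned
# by the target user. Usage: fix_job_cache_ownership [cache_dir] [user:group]
fix_job_cache_ownership() {
    cache_dir="${1:-/var/cache/salt/master/jobs}"
    owner_group="${2:-salt:salt}"
    owner="${owner_group%%:*}"   # user part of user:group
    # -mindepth 1 skips the cache directory itself; "{} +" batches paths
    # into as few chown invocations as possible
    find "$cache_dir" -mindepth 1 ! -user "$owner" \
        -exec chown "$owner_group" {} +
}
```

On a cache that is already correctly owned, find matches nothing and chown is never called.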

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7022654
  • Creation Date: 12-Feb-2018
  • Modified Date: 03-Mar-2020
    • SUSE Enterprise Storage
