How to obtain application core dumps

This document (000020900) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server for SAP Applications 15
SUSE Linux Enterprise Server for SAP Applications 12

Situation

A core dump of a process needs to be captured for troubleshooting purposes.

Resolution

Overview

If a process terminates abnormally, the kernel can dump its memory image, writing a core dump.

The file /proc/sys/kernel/core_pattern determines how to process the dump and can contain a path name for the dump or a pipe command which will process the dump.

In the past it was customary to write your own patterns into /proc/sys/kernel/core_pattern, but SLES already ships a maintained setup based on systemd-coredump. This document guides you through all steps for customizing this setup.

Note: The man page `core(5)` describes core dumping in detail.

Note: On SLES, /proc/sys/kernel/core_pattern should contain |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h. If you overwrite it with your own pattern, you bypass systemd-coredump and are responsible for handling the written dumps yourself!
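
A quick way to see which of these cases applies on a given system is to classify the current pattern. The following is only a sketch: the function name classify_core_pattern and its three labels are made up for illustration and are not part of any SUSE tooling.

```shell
#!/bin/sh
# Classify a core_pattern value (hypothetical helper, for illustration only):
#   - a pipe to systemd-coredump (the maintained SLES setup)
#   - a pipe to some other handler
#   - a plain file name pattern
classify_core_pattern() {
    case "$1" in
        "|"*systemd-coredump*) echo "systemd-coredump" ;;
        "|"*)                  echo "other-pipe-handler" ;;
        *)                     echo "file-pattern" ;;
    esac
}

# Inspect the running system, if the file is readable.
if [ -r /proc/sys/kernel/core_pattern ]; then
    classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```

Anything other than "systemd-coredump" means the rest of this document does not apply unchanged.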

Set the resource limit for the maximum size of a core dump file

On Linux systems, limits can be set for various process resources. One of those limits is the core file size (RLIMIT_CORE). To ensure a complete core dump can be written, it should be set to unlimited.

To verify the current core file size limit of an existing process, run prlimit -p <PID> -c.
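
The same information is exposed by the kernel in /proc/<PID>/limits, which can be handy in scripts. A minimal sketch, assuming a Linux /proc file system; the helper name core_limits_of is hypothetical:

```shell
#!/bin/sh
# Print the soft and hard core file size limit of a process by reading
# the "Max core file size" row of /proc/<PID>/limits.
# (core_limits_of is an illustrative name, not an existing tool.)
core_limits_of() {
    awk '/^Max core file size/ { print "soft=" $5, "hard=" $6 }' "/proc/$1/limits"
}

# Example: show the limits of the current shell.
core_limits_of "$$"
```

Replace "$$" with the PID of the process you want to inspect.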

Typically, server programs are started by systemd, and the default on SLES should already be unlimited.
If you want to change the limit globally, create a drop-in for /etc/systemd/system.conf:

mkdir -p /etc/systemd/system.conf.d
cat > /etc/systemd/system.conf.d/99-coredumplimit.conf <<EOF
[Manager]
DefaultLimitCORE=<YOUR VALUE>
EOF

Since this affects the entire system, initiate a reboot after making the change.

If you just want to adjust the value for a specific service, create a drop-in just for the service, reload the systemd manager configuration and restart the service.

Here is an example of how to do it for cron.service:

# mkdir -p /etc/systemd/system/cron.service.d
# cat > /etc/systemd/system/cron.service.d/99-coredumplimit.conf <<EOF
[Service]
LimitCORE=<YOUR VALUE>
EOF
# systemctl daemon-reload
# systemctl restart cron.service

Note: For details see man systemd.exec.

If a program is started from a shell, the PAM stack is involved. This can also be the case for programs that are started by systemd but switch users by opening a PAM session after start. An indication is that the started processes run as a user other than root although the systemd service configuration has no corresponding User= entry.

In such cases, set the limit via PAM in /etc/security/limits.conf (see man limits.conf for details). The default should already be unlimited on systems with a recent patch level.

Limits configured via PAM get applied on logon.

Here is an example entry that sets the core file size for the group sapsys to unlimited (both hard and soft limit):

...
@sapsys          -      core             unlimited
...

Note: Both hard and soft limit must be set for core dumps to be written!

Note: Unless there are reasons not to, it is best to configure both PAM and systemd.

Note: If you have modified your PAM stack, make sure pam_limits.so is part of it, otherwise this will not work.

Note: It is possible to see lower limits than configured, because programs can reduce their limits further. In such a case you have to set the limit for the process after it has started, for example: prlimit --core=unlimited:unlimited -p <PID>
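
To check what a login session actually received from PAM, the shell's own limits can be printed after logging in again as the affected user. A small sketch; the function name show_core_limits is only illustrative:

```shell
#!/bin/sh
# Print the soft and hard core file size limit of the current shell.
# Processes started from this shell inherit these values.
show_core_limits() {
    echo "soft=$(ulimit -Sc) hard=$(ulimit -Hc)"
}

show_core_limits
```

Both values should read unlimited if the limits.conf entry applies to the user.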

Configure core dump size restrictions of systemd-coredump

In addition to the resource limit for the core file size, systemd-coredump also has configurable limits.

The current default size limit for core dumps is 2 GiB, which can be too small for larger applications like SAP systems, so their core dumps can be truncated. To disable any limitation, set ProcessSizeMax= and ExternalSizeMax= to infinity by creating a drop-in file for coredump.conf:

mkdir -p /etc/systemd/coredump.conf.d
cat > /etc/systemd/coredump.conf.d/99-sizing.conf <<EOF
[Coredump]
ProcessSizeMax=infinity
ExternalSizeMax=infinity
EOF

Alternatively /etc/systemd/coredump.conf can be edited directly.

Changes take effect immediately.

Note: The value infinity was introduced with systemd-228-150.101.3 (SLES12) and systemd-234-24.108.1 (SLES15).

When configuring the sizing, also make sure that there is enough disk space to hold the dumps.
Even when core dumps are written in compressed form (Zstandard or XZ), they can still consume a lot of disk space. If in doubt, you can mount an external file system on /var/lib/systemd/coredump/ to hold the dumps.
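
The free space of the file system that holds the dump directory can be checked with df. The sketch below wraps this in a hypothetical helper named free_kib; compare the result against the memory footprint of your largest application.

```shell
#!/bin/sh
# Print the available space, in KiB, of the file system that contains
# the given directory. df -P guarantees one output line per file system.
# (free_kib is an illustrative name.)
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

dump_dir=/var/lib/systemd/coredump
if [ -d "$dump_dir" ]; then
    echo "free in $dump_dir: $(free_kib "$dump_dir") KiB"
fi
```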

Configuring cleanup

Two methods exist to clean up old core dumps.

The first is systemd-coredump itself, which can remove old dumps whenever a new core dump is processed. This is configured by setting MaxUse= and KeepFree= and is disabled by default. For details see man 5 coredump.conf.

The second is systemd-tmpfiles. The default configuration removes files from /var/lib/systemd/coredump/ that are older than 3 days, which can be too short for support cases.
The core dump cleanup configuration is part of /usr/lib/tmpfiles.d/systemd.conf. It is not possible to override specific entries with drop-in files, so you need to copy the entire file to /etc/tmpfiles.d/systemd.conf and modify it to your needs.

Here is an example with the cleanup age changed to 30 days:

...
d /var/lib/systemd/coredump 0755 root root 30d
...

You can disable cleanup entirely by removing the line from the configuration file.

Note: Because /etc/tmpfiles.d/systemd.conf takes precedence over /usr/lib/tmpfiles.d/systemd.conf, check after a package update or system upgrade whether /usr/lib/tmpfiles.d/systemd.conf contains changes you want to adopt.
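
To see which cleanup age is configured in each copy of the file, the age column (the sixth field) of the coredump entry can be extracted. This is a sketch that assumes the entry has the form shown in the example above; the helper name coredump_cleanup_age is hypothetical.

```shell
#!/bin/sh
# Print the age field of the "d /var/lib/systemd/coredump ..." entry
# in a tmpfiles.d configuration file. Prints nothing if the entry was
# removed, which means cleanup by systemd-tmpfiles is disabled.
coredump_cleanup_age() {
    awk '$1 == "d" && $2 == "/var/lib/systemd/coredump" { print $6 }' "$1"
}

for f in /usr/lib/tmpfiles.d/systemd.conf /etc/tmpfiles.d/systemd.conf; do
    if [ -r "$f" ]; then
        echo "$f: $(coredump_cleanup_age "$f")"
    fi
done
```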

Disable AppArmor

AppArmor application security is based on learning the normal/good behavior of an application and then preventing the application from performing operations that do not fit the learned behavior. As writing a core dump is typically not part of an application's normal/good behavior, AppArmor is likely to prevent a core file from being written.

Either configure AppArmor to allow core dumps, or disable it if it is not needed: systemctl disable --now apparmor.service

Enable core dumps for setuid and setgid processes

Linux systems provide a facility, the setuid and setgid bits, to execute processes under a different user id or group id than that of the user/group starting the process. Such a process may be dealing with data which should not be directly accessible to the invoking user/group and if a core dump were written for such a process, that data could be leaked. For this reason, the kernel does not write core dumps for setuid/setgid processes by default. This default behavior can be overridden through the fs.suid_dumpable sysctl.

To allow setuid/setgid processes to be dumped, create a sysctl drop-in to set fs.suid_dumpable to 2 and let sysctl reload the configuration. This is the default on SLE 15 SP3 onward.

# cat > /etc/sysctl.d/99-coredump.conf <<EOF
fs.suid_dumpable=2
EOF
# sysctl --system

The changes will apply only to newly created processes.

Important: Do not name the drop-in file /etc/sysctl.d/50-coredump.conf! This would override the shipped configuration /usr/lib/sysctl.d/50-coredump.conf entirely and not just add or change parameters present there.

Note: The sysctl kernel.suid_dumpable was introduced by mistake a long time ago (Bug 6145 - kernel.suid_dumpable sysctl is really fs.suid_dumpable). fs.suid_dumpable is the correct name. Nowadays both seem to exist and are linked together: if you change one, you also change the other.
Always use fs.suid_dumpable to be on the safe side.
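
The active value can be read back from /proc/sys/fs/suid_dumpable. The sketch below maps the three values to a short description, following the wording of the kernel documentation; the function name describe_suid_dumpable is made up for illustration.

```shell
#!/bin/sh
# Translate a fs.suid_dumpable value into a short description.
# (describe_suid_dumpable is an illustrative name.)
describe_suid_dumpable() {
    case "$1" in
        0) echo "default: setuid/setgid processes are not dumped" ;;
        1) echo "debug: all processes are dumped, no security applied" ;;
        2) echo "suidsafe: dumped only via pipe handler or absolute path" ;;
        *) echo "unknown value: $1" ;;
    esac
}

if [ -r /proc/sys/fs/suid_dumpable ]; then
    describe_suid_dumpable "$(cat /proc/sys/fs/suid_dumpable)"
fi
```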

Test your setup

After configuring core dumps, it is time to test the setup. Do this right after setting it up and again after a reboot!

A simple crash can be produced by executing: sh -c 'kill -SEGV $$'
Afterwards, the command coredumpctl should list the core dump (see man 1 coredumpctl for details).
If you don't get a core dump, verify your configuration.

Note: The command coredumpctl retrieves the list of dumps from the systemd journal. On SLES the journal gets removed after a reboot and existing dumps might not be listed. The dumps themselves are still available in /var/lib/systemd/coredump/.
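
If coredumpctl no longer lists a dump, the files on disk can be listed directly. A sketch, assuming the dump files are stored with names starting with "core." and that find supports -maxdepth; the helper name list_dump_files is hypothetical.

```shell
#!/bin/sh
# List core dump files stored directly in the given directory.
# Assumption: stored dumps are regular files named "core.*".
list_dump_files() {
    find "$1" -maxdepth 1 -type f -name 'core.*' | sort
}

if [ -d /var/lib/systemd/coredump ]; then
    list_dump_files /var/lib/systemd/coredump
fi
```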

If the test was successful, also check that you can create full core dumps of your important applications and that your setup is suitable for the core dump size they produce.

If you don't get a core dump of your application, possible reasons are:

    - The filesystem containing /var/lib/systemd/coredump/ is full.
    - The signal was caught by the application and handled internally.
    - The application programmer prevented core dumps by clearing the dumpable flag with PR_SET_DUMPABLE (man 2 prctl).
    - It is a setuid/setgid process and fs.suid_dumpable is set to 0.
    - The application has reduced the core file size (RLIMIT_CORE) after start.

Manually triggering a core dump

In some troubleshooting scenarios, it may be necessary to trigger the generation of a core dump manually. This can be done by sending the process a signal that generates a core dump, for example SIGABRT. To send SIGABRT to a process, run: kill -ABRT <PID>
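
The effect can be tried safely on a throwaway process before using it on a real application. In this sketch a background sleep stands in for the application, and the function name abort_demo is only illustrative. A process killed by SIGABRT (signal 6) is reported by the shell with exit status 128 + 6.

```shell
#!/bin/sh
# Start a disposable process, send it SIGABRT and print the exit status
# that the shell reports for it.
abort_demo() {
    sleep 60 &
    pid=$!
    kill -ABRT "$pid"
    status=0
    wait "$pid" || status=$?
    echo "$status"
}

abort_demo   # prints 134
```

If the setup described above is in place, the aborted sleep process should afterwards show up in the coredumpctl list.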

Additional Information

This TID is an updated rework of How to obtain application core dumps.

The dump of the Linux kernel is referred to as "kernel core dump (kdump)" and is beyond the scope of this document (see TID 3374462 - Configure crashkernel memory for kernel core dump analysis).

Core dump analysis

As core dump analysis requires specialist skills and can be a highly complex process, it should generally only be attempted when more straightforward troubleshooting steps (in particular, applying relevant patches) have not brought relief.

Also, a core dump might not contain the entire memory of a process. The file /proc/<pid>/coredump_filter defines which parts of the memory are dumped. Don't change the default setting unless SUSE support tells you to!
In addition, a programmer can exclude parts of the address space by setting the MADV_DONTDUMP flag (man 2 madvise). In such a case the developer of the application may need to be involved in the analysis.
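
Reading the filter is harmless and shows which mapping types end up in a dump. The sketch below decodes the hexadecimal bit mask using the bit meanings documented in core(5); the function name decode_coredump_filter is hypothetical.

```shell
#!/bin/sh
# Decode a coredump_filter value (hexadecimal, without 0x prefix) into
# the mapping types that will be included in a core dump.
decode_coredump_filter() {
    mask=$((0x$1))
    i=0
    for name in "anonymous private" "anonymous shared" \
                "file-backed private" "file-backed shared" \
                "ELF headers" "private huge pages" "shared huge pages" \
                "private DAX pages" "shared DAX pages"; do
        if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
            echo "bit $i: $name"
        fi
        i=$((i + 1))
    done
}

# Show the filter that processes started from this shell will inherit.
if [ -r /proc/self/coredump_filter ]; then
    decode_coredump_filter "$(cat /proc/self/coredump_filter)"
fi
```

For the kernel default of 33 this reports bits 0, 1, 4 and 5.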

When do core dumps occur?

A process will crash if it receives a signal that it is not ignoring (or cannot ignore) and for which it has not set up a signal handler routine. For example, most processes do not set up a signal handler for SIGSEGV and will crash when the kernel sends them this signal to indicate that they have tried to perform an incorrect memory access. When this situation occurs and certain environmental settings are right, the kernel will write a core dump file for the process.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

Document ID: 000020900
Creation Date: 20-Dec-2022
Modified Date: 20-Dec-2022
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com