Kdump fails on IBM POWER system when crash is triggered after CPU/memory Add/Remove operation
This document (7023750) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 15 SP 2
SUSE Linux Enterprise Server 15 SP 1
SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12 SP 5
SUSE Linux Enterprise Server 12 SP 4
SUSE Linux Enterprise Server 12 SP 3
Situation
On an IBM POWER system, kdump can fail to capture a dump if the crash is triggered after a CPU or memory add/remove operation.
Resolution
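After the CPU or memory add/remove operation has completed, restart the kdump service to reload the capture kernel:
systemctl restart kdump.service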
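To confirm that the restart actually loaded a capture kernel, the kernel's `/sys/kernel/kexec_crash_loaded` flag can be checked afterwards. The following is a minimal sketch, not part of the official resolution. The function names `crash_kernel_loaded` and `reload_kdump` and the `CRASH_LOADED_FILE` variable are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Sketch only: restart kdump and confirm that a capture kernel is loaded.
# CRASH_LOADED_FILE is an illustrative override; the default is the
# kernel's sysfs flag, which reads "1" when a crash kernel is loaded.
: "${CRASH_LOADED_FILE:=/sys/kernel/kexec_crash_loaded}"

# Returns 0 if the kernel reports a loaded crash (capture) kernel.
crash_kernel_loaded() {
    [ -r "$CRASH_LOADED_FILE" ] && [ "$(cat "$CRASH_LOADED_FILE")" = "1" ]
}

# Hypothetical helper: restart the service, then verify the result.
reload_kdump() {
    systemctl restart kdump.service || return 1
    crash_kernel_loaded
}
```

Run as root after the add/remove operation, for example `reload_kdump && echo "capture kernel loaded"`. A non-zero return value suggests that the reload should be repeated before relying on kdump.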
Cause
The kdump reload command (`kexec -p`) parses the device-tree nodes to determine the CPU topology. Some of these nodes are removed when CPUs are hot-removed.
A race condition between the kdump reload and CPU hot-removal can cause the kdump load to fail, so that a subsequent crash is not captured in a dump.
When memory is hot-removed or hot-added, the udev rule that reloads the capture kernel is triggered while the device-tree is still being updated.
At times, this results in the capture kernel using a stale memory map and failing to save its dump to disk.
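Because the failure comes from a reload that overlaps with an ongoing device-tree update, one possible way to script the manual reload is to wait for pending udev events to finish first and to retry if the reload fails. The sketch below is an illustration under that assumption. The source does not state that waiting for udev to settle is sufficient to avoid the race. The function name, retry count, and delay are hypothetical.

```shell
#!/bin/sh
# Sketch only: wait for pending udev events, then reload kdump with retries.
# Function name, retry count and delay are illustrative assumptions.
reload_kdump_after_hotplug() {
    attempts="${1:-3}"   # maximum number of reload attempts
    delay="${2:-5}"      # seconds to wait between attempts

    # Wait (up to 60 seconds) for the udev event queue to drain.
    udevadm settle --timeout=60 || echo "udev did not settle; continuing" >&2

    i=1
    while [ "$i" -le "$attempts" ]; do
        if systemctl restart kdump.service; then
            return 0
        fi
        echo "kdump reload failed (attempt $i of $attempts)" >&2
        i=$((i + 1))
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

Called as `reload_kdump_after_hotplug 3 5` by root once the add/remove operation has finished, it returns 0 as soon as one restart succeeds and 1 if all attempts fail.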
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7023750
- Creation Date: 27-Feb-2019
- Modified Date: 19-May-2021
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com