In this and the next several blog entries I will explain SUSE YES Certification testing under the covers: specific tests in SUSE YES Certification that are not fully documented on the certification bulletin. This entry covers the capture of Kdump memory images and how hardware is validated for crashkernel capture functionality during certification. YES Certification validates that the SUSE kernel crash mechanism can capture a local core dump image (also known as a vmcore) to the hard drive and file system of the certified server or workstation.
First, a quick overview of what Kdump is and how it works. Kdump saves kernel dumps, preserving the state of the system in a core dump file. Some organizations use this for system backup purposes. If the kernel crashes or panics, Kdump copies and saves the memory image of the crashed environment. The resulting vmcore file can then be analyzed or debugged to help determine why the kernel crashed. When Kdump is triggered (either manually or automatically during a kernel crash), SUSE Linux boots a small “capture kernel”, which saves the memory image of the production kernel that was running when the core dump was invoked. Here’s how this works: when Kdump is set up, the production kernel reserves some memory at load time in which the capture kernel (or crashkernel) runs if Kdump is triggered. This is where YES Certification validates the operation of Kdump on a specific hardware platform, with that hardware’s unique memory allocation (which includes the physical location of the reserved RAM) and configuration.
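To make the reservation step concrete, here is a minimal Python sketch (not part of the YES Certification test kit) that reads the reservation request from a kernel command line and checks whether a capture kernel is loaded. It assumes the simple `crashkernel=size[KMG][@offset[KMG]]` form with decimal numbers; the range-based and `,high`/`,low` forms are deliberately skipped. The function names `parse_crashkernel` and `capture_kernel_loaded` are my own, chosen for illustration.

```python
import re
from pathlib import Path

# Multipliers for the size suffixes accepted in the simple syntax.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Simple form only: <size>[KMG][@<offset>[KMG]], decimal numbers (assumption).
_SIMPLE = re.compile(r"^(\d+)([KMG]?)(?:@(\d+)([KMG]?))?$", re.IGNORECASE)


def parse_crashkernel(cmdline):
    """Return [(size_bytes, offset_bytes_or_None)] for simple crashkernel= entries."""
    found = []
    for token in cmdline.split():
        if not token.startswith("crashkernel="):
            continue
        match = _SIMPLE.match(token.split("=", 1)[1])
        if match is None:
            continue  # range or ,high/,low syntax: not handled in this sketch
        size = int(match.group(1)) * _UNITS[match.group(2).upper()]
        offset = None
        if match.group(3) is not None:
            offset = int(match.group(3)) * _UNITS[match.group(4).upper()]
        found.append((size, offset))
    return found


def capture_kernel_loaded(path="/sys/kernel/kexec_crash_loaded"):
    """Return True/False from the sysfs flag, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return None
```

On a running Linux system you could call `parse_crashkernel(Path("/proc/cmdline").read_text())` and `capture_kernel_loaded()`. An empty list or a `False` result suggests that Kdump is not ready to capture a vmcore on that machine.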
When Kdump is installed and set up on a system, the install process in SUSE Linux Enterprise uses various algorithms to determine the needed crashkernel memory size and offset. On a given system, due to differences in hardware, the calculated memory size may not be sufficient to start the capture kernel and save the vmcore image. Some systems have “memory holes” where the manufacturer has blocked out memory for some other purpose, and Kdump will not work because it cannot reserve a contiguous block of memory. The number of network adapters in a system can also make the reserved crashkernel memory too small. On some systems the default size might be too large, which also causes Kdump to fail. (Beyond the possible failure, there is no reason to waste the additional system memory.) Increasing or decreasing the crashkernel memory size can fix problems with memory contiguity, the number of network adapters and other memory limitations. During SUSE YES Certification these configuration issues should be found and, if necessary, the solution or workaround that makes Kdump work will be documented in a configuration note on the certification bulletin.
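One way to see what was actually reserved, as opposed to what was requested, is to look for the "Crash kernel" entry in `/proc/iomem`. The sketch below is an illustration under assumptions, not the certification tooling: it parses text in the `/proc/iomem` format and reports the start, end and size of each such region. The function name `crash_kernel_regions` is hypothetical.

```python
import re

# /proc/iomem lines look like "  2b000000-34ffffff : Crash kernel" (hex addresses).
_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def crash_kernel_regions(iomem_text):
    """Return [(start, end, size_bytes)] for every 'Crash kernel' entry."""
    regions = []
    for line in iomem_text.splitlines():
        match = _LINE.match(line)
        if match is None or match.group(3) != "Crash kernel":
            continue
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        # The end address is inclusive, hence the + 1.
        regions.append((start, end, end - start + 1))
    return regions
```

Feed it the contents of `/proc/iomem`, read as root; unprivileged reads typically show zeroed addresses. If no region comes back even though `crashkernel=` is set on the kernel command line, the reservation probably failed, which is the kind of problem a different size or offset may resolve.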
Hardware/Firmware Can Affect Kdump
There are also system components and BIOS/UEFI settings that can prevent a successful Kdump image capture. One component that can cause problems is the storage adapter and storage driver combination. When the capture kernel starts up, it must have access to the system storage and file system, and the storage adapter or driver can have compatibility issues with the capture kernel. System ACPI (Advanced Configuration and Power Interface) settings and BIOS/UEFI settings can also cause the Kdump process to fail on a given system. YES Certification should find and document these problems.
Newer OS, Better Kdump Auto-Configuration
In SUSE Linux Enterprise 11 the crashkernel memory allocation is configured with “best practice” amounts, and tweaking is a regular practice on many systems. In SUSE Linux Enterprise 12, SUSE development has significantly improved the methods used to determine these crashkernel memory amounts, which has resulted in major Kdump improvements in the newer operating system. Even with these improvements, some hardware still requires system tweaks to successfully use Kdump on a specific server or workstation. YES Certification validates and documents these requirements in the certification bulletin, making you more successful as an IT professional! You can search for your specific hardware at: https://www.suse.com/yessearch/.
I hope this gives you a better appreciation for the information available when you use YES Certification to help you buy SUSE-compatible hardware. You can find more information about SUSE YES Certification at https://www.suse.com/partners/ihv/yes/. Stay tuned for future blog topics about SUSE YES Certification testing under the covers. For other YES Certification topics, check out my other blog posts.