crashkernel=512M@128M Set on Dom0 Causes Xen HVM Guest to Crash on Startup

This document (7017624) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4)
Xen 4.4.3

Situation

If crashkernel=512M@128M is set on dom0, an HVM guest crashes on startup. The guest start fails with a call trace in the xen_balloon module. If "currentMemory" is set equal to "memory" in "libvirt.xml", the guest boots normally. Changing "crashkernel=512M@128M" to "crashkernel=256M@128M" also lets the guest boot successfully.

The host is a fully patched SLES 11 SP4 system running Xen with a SLES 11 SP4 HVM guest. When this HVM guest is started, it crashes during boot with this message:

[    3.328784] xen_mem: Initialising balloon driver.
[    3.350533] Initialising virtual ethernet driver.
[    7.225896] emc: device handler registered
[    8.173649] ------------[ cut here ]------------
[    8.173649] kernel BUG at /usr/src/packages/BUILD/xen-4.4.2-testing/obj/default/balloon/balloon.c:407!
[    8.173649] invalid opcode: 0000 [#1] SMP
[    8.173649] CPU 0
[    8.173649] Modules linked in: scsi_dh_emc scsi_dh xen_vnif xen_balloon ata_generic ata_piix libata scsi_mod xen_vbd xen_platform_pci
[    8.173649] Supported: Yes
[    8.173649]
[    8.173649] Pid: 11, comm: kworker/0:1 Not tainted 3.0.101-63-default #1 Xen HVM domU
[    8.173649] RIP: 0010:[<ffffffffa0076604>]  [<ffffffffa0076604>] decrease_reservation+0x194/0x1a0 [xen_balloon]
[    8.173649] RSP: 0018:ffff880108bc5dc0  EFLAGS: 00010083
[    8.173649] RAX: 00000000000001b5 RBX: 0000000000000200 RCX: 0000000000000000
[    8.173649] RDX: 0000000000000100 RSI: ffff880108bc5dc0 RDI: 0000000000002801
[    8.173649] RBP: 0000000000000200 R08: 00000000000139d0 R09: 00000000000139d0
[    8.173649] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0000000000
[    8.173649] R13: 0000000000000000 R14: 0000000000000246 R15: ffffffffa0076610
[    8.173649] FS:  0000000000000000(0000) GS:ffff88010fc00000(0000) knlGS:0000000000000000
[    8.173649] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    8.173649] CR2: 00007f1e4a9eaae0 CR3: 0000000001a09000 CR4: 00000000000006f0
[    8.173649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.173649] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    8.173649] Process kworker/0:1 (pid: 11, threadinfo ffff880108bc4000, task ffff880108bc22c0)
[    8.173649] Stack:
[    8.173649]  ffffffffa0078580 0000000000000200 0000000000000000 0000000000007ff0
[    8.173649]  fffffffffffc2000 0000000000000000 ffff880108bc4010 ffff88010fc0c700
[    8.173649]  ffff88010fc13c05 ffffffffa007670d 0000000000000000 ffffffffa0078000
[    8.173649] Call Trace:
[    8.173649]  [<ffffffffa007670d>] balloon_process+0xfd/0x110 [xen_balloon]
[    8.173649]  [<ffffffff8107d39c>] process_one_work+0x16c/0x350
[    8.173649]  [<ffffffff810800ca>] worker_thread+0x17a/0x410
[    8.173649]  [<ffffffff81084496>] kthread+0x96/0xa0
[    8.173649]  [<ffffffff81470564>] kernel_thread_helper+0x4/0x10

However, this problem only occurs if maxmem is higher than memory in the DomU configuration:
# xm list -l sles11 |grep mem
(maxmem 4096)
(memory 2048)

AND

the crashkernel parameter is set to crashkernel=512M@128M for Dom0.

If either of these two conditions is changed, the HVM guest loads without any problems. For example, the guest loads when memory and maxmem are set to the same amount, or when the crashkernel parameter is changed to crashkernel=256M@16M.

The test server has 12 GB of memory and the dom0_mem=2048M parameter is set. Ballooning is disabled (enable-dom0-ballooning no). A customer sees the same issue on a server with 128 GB of memory. Removing the offset from the crashkernel parameter also lets the HVM guest load. However, kdump then fails to load, so the offset appears to be required for the Xen kernel.
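
The maxmem condition above can be checked from the "xm list -l" output. The following is a minimal sketch, not a SUSE-provided tool: the function name maxmem_exceeds_memory is hypothetical, and the sample input reuses the values shown earlier instead of calling xm.

```shell
# Hypothetical helper (not part of Xen): reads "xm list -l <guest>" output
# on stdin. Exit status 0 means maxmem is higher than memory, 1 means it
# is not, 2 means one of the two values was not found.
maxmem_exceeds_memory() {
    awk '
        /\(maxmem /  { gsub(/[()]/, ""); maxmem = $2 }
        /\(memory /  { gsub(/[()]/, ""); memory = $2 }
        END {
            if (maxmem == "" || memory == "") exit 2
            exit !(maxmem + 0 > memory + 0)
        }'
}

# Sample input copied from the output shown in this document.
printf '(maxmem 4096)\n(memory 2048)\n' | maxmem_exceeds_memory \
    && echo "maxmem is higher than memory"
```

On a real host, the same function could be fed directly, for example with "xm list -l sles11 | maxmem_exceeds_memory". It only covers the DomU memory condition; the crashkernel setting for Dom0 still has to be checked separately.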



Resolution

A PTF is available:
https://ptf.suse.com/a36c11ebc5300def75dd81c34eed2245/sles11-sp4/10857/x86_64/20160517

Alternatively, enabling HAP also fixes the problem. HAP is usually enabled by default on newer versions of Xen. It can be disabled or enabled by specifying hap=0 or hap=1 in the /etc/xen/vm/<guest> config file.
The CPU must support HAP, however.
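
To see which hap value a guest config file currently sets, something like the following sketch may help. The function name hap_setting is hypothetical, and the sample config lines are illustrative stand-ins for a real /etc/xen/vm/<guest> file.

```shell
# Hypothetical helper: reads a guest config file on stdin and prints the
# last hap value it sets (0 or 1). Prints nothing if hap is not set,
# in which case the Xen default applies.
hap_setting() {
    sed -n 's/^[[:space:]]*hap[[:space:]]*=[[:space:]]*\([01]\).*$/\1/p' | tail -n 1
}

# Illustrative input only; not taken from a real guest configuration.
printf 'name="sles11"\nhap=1\n' | hap_setting
```

On a real host, the guest's config file would be redirected into the function instead of the sample input.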

Cause


Additional Information

HAP stands for hardware assisted paging and requires a CPU feature called EPT by Intel and RVI by AMD. It is used to manage the guest's MMU. The alternative is shadow paging, which is managed completely in software by Xen. With HAP, TLB misses are expensive, so workloads with highly random memory access are costly. With shadow paging, page table updates are expensive. HAP is enabled by default (and this is the recommended setting) but can be disabled or enabled by passing hap=0 or hap=1 in the guest VM config file. This file is usually /etc/xen/vm/<guest> but can be in a different location depending on how the guest was installed. This setting applies to HVM (fully virtualized) guests.

HAP (Hardware Assisted Paging) can optionally be used to boost the performance of Xen memory management for HVM VMs. HAP is an additional CPU feature that is not present on older CPUs. Intel HAP is called Intel EPT (Extended Page Tables) and AMD HAP is called AMD NPT (Nested Page Tables). AMD NPT is sometimes also referred to as AMD RVI (Rapid Virtualization Indexing).

How to check if your CPU supports HAP:

Run "xl dmesg" and look for the following message:

"(XEN) HVM: Hardware Assisted Paging detected and enabled"

or similar messages such as:

"(XEN) HVM: Hardware Assisted Paging (HAP) detected"
"(XEN) HVM: HAP page sizes: 4kB, 2MB"
 
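The check can be scripted by searching the hypervisor log for the HAP message. This is a minimal sketch: the function name hap_detected is hypothetical, and the match string assumes the message wording quoted in this document, which may differ between Xen versions.

```shell
# Hypothetical helper: reads "xl dmesg" output on stdin and succeeds
# (exit status 0) if a HAP detection message is present.
hap_detected() {
    grep -qi 'Hardware Assisted Paging'
}

# Sample input copied from the message quoted in this document.
echo '(XEN) HVM: Hardware Assisted Paging detected and enabled' | hap_detected \
    && echo "HAP reported by Xen"
```

On a real host, this would be run as "xl dmesg | hap_detected". If the boot messages have already rotated out of the hypervisor log buffer, the check fails even on a CPU that supports HAP.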





Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7017624
  • Creation Date: 20-May-2016
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise Server

