kexec issues during SLES 15 migration in Azure

This document (000019733) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server for SAP Applications
SUSE Linux Enterprise Server
Microsoft Azure Virtual Machines

Situation

When performing a distribution migration of an Azure virtual machine from SLES 12 SP4 or SP5 to SLES 15, the VM may experience soft lockups or kernel Oopses after the migration, with console messages similar to the following:
[    0.060015] BUG: unable to handle kernel paging request at ffffffffffffffd0
[    0.064000] IP: alloc_vmap_area+0x1bd/0x340
[    0.064000] PGD 39e00e067 P4D 39e00e067 PUD 39e010067 PMD 0
[    0.064000] Oops: 0000 [#1] SMP PTI
[    0.064000] CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.12.14-197.56-default #1 SLE15-SP1
[    0.064000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[    0.064000] task: ffff94b6c6224080 task.stack: ffffae17018a8000
[    0.064000] RIP: 0010:alloc_vmap_area+0x1bd/0x340
[    0.064000] RSP: 0000:ffffae17018abc58 EFLAGS: 00010207
[    0.064000] RAX: 0000000000000000 RBX: 0000000000005000 RCX: 0000000000000000
[    0.064000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000005000
[    0.064000] RBP: ffffae1700000000 R08: 0000000000000001 R09: 0000000000000000
[    0.064000] R10: ffffce16ffffffff R11: 00000000000280c0 R12: 0000000000004000
[    0.064000] R13: ffffae1700000000 R14: ffffffffffffc000 R15: 0000000000003fff
[    0.064000] FS:  0000000000000000(0000) GS:ffff94b9ffc00000(0000) knlGS:0000000000000000
[    0.064000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.064000] CR2: ffffffffffffffd0 CR3: 000000039e00a001 CR4: 00000000001606f0
[    0.064000] Call Trace:
[    0.064000]  __get_vm_area_node+0xb0/0x130
[    0.064000]  __vmalloc_node_range+0x68/0x290
[    0.064000]  ? _do_fork+0xbd/0x360
[    0.064000]  copy_process.part.38+0x6db/0x1c10
...
[  216.032004] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x1 nice=0 stuck for 215s!
[  216.036003] Showing busy workqueues and worker pools:
[  216.040003] workqueue events: flags=0x0
[  216.044002]   pwq 0: cpus=0 node=0 flags=0x1 nice=0 active=1/256 refcnt=2
[  216.048000]     pending: vmstat_shepherd
or
[   28.016036] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
[   28.016037] Modules linked in:
[   28.016038] Supported: Yes
[   28.016041] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.14-197.56-default #1 SLE15-SP1
[   28.016042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 04/02/2020
[   28.016043] task: ffff8e5fc6248040 task.stack: ffffa92a018c4000
[   28.016048] RIP: 0010:cfb_imageblit+0x478/0x4e0
[   28.016049] RSP: 0018:ffffa92a018c7840 EFLAGS: 00010246 ORIG_RAX: ffffffffffffff13
[   28.016050] RAX: 0000000000000000 RBX: ffffffff90c9cca0 RCX: 0000000000000002
[   28.016050] RDX: ffffa92a022f6d74 RSI: ffff8e62ea51236b RDI: 0000000000000000
[   28.016051] RBP: ffffa92a022f6d78 R08: 0000000000000001 R09: 0000000000aaaaaa
[   28.016051] R10: 0000000000000001 R11: 0000000000000000 R12: ffffa92a022f7000
[   28.016052] R13: 0000000000001000 R14: ffffa92a022f6000 R15: ffff8e62ea512300
[   28.016052] FS:  0000000000000000(0000) GS:ffff8e62ffc00000(0000) knlGS:0000000000000000
[   28.016053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.016053] CR2: 00007f14be093aa4 CR3: 000000003d00a001 CR4: 00000000003606f0
[   28.016055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   28.016056] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   28.016056] Call Trace:
[   28.016061]  bit_putcs+0x2bd/0x4b0

Resolution

Reboot the VM from Azure (for example, with the Restart action in the Azure portal). The reboot completes the migration. Afterwards, continue with the "After the Migration" section of the documentation:
https://documentation.suse.com/suse-distribution-migration-system/1.0/single-html/distribution-migration-system/#_after_the_migration
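As one way to issue the reboot from Azure rather than from inside the guest, the Azure CLI can be used. The sketch below assumes the Azure CLI is installed and already logged in (az login) to the subscription that hosts the VM. The function name restart_azure_vm is hypothetical, and the resource group and VM name are placeholders you must supply.

```shell
# Hypothetical helper: restart an Azure VM from outside the guest OS.
# Assumes the Azure CLI ("az") is installed and already logged in.
restart_azure_vm() {
    rg="$1"   # resource group that contains the VM (placeholder)
    vm="$2"   # name of the migrated VM (placeholder)
    if [ -z "$rg" ] || [ -z "$vm" ]; then
        echo "usage: restart_azure_vm <resource-group> <vm-name>" >&2
        return 2
    fi
    if ! command -v az >/dev/null 2>&1; then
        echo "error: Azure CLI (az) not found in PATH" >&2
        return 127
    fi
    # Ask the Azure platform to restart the VM.
    az vm restart --resource-group "$rg" --name "$vm"
}
```

Usage, with placeholder names: restart_azure_vm my-resource-group my-sles-vm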

To avoid this issue in subsequent migrations, disable the soft (kexec-based) reboot by running the following command before starting the migration:

echo "soft_reboot: false" >> /etc/sle-migration-service.yml
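Because the command appends with >>, running it more than once, or on a file that already sets soft_reboot, leaves duplicate keys whose handling depends on the YAML parser. A slightly more defensive sketch is shown below. It assumes the file is simple YAML in which soft_reboot: is a top-level key at the start of a line; the function name disable_soft_reboot is hypothetical, and the setting it writes is the one given above.

```shell
# Hypothetical helper: set "soft_reboot: false" in the migration config
# without creating duplicate keys. Run as root before starting the migration.
# Assumes "soft_reboot:" is a top-level key at the start of a line.
disable_soft_reboot() {
    conf="${1:-/etc/sle-migration-service.yml}"
    touch "$conf" || return 1
    if grep -q '^soft_reboot:' "$conf"; then
        # Key already present: rewrite its value in place.
        sed -i 's/^soft_reboot:.*/soft_reboot: false/' "$conf"
    else
        # If the file does not end with a newline, add one first so the
        # new key starts on its own line.
        if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
            echo >> "$conf"
        fi
        echo "soft_reboot: false" >> "$conf"
    fi
}
```

Calling disable_soft_reboot with no argument edits /etc/sle-migration-service.yml; the optional path argument is only a convenience for trying it on a copy first.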

Cause

The issue is caused by kexec, which loads and boots another kernel directly from the currently running kernel, without going through the firmware and boot loader. When the migration is complete, the distribution migration process uses kexec to boot into the SLES 15 kernel. In Azure, this kexec invocation has been shown to cause the issues described above intermittently.
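To make the load-then-boot model concrete: kexec -l stages a kernel (and optionally an initrd) in memory, and kexec -e later jumps into it. While a kernel is staged, the running kernel reports 1 in /sys/kernel/kexec_loaded. The read-only sketch below checks that attribute. The function name kexec_kernel_staged is hypothetical, and this article does not state when or how the migration service stages the SLES 15 kernel, so treat it as a diagnostic aid only.

```shell
# Hypothetical helper: report whether a kexec kernel is currently staged.
# /sys/kernel/kexec_loaded reads "1" after a successful "kexec -l" and
# "0" otherwise. The optional argument exists only to allow testing.
kexec_kernel_staged() {
    flag="${1:-/sys/kernel/kexec_loaded}"
    # Treat a missing or unreadable attribute as "not staged".
    [ -r "$flag" ] || return 1
    [ "$(cat "$flag")" = "1" ]
}

if kexec_kernel_staged; then
    echo "A kexec kernel is staged; 'kexec -e' would boot into it directly."
else
    echo "No kexec kernel is staged."
fi
```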

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019733
  • Creation Date: 14-Oct-2020
  • Modified Date: 15-Oct-2020
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Server for SAP Applications
