Azure virtual machine hang after patching to kernel 4.4.120-94.17.1

This document (7022818) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server for SAP Applications 12 Service Pack 3
SUSE Linux Enterprise Server 12 Service Pack 3
SUSE Linux Enterprise Server for Azure

Situation

After an Azure virtual machine is upgraded to kernel 4.4.120-94.17.1, it hangs at boot while modprobe loads the mlx4_core module, and the console shows messages similar to the following:

[   36.220002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1127]
[   36.224048] Modules linked in: mlx4_core(+) pci_hyperv(X) sb_edac edac_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper hv_utils(X) hv_balloon(X) ablk_helper fjes hyperv_fb(X) hv_netvsc(X) cryptd ptp pcspkr pps_core i2c_piix4 processor button joydev ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sd_mod hid_generic hyperv_keyboard(X) hv_storvsc(X) hid_hyperv(X) scsi_transport_fc ata_piix ahci libahci hv_vmbus(X) floppy libata serio_raw sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
[   36.280042] Supported: Yes, External
[   36.284035] CPU: 0 PID: 1127 Comm: modprobe Tainted: G                 X 4.4.120-94.17-default #1
[   36.292036] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
[   36.300036] task: ffff8807c0ac89c0 ti: ffff8807b14ec000 task.ti: ffff8807b14ec000
[   36.308044] RIP: 0010:[<ffffffff813346e6>]  [<ffffffff813346e6>] delay_tsc+0x26/0x50
[   36.312049] RSP: 0018:ffff8807b14ef888  EFLAGS: 00000293
[   36.316056] RAX: 0000000000000000 RBX: ffff8807b1b55640 RCX: 0000001a29d12bb3
[   36.324043] RDX: 0000001a29d40657 RSI: 0000000000000000 RDI: 000000000003a0ee
[   36.328044] RBP: ffff8807b14ef958 R08: 000000000000000c R09: 0000000000003000
[   36.336048] R10: 0000000000000002 R11: 00000000ffffffa2 R12: ffff8807bf024220
[   36.340036] R13: ffff8807b14ef974 R14: ffff8807b1824380 R15: ffff8807ac874000
[   36.348042] FS:  00007f9c96d01700(0000) GS:ffff8807c1600000(0000) knlGS:0000000000000000
[   36.356283] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.360045] CR2: 00000000010e9748 CR3: 00000007c09dc000 CR4: 0000000000140670
[   36.364036] Stack:
[   36.368036]  ffffffffa037e241 ffff8807b1824660 0000000000000000 ffffffff00000000
[   36.376036]  ffff8807b14ef8a8 ffff8807b14ef8a8 ffffffff810ddd31 0000000000000246
[   36.380035]  ffff8807ac8740e8 ffffffffa037d120 ffff8807b14ef898 0000000242490017
[   36.388035] Call Trace:
[   36.388035]  [<ffffffffa037e241>] hv_compose_msi_msg+0x1c1/0x300 [pci_hyperv]
[   36.396041]  [<ffffffff810de077>] irq_chip_compose_msi_msg+0x47/0x60
[   36.400042]  [<ffffffff810e234a>] msi_domain_activate+0x1a/0x40
[   36.408044]  [<ffffffff810e27e2>] msi_domain_alloc_irqs+0x122/0x1d0
[   36.412043]  [<ffffffff8138b942>] __pci_enable_msix+0x422/0x4b0
[   36.416043]  [<ffffffff8138ba13>] pci_enable_msix_range+0x33/0x60
[   36.424047]  [<ffffffffa042c3c0>] mlx4_enable_msi_x+0x160/0x3d0 [mlx4_core]
[   36.428037]  [<ffffffffa042e4d8>] mlx4_load_one+0x938/0x11f0 [mlx4_core]
[   36.436051]  [<ffffffffa042f3a5>] mlx4_init_one+0x4f5/0x6b0 [mlx4_core]
[   36.440036]  [<ffffffff81372614>] local_pci_probe+0x44/0xa0
[   36.444047]  [<ffffffff81373aa4>] pci_device_probe+0xd4/0x120
[   36.448045]  [<ffffffff81474650>] driver_probe_device+0x200/0x420
[   36.452045]  [<ffffffff814748ee>] __driver_attach+0x7e/0x80
[   36.456263]  [<ffffffff8147254a>] bus_for_each_dev+0x5a/0x90
[   36.464046]  [<ffffffff81473aa0>] bus_add_driver+0x1c0/0x280
[   36.468045]  [<ffffffff8147527b>] driver_register+0x5b/0xd0
[   36.472377]  [<ffffffffa030911a>] mlx4_init+0x11a/0x1000 [mlx4_core]
[   36.476044]  [<ffffffff8100213a>] do_one_initcall+0xca/0x1f0
[   36.484044]  [<ffffffff81191896>] do_init_module+0x5a/0x1d7
[   36.488044]  [<ffffffff81110a92>] load_module+0x1382/0x1c70
[   36.492037]  [<ffffffff81111530>] SYSC_finit_module+0x70/0xa0
[   36.496042]  [<ffffffff81615f05>] entry_SYSCALL_64_fastpath+0x1e/0xb6
[   36.504041] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6
[   36.508153] 
[   36.512037] Leftover inexact backtrace:
[   36.512037] 
[   36.516042] Code: 00 00 00 00 00 0f 1f 44 00 00 65 8b 35 44 8a cd 7e 0f ae e8 0f 31 48 89 d1 48 c1 e1 20 48 09 c1 eb 0d f3 90 65 8b 05 2a 8a cd 7e <39> c6 75 18 0f ae e8 0f 31 48 c1 e2 20 48 09 c2 48 89 d0 48 29 
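If you have saved the VM's serial console or boot diagnostics output to a text file, you can tell this hang apart from other boot failures by searching for three markers from the log above: a soft lockup in modprobe, the hv_compose_msi_msg frame from pci_hyperv, and a frame from mlx4_core. The following minimal sketch is a heuristic built only from this log. matches_mlx4_msi_hang is a hypothetical helper name, and the wording of the lockup line may differ on other kernels.

```python
import re

# Markers copied from the console log in this document (heuristic only).
SOFT_LOCKUP_RE = re.compile(r"soft lockup - CPU#\d+ stuck for \d+s! \[modprobe:\d+\]")
HANG_FRAME = "hv_compose_msi_msg"   # pci_hyperv function the CPU is stuck in
DRIVER_FRAME = "[mlx4_core]"        # driver whose MSI-X setup triggers the hang


def matches_mlx4_msi_hang(log_text):
    """Return True if the console log text contains all three markers."""
    return (
        SOFT_LOCKUP_RE.search(log_text) is not None
        and HANG_FRAME in log_text
        and DRIVER_FRAME in log_text
    )
```

For example, read the saved console log with open(path).read() and pass the resulting string to matches_mlx4_msi_hang().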

Resolution

To recover a hung Azure virtual machine, perform the following steps:

  1. Power off the VM.
  2. Add and attach a new NIC to the VM with accelerated networking disabled.
  3. Detach the original NIC, which has accelerated networking enabled.
  4. Boot the VM.
  5. Upgrade to kernel version 4.4.126-94.22.1 or later:
    1. zypper upgrade kernel-default-4.4.126-94.22.1
  6. Power off the VM.
  7. Reattach the original NIC (accelerated networking enabled) that was detached in step 3.
  8. Detach the NIC added in step 2.
  9. Boot the VM.
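To confirm that the VM is running a fixed kernel after the final boot, you can compare the running kernel release with 4.4.126-94.22. The following minimal sketch assumes the release format seen in the log above (for example 4.4.120-94.17-default, where uname omits the trailing RPM build counter ".1"), so it compares only the first five numeric fields. kernel_key and has_fix are hypothetical helper names. This is a simplification of RPM's own version comparison, which remains authoritative.

```python
import platform
import re

# First fixed kernel according to the Resolution steps. 'uname -r' reports
# e.g. '4.4.120-94.17-default' without the trailing '.1' build counter,
# so only the first five numeric fields are compared.
FIXED_KERNEL = (4, 4, 126, 94, 22)
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def kernel_key(release):
    """Convert '4.4.120-94.17-default' or '4.4.126-94.22.1' to a tuple of ints."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % (release,))
    return tuple(int(part) for part in match.groups())


def has_fix(release=None):
    """Return True if the given (default: running) kernel is 4.4.126-94.22 or later."""
    if release is None:
        release = platform.release()
    return kernel_key(release) >= FIXED_KERNEL
```

Calling has_fix() without an argument checks the running kernel, so run it after the VM has rebooted (step 9). Between steps 5 and 6 the old kernel is still running, so pass the installed package version explicitly instead, for example has_fix("4.4.126-94.22.1").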


Cause

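This occurs on Azure virtual machines whose NIC has SR-IOV enabled (Azure accelerated networking). On these VMs, kernel 4.4.120-94.17.1 stalls while the mlx4_core driver sets up MSI-X interrupts for the Mellanox virtual function. The call trace above shows modprobe busy-waiting in hv_compose_msi_msg (pci_hyperv), which triggers the soft lockup.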
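To check from inside the guest whether a VM is exposed, for example before installing 4.4.120-94.17.1, you can look for a Mellanox PCI device. The accelerated networking virtual function appears in the guest as a PCI device driven by mlx4_core. The following minimal sketch assumes the standard sysfs layout under /sys/bus/pci/devices and Mellanox's PCI vendor ID 0x15b3. mellanox_pci_devices is a hypothetical helper name. The check also matches Mellanox devices handled by other drivers, so treat a match as a reason to verify the NIC's accelerated networking setting in Azure, not as proof.

```python
import os

# Mellanox's PCI vendor ID. The accelerated networking VF on these VMs is a
# Mellanox device bound to mlx4_core (other Mellanox devices also match).
MELLANOX_VENDOR_ID = "0x15b3"


def mellanox_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return the sorted PCI device names whose sysfs 'vendor' file reads 0x15b3."""
    found = []
    if not os.path.isdir(sysfs_root):
        return found
    for name in sorted(os.listdir(sysfs_root)):
        vendor_path = os.path.join(sysfs_root, name, "vendor")
        try:
            with open(vendor_path) as fh:
                vendor = fh.read().strip().lower()
        except (IOError, OSError):
            # Entry without a readable vendor attribute: skip it.
            continue
        if vendor == MELLANOX_VENDOR_ID:
            found.append(name)
    return found
```

On a VM with accelerated networking enabled, print(mellanox_pci_devices()) lists one or more PCI addresses. An empty list means that no Mellanox device is visible to the guest.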


Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7022818
  • Creation Date: 05-Apr-2018
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Server for SAP Applications


For questions or concerns about the SUSE Knowledgebase, please contact: tidfeedback@suse.com
