multipath/lvm system drops into Emergency mode

This document (7023336) is provided subject to the disclaimer at the end of this document.

Environment


SUSE Linux Enterprise Server 12 Service Pack 3 (SLES 12 SP3)
SUSE Linux Enterprise Server for SAP Applications Service Pack 3


Situation

This problem occurs on systems using Logical Volume Management (LVM) on top of device-mapper multipath devices.
Even though multipath is enabled and set up correctly
(See: SLES Storage Administration Guide - Chapter 17.5 Configuring the System for Multipathing),
some LVM physical volumes (PVs) aren’t mapped to multipath devices but to the low-level storage devices.
Error messages about “duplicate PV” are logged by LVM.
Sometimes, the system boots into Emergency mode on (re)boot.

The problem occurs only when specific timing characteristics coincide with a specific system setup.
It is not a common issue.


Symptoms:

The 'pvs', 'lvs' or 'pvscan' output shows "duplicate PV" entries and single-path devices rather than multipath devices.
# pvs
  Found duplicate PV Ed9ecmOvaE2LrL3rQLC8ducCJmjQsfqn: using /dev/sdc2 not /dev/sda2
  Using duplicate PV /dev/sdc2 without holders, replacing /dev/sda2
  Found duplicate PV Ed9ecmOvaE2LrL3rQLC8ducCJmjQsfqn: using /dev/sdc2 not /dev/sda2
  Using duplicate PV /dev/sdc2 without holders, ignoring /dev/sda2
  PV         VG     Fmt  Attr PSize  PFree
  /dev/sdc2  system lvm2 a--  99.99g 4.00m


Sometimes an additional message like "WARNING: duplicate PV X is being used from both devices Y and Z" is logged. It indicates a fatal error condition that is likely to lead to Emergency mode:
WARNING: duplicate PV OFysu7oHJ4x0vnwwDjYCzC2qnyPitcwc is being used from both devices /dev/sdaz2 and /dev/sdz2
Found duplicate PV Ed9ecmOvaE2LrL3rQLC8ducCJmjQsfqn: using /dev/sdc2 not /dev/sda2

Emergency mode occurs in this scenario after switching root from initramfs to the root file system, because one or more PVs couldn't be activated.
In Emergency mode, the "pvs" or "lsblk" commands show some PVs mapped to multipath devices (/dev/mapper/....) and others mapped to low-level devices (e.g. /dev/sdX). The system log contains messages like:
kernel: device-mapper: table: 254:28: multipath: error getting device

Note: There are other, similar failure scenarios. See e.g. Support TID # 7023205: multipath/lvm boot issues with few specific storage devices - "failed to read sysfs vpd pg80"

Characteristics of the problem discussed in this TID are as follows:
  • it occurs with LVM over multipath
  • Emergency mode occurs after switching root
  • most multipath devices are correctly set up, just a few are not; typically these devices correspond to PVs that provide storage for the root file system
  • messages like the ones listed above are logged
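
The mixed mapping described above can be spotted with a small filter. The following is a minimal sketch and not part of the original workaround: the function name list_non_dm_pvs is hypothetical, and it assumes that every PV on the system is expected to sit on a /dev/mapper/ device, which does not hold for hosts that also use local, non-multipath disks.

```shell
# Hypothetical helper: print every PV path read from stdin that is NOT
# located under /dev/mapper/. Input is one PV path per line; leading
# whitespace and empty lines are ignored.
list_non_dm_pvs() {
    awk 'NF && $1 !~ /^\/dev\/mapper\// { print $1 }'
}
```

On a running system the function could be fed with 'pvs --noheadings -o pv_name 2>/dev/null | list_non_dm_pvs'. Any path it prints (for example /dev/sdc2) is a PV that LVM has picked up from a low-level device.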

Resolution

The following workaround needs to be applied.

First, ensure that multipath is enabled (systemctl enable multipathd) and that multipathd is present in the initrd:
# lsinitrd /boot/initrd | grep multipathd.service
  lrwxrwxrwx   1 root     root           21 Jun 20 12:05 usr/lib/systemd/system/sysinit.target.wants/multipathd.service -> ../multipathd.service

Workaround: Add the boot parameter "rd.lvm.conf=0"

Edit /etc/default/grub and add "rd.lvm.conf=0" to the GRUB_CMDLINE_LINUX_DEFAULT line.
After making the change, run 'grub2-mkconfig -o /boot/grub2/grub.cfg' to update /boot/grub2/grub.cfg
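
The edit to /etc/default/grub can also be scripted. The following is a minimal sketch under stated assumptions: add_grub_param is a hypothetical helper, it expects GRUB_CMDLINE_LINUX_DEFAULT to be a single double-quoted line without trailing comments, and it relies on GNU sed's -i option. Trying it on a copy of the file first is advisable.

```shell
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT line of the given file, unless it is
# already present. Usage: add_grub_param FILE PARAM
add_grub_param() {
    file=$1
    param=$2
    # Fetch the current line; fail if the variable is not set in the file.
    line=$(grep -m1 '^GRUB_CMDLINE_LINUX_DEFAULT="' "$file") || return 1
    # Strip the variable name and the surrounding double quotes.
    value=${line#GRUB_CMDLINE_LINUX_DEFAULT=\"}
    value=${value%\"}
    # Leave the file untouched if the parameter is already there.
    case " $value " in
        *" $param "*) return 0 ;;
    esac
    # Append the parameter inside the closing quote.
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"|" "$file"
}
```

A call such as 'add_grub_param /etc/default/grub rd.lvm.conf=0' would then be followed by the grub2-mkconfig command shown above. Running the function twice does not duplicate the parameter.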

Note: This workaround causes any customization of the LVM configuration (/etc/lvm/lvm.conf) to be ignored while the initial RAM disk is processed. The customized configuration is applied again after switching to the root file system.
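
After the next reboot it may be useful to confirm that the parameter actually reached the kernel. The following is a minimal sketch: has_boot_param is a hypothetical helper that looks for an exact, space-delimited parameter in a file holding the kernel command line, which on a running system is /proc/cmdline.

```shell
# Hypothetical helper: succeed if PARAM appears as a whole word in the
# kernel command line stored in FILE. Usage: has_boot_param PARAM FILE
has_boot_param() {
    case " $(cat "$2") " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, 'has_boot_param rd.lvm.conf=0 /proc/cmdline && echo "workaround active"' prints the message only if the system was booted with the parameter.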

Note I:
The workaround is not needed in later releases (SLES 12 SP4 and SLE 15). The LVM2 versions shipped with those releases contain the following change: lvmetad: use udev to ignore multipath components during scan.
With this change and lvmetad running, LVM2 uses udev to check whether a device is a multipath component.

Cause

The issue was caused by a race condition during initrd processing between the multipath map setup and the LVM physical volume detection.
It can happen that LVM mistakenly grabs a low-level device rather than waiting for the corresponding multipath device to be set up.
It depends on the timing of device detection during boot whether or not this causes errors in the multipath setup.
The LVM configuration directive called "multipath_component_detection" is unfortunately ineffective in a situation like the one encountered.

One system property increases the likelihood of this race:
- the root file system LUN is detected late (e.g. it is /dev/sdz rather than /dev/sda).

The parameter "rd.lvm.conf=0" avoids this race condition by forcing LVM not to look at any SCSI devices (only multipath devices).

Additional Information


Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7023336
  • Creation Date: 06-Sep-2018
  • Modified Date: 03-Mar-2020
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Server for SAP Applications
