System crash during Oracle Grid installation

This document (000019762) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15 SP2
 

Situation

The system crashes while trying to install Oracle Grid. The crash initially appears to be related to the kfod tool (from Oracle GI) when ASMLib is enabled.
crash> bt
PID: 19424  TASK: ffff925e2ce04d00  CPU: 1   COMMAND: "kfod.bin"
 #0 [ffffa3840ff23a70] machine_kexec at ffffffffac86d6b1
 #1 [ffffa3840ff23ac8] __crash_kexec at ffffffffac9531a5
 #2 [ffffa3840ff23b90] crash_kexec at ffffffffac953fbd
 #3 [ffffa3840ff23ba8] oops_end at ffffffffac8354df
 #4 [ffffa3840ff23bc8] no_context at ffffffffac87de6f
 #5 [ffffa3840ff23c30] do_page_fault at ffffffffac87f0a0
 #6 [ffffa3840ff23c60] page_fault at ffffffffad20122e
    [exception RIP: lock_timer_base+0x4e]
    RIP: ffffffffac92d7ae  RSP: ffffa3840ff23d18  RFLAGS: 00010246
    RAX: 000000000001da80  RBX: 000000000ff23da9  RCX: 0000000000000000
    RDX: 0000000000023da9  RSI: ffffa3840ff23d50  RDI: ffffa3840ff23da8
    RBP: ffffa3840ff23da8   R8: 0000000000000000   R9: 0000000000000000
    R10: ffffa3840ff23ed0  R11: 0000000000000000  R12: ffffffffad9a1980
    R13: 000000000001da80  R14: ffffa3840ff23d50  R15: ffffa3840ff23d98
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa3840ff23d48] try_to_del_timer_sync at ffffffffac92ea76
 #8 [ffffa3840ff23d70] del_timer_sync at ffffffffac92eb11
 #9 [ffffa3840ff23d80] asm_do_io at ffffffffc072d779 [oracleasm]
#10 [ffffa3840ff23e28] asmfs_svc_io64 at ffffffffc072d8ff [oracleasm]
#11 [ffffa3840ff23ec8] vfs_read at ffffffffacacf6a9
#12 [ffffa3840ff23ef8] ksys_read at ffffffffacacfa31
#13 [ffffa3840ff23f38] do_syscall_64 at ffffffffac8052eb
#14 [ffffa3840ff23f50] entry_SYSCALL_64_after_hwframe at ffffffffad20008c
    RIP: 00007fd4a788ae61  RSP: 00007ffecfdb2aa8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007ffecfdb2ad0  RCX: 00007fd4a788ae61
    RDX: 0000000000000050  RSI: 00007ffecfdb2ad0  RDI: 0000000000000007
    RBP: 00007ffecfdb2fe8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000034  R11: 0000000000000246  R12: 00007fd4aea953e8
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000001b75d40
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

crash> dis -rl ffffffffac92d7ae|tail
0xffffffffac92d795 <lock_timer_base+53>:        test   $0x40000,%ebx
0xffffffffac92d79b <lock_timer_base+59>:        jne    0xffffffffac92d790 <lock_timer_base+48> /usr/src/debug/kernel-default-5.3.18-24.12.1.x86_64/linux-5.3/linux-obj/../kernel/time/timer.c: 835
0xffffffffac92d79d <lock_timer_base+61>:        mov    %ebx,%edx
0xffffffffac92d79f <lock_timer_base+63>:        mov    %r13,%rax
0xffffffffac92d7a2 <lock_timer_base+66>:        and    $0x3ffff,%edx /usr/src/debug/kernel-default-5.3.18-24.12.1.x86_64/linux-5.3/linux-obj/../kernel/time/timer.c: 841
0xffffffffac92d7a8 <lock_timer_base+72>:        test   $0x80000,%ebx /usr/src/debug/kernel-default-5.3.18-24.12.1.x86_64/linux-5.3/linux-obj/../kernel/time/timer.c: 835
0xffffffffac92d7ae <lock_timer_base+78>:        mov    (%r12,%rdx,8),%rdx 
The faulting instruction (lock_timer_base+78) maps to line 835:
# kernel/time/timer.c
 833 static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
 834 {
 835         struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu);
 836
 837         /*
 838          * If the timer is deferrable and NO_HZ_COMMON is set then we need
 839          * to use the deferrable base.
 840          */
 841         if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
 842                 base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
 843         return base;
 844 }
The call chain:
->lock_timer_base(timer, &flags)
  ->get_timer_base(tf)
    ->get_timer_cpu_base()
      ->per_cpu_ptr(&timer_bases[BASE_STD], cpu)
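To see why this lookup faults, the effective address of the faulting load can be recomputed from the exception frame. The sketch below assumes that R12 holds the base of the kernel's per-CPU offset table (the dump does not name the symbol) and that RDX holds the CPU index that per_cpu_ptr() is about to use. The helper name is illustrative only.

```python
# Recompute the address read by: mov (%r12,%rdx,8),%rdx
# Register values are copied from the exception frame in the backtrace.
R12 = 0xFFFFFFFFAD9A1980  # assumed: base of the per-CPU offset table
RDX = 0x0000000000023DA9  # index register, i.e. the "cpu" argument


def effective_address(base: int, index: int, scale: int = 8) -> int:
    """Return base + index * scale, truncated to 64 bits."""
    return (base + index * scale) & 0xFFFFFFFFFFFFFFFF


addr = effective_address(R12, RDX)
print(f"cpu index      : {RDX}")
print(f"byte offset    : {RDX * 8:#x}")
print(f"address loaded : {addr:#018x}")
```

The index is far larger than the number of CPUs in any real system, so under the assumption above the load lands well outside the table, which is consistent with the page fault.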
The timer_list in question:
crash> timer_list ffffa3840ff23da8
struct timer_list {
  entry = {
    next = 0xffff925b6bdcee30,
    pprev = 0x0
  },
  expires = 18446623520082161200,
  function = 0xffffffffc0729e00,
  flags = 267533737
}
Its function pointer resolves to timeout_func() in the oracleasm module:
crash> sym 0xffffffffc0729e00
ffffffffc0729e00 (t) timeout_func [oracleasm]
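The flags value in the timer_list dump can be split into the fields that lock_timer_base() tests. The following is a minimal sketch: the masks are the constants visible in the disassembly (0x3ffff, 0x40000, 0x80000), the names follow the kernel's include/linux/timer.h, and the function name decode_timer_flags is hypothetical.

```python
# Split timer_list.flags into the fields that lock_timer_base() tests.
# Masks match the constants in the disassembly (0x3ffff, 0x40000, 0x80000);
# the names follow the kernel's include/linux/timer.h.
TIMER_CPUMASK = 0x0003FFFF
TIMER_MIGRATING = 0x00040000
TIMER_DEFERRABLE = 0x00080000


def decode_timer_flags(flags: int) -> dict:
    """Return the CPU index and the two flag bits encoded in `flags`."""
    return {
        "hex": f"{flags:#010x}",
        "cpu": flags & TIMER_CPUMASK,
        "migrating": bool(flags & TIMER_MIGRATING),
        "deferrable": bool(flags & TIMER_DEFERRABLE),
    }


decoded = decode_timer_flags(267533737)  # value from the timer_list dump
for key, value in decoded.items():
    print(f"{key:<10}: {value}")
```

The low 18 bits yield the same value as RDX in the exception frame, and the full flags value matches the low 32 bits of RBX. A CPU index of that size may indicate that the flags field did not hold a valid value when del_timer_sync() was called on this timer.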

Resolution

The issue was investigated by Oracle and SUSE engineering. As of May 2021, a fix can be requested via the PTF (Program Temporary Fix) process.

A potential workaround is:
  1. Create the ASM devices with oracleasm createdisk.
  2. Stop ASMLib:
     systemctl stop oracleasm
  3. Change the owner and group of the device:
     chown grid:asmadmin /dev/sda5
  4. Start the Grid setup as the grid user:
     su grid
     cd $ORACLE_HOME
     ./gridSetup.sh
  5. At this point the /dev/sda5 disk will be discovered, a disk group can be created, and the ASM instance installation can continue.
  6. Afterwards, reboot the machine. The ASM instance should start without problems, and at this point the /dev/oracleasm/ASMDISK1 device can be used.
  7. kfod should then also run without problems.
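Before starting the Grid setup in step 4, it can help to confirm that the ownership change from step 3 took effect. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the device path, user and group are the example values from the steps above.

```python
import grp
import os
import pwd


def owner_and_group(path: str) -> tuple:
    """Return (user, group) names for path, or numeric IDs as strings."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)  # no passwd entry for this UID
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # no group entry for this GID
    return user, group


def check_owner(path: str, user: str, group: str) -> bool:
    """True if path is owned by exactly this user and group."""
    return owner_and_group(path) == (user, group)


if __name__ == "__main__":
    device = "/dev/sda5"  # example device from the workaround
    if os.path.exists(device):
        ok = check_owner(device, "grid", "asmadmin")
        print(f"{device}: {'ok' if ok else 'wrong owner or group'}")
    else:
        print(f"{device}: not present on this host")
```

The check only reads metadata via os.stat(), so it is safe to run repeatedly and does not touch the disk contents.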

Cause

The issue initially appears to be related to the kfod tool (from Oracle GI) when ASMLib is enabled.

Status

Reported to Engineering

Additional Information

This issue can be encountered on SLES15 SP2 with oracleasmlib-2.0.13-1.sle15.x86_64.
 

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019762
  • Creation Date: 22-Mar-2023
  • Modified Date: 22-Mar-2023
    • SUSE Linux Enterprise Server
