Corruption on ASM disks during device-mapper map reloads

This document (7011313) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 11 Service Pack 1

Situation

Sporadic Oracle database corruption was encountered in a SLES11 SP1 environment. The corruption was seen only on multipath ASM disks accessed through Oracle's ASMLib library. (All ASM disks were accessed through 'ORCL:*' devices.)

The database corruption was detected by a verify process, which can report either "fractured blocks" or general corruption. The trace files showing the corruption looked similar to the following:

Hex dump of (file 6, block 370913)
Dump of memory from 0x00007F50F2ADA000 to 0x00007F50F2AE2000
7F50F2ADA000 0000E200 0005A921 00000000 05010000  [....!...........]
7F50F2ADA010 00004E24 00000000 00000000 00000000  [$N..............]
7F50F2ADA020 00000000 00000000 00000000 00000000  [................]
        Repeat 2044 times
7F50F2AE1FF0 00000000 00000000 00000000 00000001  [................]
Corrupt block relative dba: 0x0185a8e1 (file 6, block 370913)
Bad header found during validation
Data in bad block:
 type: 0 format: 2 rdba: 0x0005a921
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0x4e24
 computed block checksum: 0x0
Reread of blocknum=370913, file=+DATA/orcl/datafile/app_test_index02_ts.266.797206559. found same corrupt data
Reread of blocknum=370913, file=+DATA/orcl/datafile/app_test_index02_ts.266.797206559. found same corrupt data
Reread of blocknum=370913, file=+DATA/orcl/datafile/app_test_index02_ts.266.797206559. found same corrupt data
Reread of blocknum=370913, file=+DATA/orcl/datafile/app_test_index02_ts.266.797206559. found same corrupt data
Reread of blocknum=370913, file=+DATA/orcl/datafile/app_test_index02_ts.266.797206559. found same corrupt data
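
When many trace files have to be checked, it can help to pull the affected block addresses out automatically. The following is a minimal Python sketch, not an Oracle or SUSE tool. It assumes the "Corrupt block relative dba" line always has the form shown in the excerpt above, and that the relative dba uses the usual smallfile layout (upper 10 bits are the relative file number, lower 22 bits are the block number). The function names are hypothetical.

```python
import re

# Matches the summary line as it appears in the trace excerpt above.
CORRUPT_RE = re.compile(
    r"Corrupt block relative dba: (0x[0-9a-fA-F]+) \(file (\d+), block (\d+)\)"
)


def decode_rdba(rdba):
    """Split a 32-bit relative dba into (file, block).

    Assumption: smallfile layout, 10 bits of file number followed by
    22 bits of block number.
    """
    return rdba >> 22, rdba & 0x3FFFFF


def find_corrupt_blocks(trace_text):
    """Return a sorted list of unique (rdba, file, block) tuples in trace_text."""
    found = set()
    for m in CORRUPT_RE.finditer(trace_text):
        found.add((int(m.group(1), 16), int(m.group(2)), int(m.group(3))))
    return sorted(found)


sample = "Corrupt block relative dba: 0x0185a8e1 (file 6, block 370913)\n"
for rdba, file_no, block_no in find_corrupt_blocks(sample):
    # The decoded address should agree with the file/block printed in the trace.
    print(hex(rdba), decode_rdba(rdba) == (file_no, block_no))
```

For the trace above, 0x0185a8e1 decodes to file 6, block 370913, which matches the values printed in the same line.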

Resolution

This problem is resolved in SLES11 SP2. As SLES11 SP1 is out of general support, the recommended resolution is to upgrade to SP2. Customers with a long term support contract (LTSS) can obtain an updated SLES11 SP1 kernel RPM, which contains the fix for this issue, through the LTSS patch channel.

This issue can be worked around by bypassing ASMLib, which is done by setting the Oracle parameter 'asm_diskstring=/dev/oracleasm/disks/*'. Removing the ASMLib software also eliminates the corruption risk.

Cause

The corruption was tracked down to a problem at the device-mapper layer that is only exposed through the ASMLib IO path. During a device-mapper map reload, there is a small window in which a NULL page can be returned to the caller. The normal IO paths handle this case, but the IO path that ASMLib uses can result in an empty page being written to disk.

The resolution to this issue is a lock which ensures that the device-mapper map is swapped out atomically. This ensures that all IO is effectively suspended during the device-mapper operation. This lock is already included in the shipping SLES11 SP2 device-mapper code.
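
The effect of such a lock can be illustrated with a small user-space model. The Python sketch below is a conceptual toy only: it is not the kernel device-mapper code, and the class, method names and the `in_window` hook are hypothetical, introduced purely to make the reload window observable. It assumes the reload briefly leaves no map installed, which stands in for the NULL page described above.

```python
import threading


class ToyMappedDevice:
    """Hypothetical stand-in for a device whose map can be reloaded."""

    def __init__(self, table):
        self._table = table
        self._swap_lock = threading.Lock()

    def reload_unlocked(self, new_table, in_window=None):
        # The old map is dropped before the new one is installed, leaving a gap.
        self._table = None
        if in_window is not None:
            in_window()  # demo-only hook that runs inside the gap
        self._table = new_table

    def lookup_unlocked(self):
        return self._table  # may observe None during a reload

    def reload_locked(self, new_table, in_window=None):
        with self._swap_lock:  # lookups are held off for the whole swap
            self._table = None
            if in_window is not None:
                in_window()
            self._table = new_table

    def lookup_locked(self):
        with self._swap_lock:
            return self._table


# Unlocked: a lookup that lands inside the window sees no map at all.
dev = ToyMappedDevice("old-map")
seen_unlocked = []
dev.reload_unlocked("new-map", lambda: seen_unlocked.append(dev.lookup_unlocked()))

# Locked: a reader started inside the window blocks until the swap completes.
dev = ToyMappedDevice("old-map")
seen_locked = []
readers = []


def start_reader():
    t = threading.Thread(target=lambda: seen_locked.append(dev.lookup_locked()))
    t.start()
    readers.append(t)


dev.reload_locked("new-map", start_reader)
for t in readers:
    t.join()

print(seen_unlocked, seen_locked)
```

In this model the unlocked lookup observes the missing map, while the locked lookup only ever sees the old or the new map, because it cannot run until the swap has finished.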

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7011313
  • Creation Date: 05-NOV-12
  • Modified Date: 05-NOV-12
  • Product: SUSE Linux Enterprise Server
