HP Superdome X with a high number of LUNs fails to dump and reports an out-of-memory (OOM) message

This document (7017393) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4)

The system is an 8-blade Superdome X with a maximum I/O card configuration and a very large LUN configuration (113 multipath LUNs, with a total of 370 LUN paths).

Situation

Kdump is configured with a crashkernel size of 768M and udev.children-max=2 on the kdump command line.
Attempts to take a crash dump of the system fail on most occasions, with only a few succeeding.
In case of a failure, the dmesg output shows messages like the following:

makedumpfile Completed.
-------------------------------------------------------------------------------
Saving dump using makedumpfile
-------------------------------------------------------------------------------
[  270.020790] alua: release port group 1
[  270.024945] sd 10:0:1:21: alua: Detached
[  270.420280] alua: release port group 1
[  270.424446] sd 10:0:1:25: alua: Detached
[  270.429811] makedumpfile invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
[  270.439674] makedumpfile cpuset=/ mems_allowed=0
[  270.444765] Pid: 12312, comm: makedumpfile Not tainted 3.0.101-57-default #1
[  270.452525] Call Trace:
[  270.455261]  [<ffffffff81004b95>] dump_trace+0x75/0x300
[  270.461047]  [<ffffffff81464233>] dump_stack+0x69/0x6f
[  270.466741]  [<ffffffff810fe49e>] dump_header+0x8e/0x110
[  270.472616]  [<ffffffff810fe856>] oom_kill_process+0xa6/0x350
[  270.478969]  [<ffffffff810fedb7>] out_of_memory+0x2b7/0x310
[  270.485133]  [<ffffffff811047e5>] __alloc_pages_slowpath+0x7b5/0x7f0
[  270.492157]  [<ffffffff81104a09>] __alloc_pages_nodemask+0x1e9/0x200
[  270.499184]  [<ffffffff811407e0>] alloc_pages_vma+0xd0/0x1c0
[  270.505444]  [<ffffffff8111f24b>] do_anonymous_page+0x13b/0x300
[  270.511995]  [<ffffffff8146ae3d>] do_page_fault+0x1fd/0x4c0
[  270.518158]  [<ffffffff81467a45>] page_fault+0x25/0x30
[  270.523858]  [<00007f21d707283d>] 0x7f21d707283c
[  270.528944] Mem-Info:
[  270.531455] Node 0 DMA per-cpu:
[  270.534956] CPU    0: hi:    0, btch:   1 usd:   0
[  270.540236] CPU    1: hi:    0, btch:   1 usd:   0
[  270.545514] CPU    2: hi:    0, btch:   1 usd:   0
[  270.550793] CPU    3: hi:    0, btch:   1 usd:   0
[  270.556075] Node 0 DMA32 per-cpu:
[  270.559763] CPU    0: hi:  186, btch:  31 usd: 142
[  270.565042] CPU    1: hi:  186, btch:  31 usd: 161
[  270.570319] CPU    2: hi:  186, btch:  31 usd:  98
[  270.575599] CPU    3: hi:  186, btch:  31 usd:  60
[  270.580882] active_anon:13242 inactive_anon:243 isolated_anon:0
[  270.580883]  active_file:17 inactive_file:0 isolated_file:0
[  270.580883]  unevictable:15006 dirty:0 writeback:0 unstable:0
[  270.580884]  free:9754 slab_reclaimable:3253 slab_unreclaimable:71872
[  270.580885]  mapped:1536 shmem:869 pagetables:164 bounce:0
[  270.612993] Node 0 DMA free:484kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:260kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  270.652731] lowmem_reserve[]: 0 755 755 755
[  270.657498] Node 0 DMA32 free:38532kB min:38412kB low:48012kB high:57616kB active_anon:52968kB inactive_anon:972kB active_file:68kB inactive_file:0kB unevictable:60024kB isolated(anon):0kB isolated(file):0kB present:773568kB mlocked:15676kB dirty:0kB writeback:0kB mapped:6144kB shmem:3476kB slab_reclaimable:13012kB slab_unreclaimable:287488kB kernel_stack:6600kB pagetables:656kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:19 all_unreclaimable? no
[  270.702099] lowmem_reserve[]: 0 0 0 0
[  270.706301] Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 484kB
[  270.718196] Node 0 DMA32: 1625*4kB 710*8kB 288*16kB 123*32kB 80*64kB 57*128kB 13*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 38516kB
[  270.731611] 9210 total pagecache pages
[  270.735744] 0 pages in swap cache
[  270.739402] Swap cache stats: add 0, delete 0, find 0/0
[  270.745159] Free swap  = 0kB
[  270.748338] Total swap = 0kB
[  270.751516] 193457 pages RAM
[  270.754695] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[  270.762841] [  256]     0   256    44991     5133   3     -17         -1000 multipathd
[  270.771554] [  259]     0   259     3453     1025   0     -17         -1000 udevd
[  270.779792] [  265]     0   265     2614      197   0     -17         -1000 udevd
[  270.788027] [  268]     0   268     2769      344   0     -17         -1000 udevd
[  270.796266] [  604]     0   604     4658     2163   0       0             0 blogd
[  270.804521] [12309]     0 12309    15060      826   1       0             0 kdumptool
[  270.813136] [12312]     0 12312    18089    11304   3       0             0 makedumpfile
[  270.822038] Out of memory: Kill process 12312 (makedumpfile) score 33 or sacrifice child
[  270.830941] Killed process 12312 (makedumpfile) total-vm:72356kB, anon-rss:44356kB, file-rss:860kB

Resolution

This issue has not yet been fixed.
However, the following changes have been tested successfully in the environment described above:

- increase the crashkernel size to 832M
- use udev.children-max=2 on the kdump command line
- add a multipath.conf (if one does not already exist) that blacklists more than half of the LUNs;
e.g. reducing from 113 LUNs with 370 paths to 40 LUNs with 88 paths
- rebuild the kdump initrd
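The multipath blacklist mentioned above can be sketched as follows in /etc/multipath.conf. The WWIDs shown are placeholders for illustration only, not values from the affected system; replace them with the WWIDs of the LUNs that the crash kernel does not need to reach (the WWIDs currently in use can be listed with "multipath -ll"):

```
blacklist {
    # Hypothetical WWIDs for illustration only - substitute the
    # WWIDs of the LUNs to exclude from multipath handling.
    wwid "36005076801234567890000000000000a"
    wwid "36005076801234567890000000000000b"
}
```

After changing multipath.conf, rebuild the kdump initrd so that the reduced LUN configuration is picked up by the crash kernel.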

It is nevertheless recommended to contact SUSE Technical Support when facing this issue.


Cause

This issue is still being investigated.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID:7017393
  • Creation Date: 21-Mar-2016
  • Modified Date:03-Mar-2020
    • SUSE Linux Enterprise Server
