SLES High-Performance Computing Module for SLES 12

Release Notes

This document provides an overview of high-level general features and updates for the High-Performance Computing Module for SUSE Linux Enterprise Server 12. It describes the capabilities and limitations of the High-Performance Computing Module for SLES 12.

If you are skipping one or more releases, check the release notes of the skipped releases as well. Release notes usually only list changes that happened between two subsequent releases. If you are only reading the release notes of the current release, you could miss important changes.

General documentation can be found at: https://www.suse.com/documentation/.

Publication Date: 2019-02-26, Version: 12.20190220

1 SLES High-Performance Computing Module for SLES 12

2 Availability

3 Support and Life Cycle

4 Documentation and Other Information

5 How to Obtain Source Code

6 Support Statement SLES High-Performance Computing Module

7 Installation and Upgrade

7.1 Upgrade-Related Notes

8 Functionality

8.1 ConMan — The Console Manager
8.2 cpuid — x86 CPU Identification Tool
8.3 Ganglia — System Monitoring
8.4 hwloc — Portable Abstraction of Hierarchical Architectures for High-Performance Computing
8.5 memkind — Heap Manager for Heterogeneous Memory Platforms and Mixed Memory Policies
8.6 mrsh/mrlogin — Remote Login Using "munge" Authentication
8.7 pdsh — Parallel Remote Shell Program
8.8 ohpc — OpenHPC Compatibility Macros
8.9 PowerMan — Centralized Power Control for Clusters
8.10 rasdaemon — Utility to Log RAS Error Tracings
8.11 Slurm — Utility for HPC Workload Management
8.12 GNU Compiler Collection for HPC
8.13 Lmod — Lua-based Environment Modules
8.14 Support for Genders Static Cluster Configuration Database

9 HPC Libraries

9.1 FFTW HPC Library — Discrete Fourier Transforms
9.2 HDF5 HPC Library — Model, Library, File Format for Storing and Managing Data
9.3 HPC Library Package mpiP
9.4 NetCDF HPC Library — Implementation of Self-Describing Data Formats
9.5 NumPy Python Library
9.6 OpenBLAS Library — Optimized BLAS Library
9.7 PAPI HPC Library — Consistent Interface for Hardware Performance Counters
9.8 PETSc HPC Library — Solver for Partial Differential Equations
9.9 ScaLAPACK HPC Library — LAPACK Routines

10 Updated Packages

10.1 Support for Genders in pdsh
10.2 Lmod Has Been Updated to Version 7.6
10.3 Support for Intel Knights Mill CPUs in cpuid
10.4 pdsh Has Been Updated to Version 2.33
10.5 ConMan Has Been Updated to Version 0.2.8
10.6 Slurm Has Been Updated to Version 17.02.9
10.7 Slurm Has Been Updated to Version 17.02.10
10.8 Slurm Has Been Updated to Version 17.02.11

11 Legal Notices

1 SLES High-Performance Computing Module for SLES 12

The High-Performance Computing Module supplements SUSE Linux Enterprise Server 12. It provides tools and libraries related to High Performance Computing. Presently, the tools include:

Workload manager (Slurm)
Remote and parallel shells
Performance monitoring and measuring tools
Serial console monitoring tool
Cluster power management tool
Tool to discover the machine hardware topology
Tool to monitor memory errors
Tool to determine CPU model capabilities (x86-64 only)
User extensible heap manager capable of distinguishing between different kinds of memory (x86-64 only)

This document only describes features and procedures specific to this module. Make sure to also review the release notes for the base product, which is SUSE Linux Enterprise Server 12 SP2 or later versions of SUSE Linux Enterprise Server 12. The release notes for SUSE Linux Enterprise Server 12 SP2 are published at https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12-SP2/.

2 Availability

The High-Performance Computing Module for SUSE Linux Enterprise Server 12 can be installed on SUSE Linux Enterprise Server 12 SP2 and later. It is available to any registered user of SUSE Linux Enterprise 12 for the x86-64 and AArch64 platforms.

3 Support and Life Cycle

The SLES High-Performance Computing Module is supported throughout the life cycle of SLE 12. Long Term Support Service is not available. Any release is fully maintained and supported until the availability of the next release.

For more information, see the Support Policy page https://www.suse.com/support/policy.html.

4 Documentation and Other Information

Accessing the documentation on the product media:

Read the READMEs on the media.
Get the detailed change log information about a particular package from the RPM (where <FILENAME>.rpm is the name of the RPM):
```
rpm --changelog -qp <FILENAME>.rpm
```
Check the ChangeLog file in the top level of the media for a chronological log of all changes made to the updated packages.
These Release Notes are identical across all architectures, and the most recent version is always available online at https://www.suse.com/releasenotes/. Some entries may be listed twice if they are important and belong to more than one section.

5 How to Obtain Source Code

This SUSE product includes materials licensed to SUSE under the GNU General Public License (GPL). The GPL requires SUSE to provide the source code that corresponds to the GPL-licensed material. The source code is available for download at https://www.suse.com/download-linux/source-code.html.

Also, for up to three years after distribution of the SUSE product, upon request, SUSE will mail a copy of the source code. Requests should be sent by e-mail to sle_source_request@suse.com or as otherwise instructed at https://www.suse.com/download-linux/source-code.html. SUSE may charge a reasonable fee to recover distribution costs.

6 Support Statement SLES High-Performance Computing Module

To receive support, you need an appropriate subscription with SUSE. For more information, see https://www.suse.com/products/server/services-and-support/.

The following definitions apply:

L1: Problem determination, which means technical support designed to provide compatibility information, usage support, ongoing maintenance, information gathering and basic troubleshooting using available documentation.
L2: Problem isolation, which means technical support designed to analyze data, reproduce customer problems, isolate the problem area and provide a resolution for problems not resolved by Level 1 or alternatively prepare for Level 3.
L3: Problem resolution, which means technical support designed to resolve problems by engaging engineering to resolve product defects which have been identified by Level 2 Support.

For contracted customers and partners, the SLES High-Performance Computing Module for SLES 12 is delivered with L3 support for all packages, except the following:

Technology Previews
sound, graphics, fonts and artwork
packages that require an additional customer contract
development packages for libraries which are only delivered with L2 support

SUSE will only support the usage of original (that is, unchanged and un-recompiled) packages.

7 Installation and Upgrade

To install packages from the High-Performance Computing Module:

Make sure that the High-Performance Computing Module is available for installation:
```
SUSEConnect --list-extensions | grep HPC
```
The output should include HPC Module 12 x86_64.
The High-Performance Computing Module can now be added to the repositories by calling:
```
SUSEConnect -p sle-module-hpc/12/x86_64
```
To verify that the repositories are correctly set up, run:
```
SUSEConnect --status-text
```
If the module is registered, it will be mentioned in the output.
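The SUSEConnect commands above use the x86-64 product identifier. Since the module is also available for AArch64, the following minimal sketch derives the identifier from the machine architecture. The helper name is hypothetical, and it assumes that the identifier follows the same sle-module-hpc/12/<arch> pattern on AArch64 and that uname -m reports x86_64 or aarch64.

```shell
# Hypothetical helper: print the SUSEConnect product identifier of the HPC
# Module for an architecture (default: the running machine, per uname -m).
# Assumption: the identifier follows the pattern sle-module-hpc/12/<arch>.
hpc_module_product() {
  local arch=${1:-$(uname -m)}
  case $arch in
    x86_64|aarch64)
      printf 'sle-module-hpc/12/%s\n' "$arch" ;;
    *)
      echo "HPC Module is not available for architecture: $arch" >&2
      return 1 ;;
  esac
}

# Usage (as root):
# SUSEConnect -p "$(hpc_module_product)"
```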

Since different users may want to use different components from this module, there are presently no preselected packages which will be installed by default when this module is added.

7.1 Upgrade-Related Notes

This section includes upgrade-related information for the High-Performance Computing Module for SLES 12.

7.1.1 Error on Migration From 12 SP2 to 12 SP3 When High-Performance Computing Module Is Selected

When the High-Performance Computing Module is selected, the following error message may be encountered during migration from SLES 12 SP2 to SLES 12 SP3:

```
Can't get available migrations from server: SUSE::Connect::ApiError: The requested products '' are not activated on the system.
'/usr/lib/zypper/commands/zypper-migration' exited with status 1
```

The problem can be resolved by re-registering the High-Performance Computing Module using the following two commands:

```
rpm -e sle-module-hpc-release-POOL sle-module-hpc-release
SUSEConnect -p sle-module-hpc/12/x86_64
```

These commands can also be performed before migration as a preventive measure.

7.1.2 Upgrading to SLE 15

You can upgrade to SLE HPC 15 from SLES 12 SP3 or SLE HPC 12 SP3. When upgrading from SLES 12 SP3, the upgrade will only be performed if the SLES High-Performance Computing Module has been registered before starting the upgrade. Otherwise, the system will instead be upgraded to SLES 15.

8 Functionality

This section comprises information about packages and their functionality, as well as additions, updates, removals and changes to the package layout of software.

8.1 ConMan — The Console Manager

ConMan is a serial console management program designed to support a large number of console devices and simultaneous users. It supports:

local serial devices
remote terminal servers (via the telnet protocol)
IPMI Serial-Over-LAN (via FreeIPMI)
Unix domain sockets
external processes (for example, using 'expect' scripts for telnet, ssh, or ipmi-sol connections)

ConMan can be used for monitoring, logging and optionally timestamping console device output.

To install ConMan, run zypper in conman.

Important: conmand Sends Unencrypted Data

The daemon conmand sends unencrypted data over the network, and its connections are not authenticated. Therefore, it should only be used locally, listening on localhost. However, IPMI consoles do offer encryption, which makes ConMan a good tool for monitoring a large number of such consoles.

Usage:

ConMan comes with a number of expect scripts, located in /usr/lib/conman/exec.
Input to conman is not echoed in interactive mode. This can be changed by entering the escape sequence &E.
When pressing Return in interactive mode, no line feed is generated. To generate a line feed, press Ctrl-L.

For more information about options, see the ConMan man page.

8.2 cpuid — x86 CPU Identification Tool

cpuid executes the x86 CPUID instruction and decodes and prints the results to stdout. Its knowledge of Intel, AMD and Cyrix CPUs is fairly complete.

To install cpuid, run: zypper in cpuid.

For information about its options, see the man page cpuid.

Note that this tool is only available for x86-64.

8.3 Ganglia — System Monitoring

Ganglia is a scalable distributed monitoring system for high-performance computing systems, such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters.

To use Ganglia, install ganglia-gmetad on the management server, then start the Ganglia meta-daemon: rcgmetad start. To make sure the service is started after a reboot, run: systemctl enable gmetad.

On each cluster node which you want to monitor, install ganglia-gmond, start the service with rcgmond start, and make sure it is enabled to be started automatically after a reboot: systemctl enable gmond.

To test whether the gmond daemons have connected to the meta-daemon, run gstat -a and check that each node to be monitored is present in the output.
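Checking the gstat -a output by eye becomes tedious on larger clusters. The following minimal sketch, with a hypothetical helper name, reads that output on standard input and reports every expected node whose name does not appear in it. It does not parse the gstat format; it only assumes that the host name of each connected node appears in the output as a whole word.

```shell
# Hypothetical helper: report expected nodes that are missing from gstat output.
# Reads the output of "gstat -a" on stdin; expected node names are arguments.
# Prints one "missing: <node>" line per absent node and returns 1 if any is missing.
ganglia_missing_nodes() {
  local output node missing=0
  output=$(cat)
  for node in "$@"; do
    # -w avoids matching node1 inside node10.
    if ! printf '%s\n' "$output" | grep -qw -- "$node"; then
      echo "missing: $node"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the management server:
# gstat -a | ganglia_missing_nodes node01 node02 node03
```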

When using the Btrfs file system, the monitoring data will be lost after a rollback, and the gmetad service will not start again. To be able to start it again, either install the package ganglia-gmetad-skip-bcheck or create the file /etc/ganglia/no_btrfs_check.

To use the Ganglia Web interface, first add the Web and Scripting Module by running SUSEConnect -p sle-module-web-scripting/12/x86_64. Then install ganglia-web on the management server. Depending on which PHP version is used (the default is PHP 5), enable it in Apache2 with a2enmod php5 or a2enmod php7. Then start Apache2 on this machine with rcapache2 start and make sure it is started automatically after a reboot: systemctl enable apache2. The Ganglia Web interface should then be accessible at http://<management_server>/ganglia.

8.4 hwloc — Portable Abstraction of Hierarchical Architectures for High-Performance Computing

hwloc provides command-line tools and a C API to obtain the hierarchical map of key computing elements, such as: NUMA memory nodes, shared caches, processor packages, processor cores, processing units (logical processors or "threads") and even I/O devices. hwloc also gathers various attributes such as cache and memory information, and is portable across a variety of different operating systems and platforms. Additionally it may assemble the topologies of multiple machines into a single topology so as to let applications consult the topology of an entire fabric or cluster at once.

In graphical mode (X11), hwloc can display the topology in a human-readable format. Alternatively, it can export to one of several formats, including plain text, PDF, PNG, and FIG. For more information, see the man pages provided by hwloc.

It also features full support for import and export of XML-formatted topology files via the libxml2 library.

The package hwloc-devel offers a library that can be directly included into external programs. This requires that the libxml2 development library (package libxml2-devel) is available when compiling hwloc.

libxml2-devel is part of the Software Development Kit (SDK). Therefore, installing the hwloc-devel package requires the availability of SDK packages.

8.5 memkind — Heap Manager for Heterogeneous Memory Platforms and Mixed Memory Policies

The memkind library is a user-extensible heap manager built on top of jemalloc which enables control of memory characteristics and a partitioning of the heap between kinds of memory. The kinds of memory are defined by operating system memory policies that have been applied to virtual address ranges. Memory characteristics supported by memkind without user extension include control of NUMA and page size features.

For more information, see:

Note that this tool is only available for x86-64.

8.6 mrsh/mrlogin — Remote Login Using "munge" Authentication

mrsh is a set of remote shell programs using the "munge" authentication system instead of reserved ports for security. "munge" allows users to connect as the same user from one machine to any other machine which shares the same secret key. This can be used to set up a cluster of machines between which the user can connect and execute commands without any additional authentication.

It can be used as a drop-in replacement for rsh and rlogin.

To install mrsh, do the following:

If only the mrsh client is required (without allowing remote login to this machine), use: zypper in mrsh.
To allow logging in to a machine, the server needs to be installed: zypper in mrsh-server.
To get a drop-in replacement for rsh and rlogin, run: zypper in mrsh-rsh-server-compat or zypper in mrsh-rsh-compat.

To set up a cluster of machines allowing remote login from each other, copy the "munge" key from one machine (ideally a head node of the cluster) to the other machines within this cluster:

```
scp /etc/munge/munge.key root@<nodeN>:/etc/munge/munge.key
```

Then enable and start munge and the mrsh/mrlogin sockets on each machine users should log in to:

```
systemctl enable munge.service
systemctl start munge.service
systemctl enable mrlogind.socket mrshd.socket
systemctl start mrlogind.socket mrshd.socket
```
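On a larger cluster, copying the key and enabling the services node by node is repetitive. The following sketch loops over a list of nodes from the head node. It is a sketch under stated assumptions: the helper name and node names are hypothetical, root SSH access to every node and an installed mrsh-server package are assumed, and it additionally restricts the key to the munge user (owner munge, mode 0400), a common setup for munge keys. Setting RUN=echo prints the commands instead of executing them.

```shell
# Hypothetical helper: distribute the munge key from this node to the listed
# nodes and enable munge plus the mrsh sockets there.
# Assumes root SSH access and mrsh-server installed on each node.
# Set RUN=echo to print the commands instead of running them (dry run).
setup_mrsh_nodes() {
  local node
  for node in "$@"; do
    ${RUN:-} scp -p /etc/munge/munge.key "root@${node}:/etc/munge/munge.key"
    # Keep the key readable only by the munge user on the target node.
    ${RUN:-} ssh "root@${node}" 'chown munge:munge /etc/munge/munge.key && chmod 0400 /etc/munge/munge.key'
    ${RUN:-} ssh "root@${node}" 'systemctl enable munge.service mrlogind.socket mrshd.socket && systemctl start munge.service mrlogind.socket mrshd.socket'
  done
}

# Usage on the head node, dry run first:
# RUN=echo setup_mrsh_nodes node01 node02
# setup_mrsh_nodes node01 node02
```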

To start mrsh support at boot, run:

```
systemctl enable munge.service
systemctl enable mrlogind.socket mrshd.socket
```

We do not recommend using mrsh when logged in as the user root. This is disabled by default. To enable it anyway, run:

```
echo "mrsh" >> /etc/securetty
echo "mrlogin" >> /etc/securetty
```

8.7 pdsh — Parallel Remote Shell Program

pdsh is a parallel remote shell which can be used with multiple back-ends for remote connections. It can run a command on multiple machines in parallel.

To install pdsh, run zypper in pdsh.

On SLES 12, the back-ends ssh, mrsh, and exec are supported. The ssh back-end is the default. Non-default login methods can be used by either setting the PDSH_RCMD_TYPE environment variable or using the -R command line option.

When using the ssh back-end, it is important that a non-interactive (that is, password-less) login method is used.

The mrsh back-end requires mrshd to be running on the target machines. The mrsh back-end does not require the use of reserved ports. Therefore, it does not suffer from port exhaustion when executing commands on many machines in parallel. For information about setting up the system to use this back-end, see Section 8.6, “mrsh/mrlogin — Remote Login Using "munge" Authentication”.

Remote machines can either be specified on the command line or pdsh can use a machines file (/etc/pdsh/machines), dsh (Dancer's shell) style groups or netgroups. Also, it can target nodes based on the currently running Slurm jobs.

The different ways to select target hosts are realized by modules. Some of these modules provide identical options to pdsh. The module loaded first will win and consume the option. Therefore, we recommend limiting yourself to a single method and specifying this with the -M option.

The machines file lists all target hosts one per line. The appropriate netgroup can be selected with the -g command line option.
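For a cluster with consistently numbered host names, the machines file can be generated instead of typed by hand. The sketch below uses GNU seq with a printf-style format; the helper name, host name prefix, and number range are hypothetical, and the optional width argument zero-pads the numbers.

```shell
# Hypothetical helper: write host names PREFIX<FIRST>..PREFIX<LAST>, one per
# line, to FILE (for example /etc/pdsh/machines). WIDTH zero-pads the number.
write_machines_file() {
  local file=$1 prefix=$2 first=$3 last=$4 width=${5:-1}
  seq -f "${prefix}%0${width}g" "$first" "$last" > "$file"
}

# Usage: creates node01 ... node16
# write_machines_file /etc/pdsh/machines node 1 16 2
# pdsh -a uptime   # assuming the machines module provides -a (all hosts)
```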

Newer updates of pdsh provide the host-list plugins in separate packages. This avoids conflicts between command line options of different modules which happen to be identical, and it helps keep installations small and free of unneeded dependencies. See Section 10.1, “Support for Genders in pdsh” for details.

For further information, see the man page pdsh.

8.8 ohpc — OpenHPC Compatibility Macros

ohpc contains compatibility macros to build OpenHPC packages on SUSE Linux Enterprise.

To install ohpc, run: zypper in ohpc.

8.9 PowerMan — Centralized Power Control for Clusters

PowerMan allows manipulating remote power control devices (RPCs) from a central location. It can control:

local devices connected to a serial port
RPCs listening on a TCP socket
RPCs which are accessed through an external program

The communication to RPCs is controlled by "expect"-like scripts. For a list of currently supported devices, see the configuration file /etc/powerman/powerman.conf.

To install PowerMan, run zypper in powerman.

To configure it, include the appropriate device file for your RPC (/etc/powerman/*.dev) in /etc/powerman/powerman.conf and add devices and nodes. The device "type" needs to match the "specification" name in one of the included device files, and the list of "plugs" used for nodes needs to match an entry in the "plug name" list.

After configuring PowerMan, start its service by:

```
systemctl start powerman.service
```

To start PowerMan automatically after every boot, do:

```
systemctl enable powerman.service
```

Optionally, PowerMan can connect to a remote PowerMan instance. To enable this, add the option listen to /etc/powerman/powerman.conf.

Important: Unencrypted Transfer

Data is transferred unencrypted; therefore, this is not recommended unless the network is appropriately secured.

8.10 rasdaemon — Utility to Log RAS Error Tracings

rasdaemon is a RAS (Reliability, Availability and Serviceability) logging tool. It records memory errors using the EDAC tracing events. EDAC drivers in the Linux kernel handle detection of ECC errors from memory controllers.

rasdaemon can be used on large memory systems to track, record and localize memory errors and how they evolve over time to detect hardware degradation. Furthermore, it can be used to localize a faulty DIMM on the board.

To check whether the EDAC drivers are loaded, execute:

```
ras-mc-ctl --status
```

The command should return ras-mc-ctl: drivers are loaded. If it indicates that the drivers are not loaded, EDAC may not be supported on your board.

To start rasdaemon, run systemctl start rasdaemon.service. To start rasdaemon automatically at boot time, execute systemctl enable rasdaemon.service. The daemon logs information to /var/log/messages and to an internal database. A summary of the stored errors can be obtained with:

```
ras-mc-ctl --summary
```

The errors stored in the database can be viewed with:

```
ras-mc-ctl --errors
```

Optionally, you can load the DIMM labels silk-screened on the system board to more easily identify the faulty DIMM. To do so, before starting rasdaemon, run:

```
systemctl start ras-mc-ctl.service
```

For this to work, you need to set up a layout description for the board. There are no descriptions supplied by default. To add a layout description, create a file with an arbitrary name in the directory /etc/ras/dimm_labels.d/. The format is:

```
Vendor: <vendor-name>
  Model: <model-name>
    <label>: <mc>.<top>.<mid>.<low>
```
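As an illustration of this format, the following sketch writes such a file with a hypothetical helper. The vendor and model must match the board as ras-mc-ctl identifies it; reading them from /sys/class/dmi/id/board_vendor and /sys/class/dmi/id/board_name, as in the usage line, is an assumption. The file name site-labels is arbitrary, and the DIMM labels and <mc>.<top>.<mid>.<low> locations shown are placeholders that must be taken from the board documentation.

```shell
# Hypothetical helper: write a DIMM label description for one board model.
# Arguments: target directory, vendor, model, then one or more
# "<label>: <mc>.<top>.<mid>.<low>" entries (the values below are placeholders).
write_dimm_labels() {
  local dir=$1 vendor=$2 model=$3 entry
  shift 3
  mkdir -p "$dir"
  {
    printf 'Vendor: %s\n' "$vendor"
    printf '  Model: %s\n' "$model"
    for entry in "$@"; do
      printf '    %s\n' "$entry"
    done
  } > "$dir/site-labels"
}

# Usage (as root, placeholder labels):
# write_dimm_labels /etc/ras/dimm_labels.d \
#   "$(cat /sys/class/dmi/id/board_vendor)" "$(cat /sys/class/dmi/id/board_name)" \
#   "DIMM_A1: 0.0.0.0" "DIMM_A2: 0.0.1.0"
```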

8.11 Slurm — Utility for HPC Workload Management

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters containing up to 65,536 nodes. Components include machine status, partition management, job management, scheduling and accounting modules.

For a minimal setup to run Slurm with "munge" support on one control node and multiple compute nodes, follow these instructions:

Before installing Slurm, create a user and a group called slurm.

Important: Make Sure of Consistent UIDs and GIDs for Slurm's Accounts

For security reasons, Slurm does not run as the user root but under its own user. It is important that the user slurm has the same UID/GID across all nodes of the cluster.

If this user/group does not exist, the package slurm creates this user and group when it is installed. However, this does not guarantee that the generated UIDs/GIDs will be identical on all systems.

Therefore, we strongly advise you to create the user/group slurm before installing slurm. If you are using a network directory service such as LDAP for user and group management, you can use it to provide the slurm user/group as well.
Install slurm-munge on the control and compute nodes: zypper in slurm-munge
Configure, enable and start "munge" on the control and compute nodes as described in Section 8.6, “mrsh/mrlogin — Remote Login Using "munge" Authentication”.
On the control node, edit /etc/slurm/slurm.conf:
1. Set the parameter ControlMachine=CONTROL_MACHINE to the host name of the control node.
  
  To find out the correct host name, run hostname -s on the control node.
2. Additionally add:
```
NodeName=NODE_LIST Sockets=SOCKETS \
  CoresPerSocket=CORES_PER_SOCKET \
  ThreadsPerCore=THREADS_PER_CORE \
  State=UNKNOWN
```
  and
```
PartitionName=normal Nodes=NODE_LIST \
  Default=YES MaxTime=24:00:00 State=UP
```
  where NODE_LIST is the list of compute nodes, that is, the output of hostname -s run on each compute node, either comma-separated or as ranges such as foo[1-100]. SOCKETS denotes the number of sockets, CORES_PER_SOCKET the number of cores per socket, and THREADS_PER_CORE the number of threads per core for CPUs which can execute more than one thread at a time. Make sure that SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE does not exceed the number of logical CPUs on the compute node.
3. On the control node, copy /etc/slurm/slurm.conf to all compute nodes:
```
scp /etc/slurm/slurm.conf COMPUTE_NODE:/etc/slurm/
```
4. On the control node, start slurmctld:
```
systemctl start slurmctld.service
```
  Also enable it so that it starts on every boot:
```
systemctl enable slurmctld.service
```
5. On the compute nodes, start and enable slurmd:
```
systemctl start slurmd.service
systemctl enable slurmd.service
```
  The last line causes slurmd to be started on every boot automatically.
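The values from step 2 above can be checked and turned into the two slurm.conf lines with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and the CPU count to compare against is passed in (for example, the output of nproc on a compute node). As a cross-check, slurmd -C run on a compute node prints that node's hardware configuration in slurm.conf syntax.

```shell
# Hypothetical helper: print the NodeName and PartitionName lines for
# /etc/slurm/slurm.conf after checking that
# SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE does not exceed CPUS.
# Arguments: NODE_LIST SOCKETS CORES_PER_SOCKET THREADS_PER_CORE CPUS
slurm_node_lines() {
  local nodes=$1 sockets=$2 cores=$3 threads=$4 cpus=$5
  local total=$((sockets * cores * threads))
  if [ "$total" -gt "$cpus" ]; then
    echo "error: $sockets*$cores*$threads = $total exceeds $cpus CPUs" >&2
    return 1
  fi
  printf 'NodeName=%s Sockets=%s CoresPerSocket=%s ThreadsPerCore=%s State=UNKNOWN\n' \
    "$nodes" "$sockets" "$cores" "$threads"
  printf 'PartitionName=normal Nodes=%s Default=YES MaxTime=24:00:00 State=UP\n' "$nodes"
}

# Usage: 16 nodes with 2 sockets, 8 cores per socket, 2 threads per core.
# slurm_node_lines 'node[01-16]' 2 8 2 32
```

Paste the printed lines into /etc/slurm/slurm.conf in place of any existing example NodeName and PartitionName lines rather than appending them blindly.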

Note: Epilog Script

The standard epilog script will kill all remaining processes of a user on a node. If this behavior is not wanted, disable the standard epilog script.

For further documentation, see the Quick Start Administrator Guide (https://slurm.schedmd.com/quickstart_admin.html) and Quick Start User Guide (https://slurm.schedmd.com/quickstart.html). There is further in-depth documentation on the Slurm documentation page (https://slurm.schedmd.com/documentation.html).

8.12 GNU Compiler Collection for HPC

gnu-compilers-hpc installs the base version of the GNU compiler suite and provides environment module files for Lmod to select this compiler suite. This version of the compiler suite is required to enable linking against HPC libraries enabled for environment modules.

This package requires lua-lmod to supply environment module support.

To install gnu-compilers-hpc, run:

```
zypper in gnu-compilers-hpc
```

To set up the environment appropriately and select the GNU toolchain, run:

```
module load gnu
```

If you have more than one version of this compiler suite installed, add the version number of the compiler suite, for example module load gnu/<version>. For more information, see Section 8.13, “Lmod — Lua-based Environment Modules”.

8.13 Lmod — Lua-based Environment Modules

Lmod is an advanced environment module system which allows the installation of multiple versions of a program or shared library, and helps configure the system environment for the use of a specific version. It supports hierarchical library dependencies and makes sure that the correct versions of dependent libraries are selected. Environment Modules-enabled library packages supplied with the HPC module support parallel installation of different versions and flavors of the same library or binary and are supplied with appropriate Lmod module files.

Installation and Basic Usage

To install Lmod, run: zypper in lua-lmod.

Before Lmod can be used, an init file needs to be sourced from the initialization file of your interactive shell. The following init files are available:

/usr/share/lmod/<lmod_version>/init/bash
/usr/share/lmod/<lmod_version>/init/ksh
/usr/share/lmod/<lmod_version>/init/tcsh
/usr/share/lmod/<lmod_version>/init/zsh
/usr/share/lmod/<lmod_version>/init/sh

Pick the one appropriate for your shell. Then add the following line to the init file of your shell (tcsh does not support the . command; use source instead):

```
. /usr/share/lmod/<lmod_version>/init/<shell>
```

To obtain <lmod_version>, run:

```
rpm -q lua-lmod | sed "s/.*-\([^-]\+\)-.*/\1/"
```
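The two steps can be combined. The sketch below, with a hypothetical helper name, extracts the version with the same sed expression and prints the line to add to the shell's init file, using source instead of . for tcsh. As an alternative to sed, rpm can print the version field directly with rpm -q --queryformat '%{VERSION}\n' lua-lmod.

```shell
# Hypothetical helper: print the line that sources the Lmod init file.
# Arguments: shell name (bash, ksh, tcsh, zsh or sh); the package string
# defaults to the output of "rpm -q lua-lmod".
lmod_init_line() {
  local shell=$1 pkg=${2:-$(rpm -q lua-lmod)} version cmd=.
  case $shell in
    bash|ksh|tcsh|zsh|sh) ;;
    *) echo "unsupported shell: $shell" >&2; return 1 ;;
  esac
  # Same expression as above: keep the field between the last two dashes.
  version=$(printf '%s\n' "$pkg" | sed 's/.*-\([^-]\+\)-.*/\1/')
  case $version in
    ''|*' '*) echo "cannot determine the Lmod version from: $pkg" >&2; return 1 ;;
  esac
  # tcsh has no "." command.
  [ "$shell" = tcsh ] && cmd=source
  printf '%s /usr/share/lmod/%s/init/%s\n' "$cmd" "$version" "$shell"
}

# Usage for bash:
# lmod_init_line bash >> ~/.bashrc
```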

The init script adds the command module.

Listing Available Modules

To list all available modules, run: module spider. To show all modules which can be loaded with the currently loaded modules, run: module avail. A module name consists of a name and a version string separated by a / character. If more than one version is available for a certain module name, the default version (marked by *) or (if this isn't set) the one with the highest version number is loaded. To refer to a specific module version, the full string <name>/<version> may be used.

Listing Loaded Modules

module list shows all currently loaded modules. Refer to module help for a short help on the module command and module help <module-name> for help on a particular module. Please note that the module command is available only when you log in after installing lua-lmod.

Gathering Information About a Module

To get information about a particular module, run: module whatis <module-name>. To load a module, run: module load <module-name>. This will ensure that your environment is modified (that is, PATH, LD_LIBRARY_PATH and other environment variables are prepended) such that binaries and libraries provided by the respective modules are found. To run a program compiled against this library, the appropriate module load <module-name> commands must be issued beforehand.

Loading Modules

The module load <module> command needs to be run in the shell from which the module is to be used. Some modules require a compiler toolchain or MPI flavor module to be loaded before they are available for loading.

Environment Variables

If the respective development packages are installed, build-time environment variables like LIBRARY_PATH, CPATH, C_INCLUDE_PATH and CPLUS_INCLUDE_PATH will be set up to include the directories containing the appropriate header and library files. However, some compiler and linker commands may not honor these. In this case, add the include and library paths to the compiler and linker command lines explicitly with the options -I$<PACKAGE_NAME>_INC and -L$<PACKAGE_NAME>_LIB, which take their values from the corresponding environment variables.
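A minimal sketch of this approach follows. The helper name is hypothetical, and the variable names it reads, such as FFTW_INC and FFTW_LIB, are assumptions based on the <PACKAGE_NAME>_INC and <PACKAGE_NAME>_LIB pattern; check the actual names with module show <module-name> after loading the module.

```shell
# Hypothetical helper: print "-I<dir> -L<dir>" for a package whose module
# exports <PACKAGE_NAME>_INC and <PACKAGE_NAME>_LIB (variable names assumed).
hpc_build_flags() {
  local pkg=$1 inc lib
  case $pkg in
    ''|[0-9]*|*[!A-Za-z0-9_]*)
      echo "invalid package name: $pkg" >&2; return 1 ;;
  esac
  # Indirect lookup of ${pkg}_INC and ${pkg}_LIB; safe because pkg was validated.
  eval "inc=\${${pkg}_INC:-} lib=\${${pkg}_LIB:-}"
  if [ -z "$inc" ] || [ -z "$lib" ]; then
    echo "${pkg}_INC or ${pkg}_LIB is not set; load the module first" >&2
    return 1
  fi
  printf '%s\n' "-I$inc -L$lib"
}

# Usage (source file name hypothetical), after module load gnu fftw3:
# gcc -o fft_demo fft_demo.c $(hpc_build_flags FFTW) -lfftw3
```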

For More Information

For more information on Lmod, see https://lmod.readthedocs.org.

8.14 Support for Genders Static Cluster Configuration Database

Support for Genders has been added to the HPC module.

Genders is a static cluster configuration database used for configuration management. It allows grouping and addressing sets of hosts by attributes and is used by a variety of tools. The Genders database is a text file which is usually replicated on each node in a cluster.

Perl, Python, C, and C++ bindings are supplied with Genders; the respective packages provide man pages or other documentation describing the APIs.

To create the Genders database, follow the instructions and examples in /etc/genders and check /usr/share/doc/packages/genders-base/TUTORIAL. Testing a configuration can be done with nodeattr (for more information, see man 1 nodeattr).

List of packages:

genders
genders-base
genders-devel
python-genders
genders-perl-compat
libgenders0
libgendersplusplus2

9 HPC Libraries

Library packages which support environment modules follow a distinctive naming scheme: all packages have the compiler suite and, if built with MPI support, the MPI flavor in their name: *-<compiler>[-<MPI-flavor>]-hpc*. To support a parallel installation of multiple versions of a library package, the package name contains the version number (with dots . replaced by underscores _). To simplify the installation of a library, master packages are supplied which ensure that the latest version of a package is installed. When these master packages are updated, the latest version of the respective library packages will be installed while previous versions remain installed.

Library packages are split between runtime and compile-time packages. The compile-time packages typically supply include files and .so files for shared libraries. Compile-time package names end with -devel. For some libraries, static (.a) libraries are supplied as well; package names for these end with -devel-static.

As an example: Package names of the ScaLAPACK library version 2.0.2 built with GCC for Open MPI v1:

library package: libscalapack2_2_0_2-gnu-openmpi1-hpc
master package: libscalapack2-gnu-openmpi1-hpc
development package: libscalapack2_2_0_2-gnu-openmpi1-hpc-devel
development master package: libscalapack2-gnu-openmpi1-hpc-devel
static library package: libscalapack2_2_0_2-gnu-openmpi1-hpc-devel-static

(Note that the digit 2 appended to the library name denotes the .so version of the library).
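The naming scheme can be expressed as a small helper. The sketch below, with a hypothetical function name, covers runtime library packages named like the ScaLAPACK example above (lib<name><so-version>). It prints the versioned package name followed by the master package name; appending -devel or -devel-static yields the corresponding development package names.

```shell
# Hypothetical helper: print the versioned and the master package name of an
# HPC runtime library package, following the naming scheme described above.
# Arguments: NAME SO_VERSION VERSION COMPILER [MPI_FLAVOR]
hpc_lib_packages() {
  local name=$1 sover=$2 version=$3 compiler=$4 mpi=${5:-}
  local flavor="-$compiler" ver_us
  if [ -n "$mpi" ]; then
    flavor="$flavor-$mpi"
  fi
  # Dots in the version number are replaced by underscores.
  ver_us=$(printf '%s' "$version" | tr . _)
  printf 'lib%s%s_%s%s-hpc\n' "$name" "$sover" "$ver_us" "$flavor"
  printf 'lib%s%s%s-hpc\n' "$name" "$sover" "$flavor"
}

# Usage: the ScaLAPACK example above
# hpc_lib_packages scalapack 2 2.0.2 gnu openmpi1
```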

To install a library package, run zypper in <library-master-package>. To install the development files, run zypper in <library-devel-master-package>.

Presently, the GNU compiler collection version 4.8 as provided with SUSE Linux Enterprise 12 and the MPI flavors Open MPI v1 and MVAPICH2 are supported.

9.1 FFTW HPC Library — Discrete Fourier Transforms

FFTW is a C subroutine library for computing the Discrete Fourier Transform (DFT) in one or more dimensions, of both real and complex data, and of arbitrary input size.

This library is available as both a serial and an MPI-enabled variant. This module requires a compiler toolchain module to be loaded. To select an MPI variant, the respective MPI module needs to be loaded beforehand. To load this module, run:

```
module load fftw3
```

List of master packages:

libfftw3-gnu-hpc
fftw3-gnu-hpc-devel
libfftw3-gnu-openmpi1-hpc
fftw3-gnu-openmpi1-hpc-devel
libfftw3-gnu-mvapich2-hpc
fftw3-gnu-mvapich2-hpc-devel

For general information about Lmod and modules, see Section 8.13, “Lmod — Lua-based Environment Modules”.

9.2 HDF5 HPC Library — Model, Library, File Format for Storing and Managing Data #

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of data types, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and extensible, allowing applications to evolve in their use of HDF5.

There are serial and MPI variants of this library available. All flavors require loading a compiler toolchain module beforehand. The MPI variants also require loading the correct MPI flavor module.

To load the highest available serial version of this module run:

module load hdf5

When an MPI flavor is loaded, the MPI version of this module can be loaded by:

module load phdf5

List of master packages:

hdf5-examples
hdf5-gnu-hpc-devel
libhdf5-gnu-hpc
libhdf5_cpp-gnu-hpc
libhdf5_fortran-gnu-hpc
libhdf5_hl_cpp-gnu-hpc
libhdf5_hl_fortran-gnu-hpc
hdf5-gnu-openmpi1-hpc-devel
libhdf5-gnu-openmpi1-hpc
libhdf5_fortran-gnu-openmpi1-hpc
libhdf5_hl_fortran-gnu-openmpi1-hpc
hdf5-gnu-mvapich2-hpc-devel
libhdf5-gnu-mvapich2-hpc
libhdf5_fortran-gnu-mvapich2-hpc
libhdf5_hl_fortran-gnu-mvapich2-hpc

For general information about Lmod and modules, see Section 8.13, “Lmod — Lua-based Environment Modules”.

9.3 HPC Library Package mpiP #

mpiP (package mpip) is a profiling library for MPI applications. It only collects statistical information about MPI functions, so mpiP generates less overhead and much less data than tracing tools.

This library is provided for the different MPI flavors supported. To use it, the environment module for the desired MPI flavor must be loaded beforehand (see above). To load the highest available version of this module, run:

module load mpiP

List of master packages:

mpiP-gnu-openmpi1-hpc
mpiP-gnu-mvapich2-hpc

9.4 NetCDF HPC Library — Implementation of Self-Describing Data Formats #

The NetCDF software libraries for C, C++, FORTRAN, and Perl are a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

`netcdf` Packages #

The packages with names starting with netcdf provide C bindings for the NetCDF API. These are available with and without MPI support.

There are serial and MPI variants of this library available. All flavors require loading a compiler toolchain module beforehand. The MPI variant becomes available only when the corresponding MPI flavor module is loaded. To load the highest version of the non-MPI netcdf module, run:

module load netcdf

To load the highest available MPI version of this module, run:

module load pnetcdf
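The choice between the serial and the MPI module name used for HDF5 (hdf5 and phdf5) and NetCDF (netcdf and pnetcdf) can be automated in a script, as in the Python sketch below. It assumes that the MPI modulefiles declare an Lmod family named mpi, in which case Lmod exports LMOD_FAMILY_MPI; this document does not confirm that, so verify it with env after loading an MPI module.

```python
import os

# Serial module name -> MPI-enabled module name for HDF5 and NetCDF.
PARALLEL_VARIANT = {"hdf5": "phdf5", "netcdf": "pnetcdf"}


def module_for(base, environ=None):
    """Return the serial or MPI module name for base.

    Assumption: a loaded MPI module sets LMOD_FAMILY_MPI (Lmod family "mpi").
    """
    environ = os.environ if environ is None else environ
    mpi_loaded = bool(environ.get("LMOD_FAMILY_MPI"))
    if mpi_loaded and base in PARALLEL_VARIANT:
        return PARALLEL_VARIANT[base]
    return base
```

Passing the environment explicitly keeps the function testable; in a job script, module_for("hdf5") reads the real environment.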

List of master packages:

netcdf-gnu-hpc
netcdf-gnu-hpc-devel
netcdf-gnu-openmpi1-hpc
netcdf-gnu-openmpi1-hpc-devel
netcdf-gnu-mvapich2-hpc
netcdf-gnu-mvapich2-hpc-devel

`netcdf-cxx` Packages #

netcdf-cxx4 provides a C++ binding for the NetCDF API.

This module requires loading a compiler toolchain module beforehand. To load this module, run:

module load netcdf-cxx4

List of master packages:

libnetcdf-cxx4-gnu-hpc
libnetcdf-cxx4-gnu-hpc-devel
netcdf-cxx4-gnu-hpc-tools

`netcdf-fortran` Packages#

The netcdf-fortran packages provide FORTRAN bindings for the NetCDF API, with and without MPI support.

For More Information#

For general information about Lmod and modules, see Section 8.13, “Lmod — Lua-based Environment Modules”.

9.5 NumPy Python Library #

NumPy is a general-purpose array-processing package designed to efficiently manipulate large multi-dimensional arrays of arbitrary records without sacrificing too much speed for small multi-dimensional arrays.

NumPy is built on the Numeric code base and adds features introduced by numarray as well as an extended C-API and the ability to create arrays of arbitrary type, which also makes NumPy suitable for interfacing with general-purpose database applications.

There are also basic facilities for discrete Fourier transform, basic linear algebra and random number generation.

This package is available for both Python 2 and Python 3. The compiler toolchain and MPI library flavor modules must be loaded before this library. When loading the module, specify the one that matches the Python version in use.

For Python 2, run:

module load python2-numpy

For Python 3:

module load python3-numpy

List of master packages:

python2-numpy-gnu-hpc
python2-numpy-gnu-hpc-devel
python3-numpy-gnu-hpc
python3-numpy-gnu-hpc-devel
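Once the python3-numpy module is loaded, the random number, linear algebra and discrete Fourier transform facilities mentioned above can be exercised with a short Python 3 sketch. It uses the legacy np.random.RandomState generator because it is available in both older and current NumPy releases; the matrix size and seed are arbitrary.

```python
import numpy as np

# Seeded legacy generator, available in older and current NumPy releases.
rng = np.random.RandomState(42)

# Random number generation: off-diagonal row sums stay below 3 while the
# diagonal is at least 4, so the matrix is strictly diagonally dominant
# and therefore invertible.
a = rng.uniform(-1.0, 1.0, size=(4, 4)) + 5.0 * np.eye(4)
b = rng.uniform(-1.0, 1.0, size=4)

# Basic linear algebra: solve a x = b.
x = np.linalg.solve(a, b)
print("residual:", np.max(np.abs(np.dot(a, x) - b)))

# Discrete Fourier transform: forward transform and inverse round trip.
signal = rng.uniform(-1.0, 1.0, size=8)
spectrum = np.fft.fft(signal)
restored = np.fft.ifft(spectrum).real
print("round-trip error:", np.max(np.abs(restored - signal)))
```

Both printed values should be on the order of floating-point rounding error.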

9.6 OpenBLAS Library — Optimized BLAS Library #

OpenBLAS is an optimized BLAS (Basic Linear Algebra Subprograms) library based on GotoBLAS2 1.3, BSD version. It provides the BLAS API. It is shipped as a package enabled for environment modules and thus requires using Lmod to select a version. There are two variants of this library, an OpenMP-enabled variant and a pthreads variant.

OpenMP-Enabled Variant#

The OpenMP variant covers all use cases:

Programs using OpenMP. This requires the OpenMP-enabled library version to function correctly.
Programs using pthreads. This requires an OpenBLAS library without pthread support. This can be achieved with the OpenMP version. We recommend limiting the number of threads used to 1 by setting the environment variable OMP_NUM_THREADS=1.
Programs without pthreads and without OpenMP. Such programs can still take advantage of the OpenMP optimization in the library by linking against the OpenMP variant of the library.

When linking statically, ensure that libgomp.a is included by adding the linker flag -lgomp.

pthreads Variant#

The pthreads variant of the OpenBLAS library can improve the performance of single-threaded programs. The number of threads used can be controlled with the environment variable OPENBLAS_NUM_THREADS.
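The thread-count variables named above can be set per run from a Python launcher, as in the sketch below. The helper openblas_env is hypothetical; it only follows the rule stated above (OMP_NUM_THREADS for the OpenMP variant, OPENBLAS_NUM_THREADS for the pthreads variant) and returns a modified copy of the environment.

```python
import os


def openblas_env(variant, threads, base=None):
    """Return a copy of the environment with the OpenBLAS thread count set.

    variant is "openmp" or "pthreads"; the caller's environment is not modified.
    """
    env = dict(os.environ if base is None else base)
    if variant == "openmp":
        # OpenMP variant: thread count follows OMP_NUM_THREADS.
        env["OMP_NUM_THREADS"] = str(threads)
    elif variant == "pthreads":
        # pthreads variant: thread count follows OPENBLAS_NUM_THREADS.
        env["OPENBLAS_NUM_THREADS"] = str(threads)
    else:
        raise ValueError("unknown OpenBLAS variant: " + variant)
    return env
```

For a program that manages its own pthreads and is linked against the OpenMP variant, something like subprocess.call(["./solver"], env=openblas_env("openmp", 1)) applies the recommended single-thread setting; ./solver is a placeholder name.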

Installation and Usage#

This module requires loading a compiler toolchain beforehand. To select the latest version of this module provided, run:

OpenMP version:
```
module load openblas
```
pthreads version:
```
module load openblas-pthreads
```

List of master packages:

libopenblas-gnu-hpc
libopenblas-gnu-hpc-devel
libopenblas-pthreads-gnu-hpc
libopenblas-pthreads-gnu-hpc-devel

For general information about Lmod and modules, see Section 8.13, “Lmod — Lua-based Environment Modules”.

9.7 PAPI HPC Library — Consistent Interface for Hardware Performance Counters #

PAPI (package papi) provides a tool with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors.

This package serves all compiler toolchains and does not require a compiler toolchain to be selected. The latest version provided can be selected by running:

module load papi

List of master packages:

papi-hpc
papi-hpc-devel

For general information about Lmod and modules, see Section 8.13, “Lmod — Lua-based Environment Modules”.

9.8 PETSc HPC Library — Solver for Partial Differential Equations #

PETSc is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations.

This module requires loading a compiler toolchain as well as an MPI library flavor beforehand. To load this module, run:

module load petsc

List of master packages:

libpetsc-gnu-openmpi1-hpc
petsc-gnu-openmpi1-hpc-devel
libpetsc-gnu-mvapich2-hpc
petsc-gnu-mvapich2-hpc-devel

For general information about Lmod and modules, see Section 8.13, “Lmod — Lua-based Environment Modules”.

9.9 ScaLAPACK HPC Library — LAPACK Routines #

The library ScaLAPACK (short for "Scalable LAPACK") includes a subset of LAPACK routines designed for distributed memory MIMD-parallel computers.

This library requires loading both a compiler toolchain and an MPI library flavor beforehand. To load this library, run:

module load scalapack

List of master packages:

libblacs2-gnu-openmpi1-hpc
libblacs2-gnu-openmpi1-hpc-devel
libscalapack2-gnu-openmpi1-hpc
libscalapack2-gnu-openmpi1-hpc-devel
libblacs2-gnu-mvapich2-hpc
libblacs2-gnu-mvapich2-hpc-devel
libscalapack2-gnu-mvapich2-hpc
libscalapack2-gnu-mvapich2-hpc-devel

For general information about Lmod and modules, see Section 8.13, “Lmod — Lua-based Environment Modules”.

10 Updated Packages #

10.1 Support for Genders in pdsh #

Since Genders has been added to the HPC module, the genders plugin for pdsh is now supported.

At the same time, all host-list plugins to pdsh have been packaged separately to avoid conflicts due to identical options.

Host list plugins are no longer installed automatically. If, for instance, the slurm plugin has been used so far, it must be installed separately after the update.

10.2 Lmod Has Been Updated to Version 7.6 #

Lmod (package lua-lmod) has been updated to version 7.6. This version is the minimum version that is required to work with the SUSE-supplied HPC libraries.

10.3 Support for Intel Knights Mill CPUs in cpuid #

cpuid has been updated to support Intel Knights Mill CPUs (x86-64).

10.4 pdsh Has Been Updated to Version 2.33 #

pdsh has been updated to version 2.33. For more information on the update, see the package change log.

10.5 ConMan Has Been Updated to Version 0.2.8 #

ConMan has been updated to version 0.2.8. For more information about the update, see the package change log.

10.6 Slurm Has Been Updated to Version 17.02.9 #

Slurm has been updated to version 17.02.9. This update is recommended as it contains a security update to fix CVE-2017-15566. For more information about the update, see the package change log.

With this version, libslurm and libslurmdb have been split from the slurm base package, so that older versions of these libraries can remain installed.

Together with the updated version, the deprecated package slurm-sched-wiki has been removed. This package was only relevant in connection with the MOAB and MAUI schedulers which were never shipped with SUSE Linux Enterprise.

The subpackage slurm-torque has been newly introduced: It provides a Torque-like set of commands to Slurm for users switching from Torque.

When updating Slurm, the configuration file needs to be updated. In /etc/slurm/slurm.conf, set:

SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid
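Whether a configuration file already contains the required settings can be checked with a small Python sketch. The parser is deliberately simplified and hypothetical: it strips # comments, splits lines on whitespace, and compares keys case-insensitively (Slurm parameter names are case-insensitive); quoted values containing spaces are not handled.

```python
def parse_slurm_conf(text):
    """Collect Key=Value settings from slurm.conf-style text (simplified)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # strip comments
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                # Slurm parameter names are case-insensitive.
                settings[key.lower()] = value
    return settings


REQUIRED_SLURM_CONF = {
    "SlurmctldPidFile": "/var/run/slurm/slurmctld.pid",
    "SlurmdPidFile": "/var/run/slurm/slurmd.pid",
}


def settings_to_fix(text, required=REQUIRED_SLURM_CONF):
    """Return {key: current value or None} for each required setting that differs."""
    settings = parse_slurm_conf(text)
    return {
        key: settings.get(key.lower())
        for key, wanted in required.items()
        if settings.get(key.lower()) != wanted
    }
```

Pass the contents of /etc/slurm/slurm.conf to settings_to_fix; an empty result means nothing needs to change. The same function can check /etc/slurm/slurmdbd.conf with required={"PidFile": "/var/run/slurm/slurmdbd.pid"}.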

10.7 Slurm Has Been Updated to Version 17.02.10 #

Slurm has been updated to version 17.02.10. The update is recommended, as it contains the security fix CVE-2018-7033. For details on this and other changes introduced by this version, refer to the package change log.

If the Slurm database daemon slurmdbd is used, its configuration /etc/slurm/slurmdbd.conf may need to be updated:

PidFile=/var/run/slurm/slurmdbd.pid

During the update, one or more of the following error messages may occur:

Job for slurmd.service failed because the control process exited
with error code. See "systemctl status slurmd.service" and
"journalctl -xe" for details.

Job for slurmctld.service failed because the control process exited
with error code. See "systemctl status slurmctld.service" and
"journalctl -xe" for details

Job for slurmdbd.service failed because the control process exited
with error code. See "systemctl status slurmdbd.service" and
"journalctl -xe" for details

These messages should be harmless; the services should have been restarted regardless. However, make sure that all enabled Slurm services are running again after an update:

for the control daemon, run: systemctl status slurmctld
for the compute node daemon, run: systemctl status slurmd
for the database daemon, run: systemctl status slurmdbd

If any service is not running, restart it manually: systemctl start <service_name>
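The post-update check above can be scripted in Python. This sketch relies on systemctl is-enabled --quiet and systemctl is-active --quiet, which exit with status 0 on success, and only starts units that are enabled but not running. The function restart_stopped_services is hypothetical; its callables are parameters so the logic can be tested without systemd, and it must run as root to start services.

```python
import subprocess

SLURM_SERVICES = ("slurmctld", "slurmd", "slurmdbd")


def _systemctl_ok(*args):
    # "systemctl is-enabled/is-active --quiet UNIT" exits with 0 on success.
    return subprocess.call(["systemctl"] + list(args)) == 0


def restart_stopped_services(
    units=SLURM_SERVICES,
    is_enabled=lambda u: _systemctl_ok("is-enabled", "--quiet", u),
    is_active=lambda u: _systemctl_ok("is-active", "--quiet", u),
    start=lambda u: subprocess.check_call(["systemctl", "start", u]),
):
    """Start every enabled unit that is not running; return the units started."""
    started = []
    for unit in units:
        # Units that are not enabled on this node (e.g. slurmdbd on a
        # compute node) are skipped.
        if is_enabled(unit) and not is_active(unit):
            start(unit)
            started.append(unit)
    return started
```

Calling restart_stopped_services() with the defaults performs the real check on the local node; an empty list means all enabled Slurm services were already running.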

10.8 Slurm Has Been Updated to Version 17.02.11 #

Slurm has been updated to version 17.02.11 to mitigate insecure handling of user_name and gid fields as reported in CVE-2018-10995.

11 Legal Notices #

SUSE makes no representations or warranties with respect to the contents or use of this documentation, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. Further, SUSE reserves the right to revise this publication and to make changes to its content, at any time, without the obligation to notify any person or entity of such revisions or changes.

Further, SUSE makes no representations or warranties with respect to any software, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. Further, SUSE reserves the right to make changes to any and all parts of SUSE software, at any time, without any obligation to notify any person or entity of such changes.

Any products or technical information provided under this Agreement may be subject to U.S. export controls and the trade laws of other countries. You agree to comply with all export control regulations and to obtain any required licenses or classifications to export, re-export, or import deliverables. You agree not to export or re-export to entities on the current U.S. export exclusion lists or to any embargoed or terrorist countries as specified in U.S. export laws. You agree to not use deliverables for prohibited nuclear, missile, or chemical/biological weaponry end uses. Refer to https://www.suse.com/company/legal/ for more information on exporting SUSE software. SUSE assumes no responsibility for your failure to obtain any necessary export approvals.

Copyright © 2010-2019 SUSE LLC. This release notes document is licensed under a Creative Commons Attribution-NoDerivs 3.0 United States License (CC-BY-ND-3.0 US, https://creativecommons.org/licenses/by-nd/3.0/us/).

SUSE has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at https://www.suse.com/company/legal/ and one or more additional patents or pending patent applications in the U.S. and other countries.

For SUSE trademarks, see SUSE Trademark and Service Mark list (https://www.suse.com/company/legal/). All third-party trademarks are the property of their respective owners.
