Recommended update for slurm_23_02
| Announcement ID: | SUSE-RU-2023:4333-1 |
|---|---|
| Rating: | moderate |
| References: | |
| Affected Products: | |
An update that has one fix can now be installed.
Description:
This update for slurm_23_02 fixes the following issues:
- Updated to version 23.02.5 with the following changes:
- Bug Fixes:
- Revert a change in 23.02 where `SLURM_NTASKS` was no longer set in the job's environment when `--ntasks-per-node` was requested. The method by which it is set, however, is different and should be more accurate in more situations.
- Change pmi2 plugin to honor the `SrunPortRange` option. This matches the new behavior of the pmix plugin in 23.02.0. Note that neither of these plugins makes use of the `MpiParams=ports=` option, and both were previously limited only by the system's ephemeral port range.
- Fix regression in 23.02.2 that caused `slurmctld -R` to crash on startup if a node features plugin is configured.
- Fix and prevent reoccurring reservations from overlapping.
- `job_container/tmpfs` - Avoid attempts to share BasePath between nodes.
- With `CR_Cpu_Memory`, fix node selection for jobs that request gres and `--mem-per-cpu`.
- Fix a regression from 22.05.7 in which some jobs were allocated too few nodes, thus overcommitting cpus to some tasks.
- Fix a job being stuck in the completing state if the job ends while the primary controller is down or unresponsive and the backup controller has not yet taken over.
- Fix `slurmctld` segfault when a node registers with a configured `CpuSpecList` while `slurmctld` configuration has the node without `CpuSpecList`.
- Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state after not registering by `ResumeTimeout`.
- `slurmstepd` - Avoid cleanup of `config.json`-less containers spooldir getting skipped.
- Fix scontrol segfault when 'completing' command requested repeatedly in interactive mode.
- Properly handle a race condition between `bind()` and `listen()` calls in the network stack when running with `SrunPortRange` set.
- Federation - Fix revoked jobs being returned regardless of the `-a`/`--all` option for privileged users.
- Federation - Fix canceling pending federated jobs from non-origin clusters which could leave federated jobs orphaned from the origin cluster.
- Fix sinfo segfault when printing multiple clusters with `--noheader` option.
- Federation - Fix clusters not syncing if clusters are added to a federation before they have registered with the dbd.
- `node_features/helpers` - Fix node selection for jobs requesting changeable features with the `|` operator, which could prevent jobs from running on some valid nodes.
- `node_features/helpers` - Fix inconsistent handling of `&` and `|`, where an AND'd feature was sometimes AND'd to all sets of features instead of just the current set. E.g. `foo|bar&baz` was interpreted as `{foo,baz}` or `{bar,baz}` instead of how it is documented: `{foo} or {bar,baz}`.
- Fix job accounting so that when a job is requeued its allocated node count is cleared. After the requeue, sacct will correctly show that the job has 0 `AllocNodes` while it is pending or if it is canceled before restarting.
- `sacct` - `AllocCPUS` now correctly shows 0 if a job has not yet received an allocation or if the job was canceled before getting one.
- Fix intel OneAPI autodetect: detect the `/dev/dri/renderD[0-9]+` GPUs, and do not detect `/dev/dri/card[0-9]+`.
- Fix node selection for jobs that request `--gpus` and a number of tasks fewer than GPUs, which resulted in incorrectly rejecting these jobs.
- Remove `MYSQL_OPT_RECONNECT` completely.
- Fix cloud nodes in `POWERING_UP` state disappearing (getting set to `FUTURE`) when an `scontrol reconfigure` happens.
- `openapi/dbv0.0.39` - Avoid assert / segfault on missing coordinators list.
- `slurmrestd` - Correct memory leak while parsing OpenAPI specification templates with server overrides.
- Fix overwriting user node reason with system message.
- Prevent deadlock when `rpc_queue` is enabled.
- `slurmrestd` - Correct OpenAPI specification generation bug where fields with overlapping parent paths would not get generated.
- Fix memory leak as a result of a partition info query.
- Fix memory leak as a result of a job info query.
- For step allocations, fix `--gres=none` sometimes not ignoring gres from the job.
- Fix `--exclusive` jobs incorrectly gang-scheduling where they shouldn't.
- Fix allocations with `CR_SOCKET`, gres not assigned to a specific socket, and block core distribution potentially allocating more sockets than required.
- Revert a change in 23.02.3 where Slurm would kill a script's process group as soon as the script ended instead of waiting as long as any process in that process group held the stdout/stderr file descriptors open. That change broke some scripts that relied on the previous behavior. Setting time limits for scripts (such as `PrologEpilogTimeout`) is strongly encouraged to avoid Slurm waiting indefinitely for scripts to finish.
- Fix `slurmdbd -R` not returning an error under certain conditions.
- `slurmdbd` - Avoid potential NULL pointer dereference in the mysql plugin.
- Fix regression in 23.02.3 which broke X11 forwarding for hosts when MUNGE sends a localhost address in the encode host field. This is caused when the node hostname is mapped to 127.0.0.1 (or similar) in `/etc/hosts`.
- `openapi/[db]v0.0.39` - Fix memory leak on parsing error.
- `data_parser/v0.0.39` - Fix updating qos for associations.
- `openapi/dbv0.0.39` - Fix updating values for associations with null users.
- Fix minor memory leak with `--tres-per-task` and licenses.
- Fix cyclic socket cpu distribution for tasks in a step where `--cpus-per-task` < usable threads per core.
- `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of node's energy field `current_watts` to a dictionary to account for unset value instead of dumping 4294967294.
- `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's field "priority" to a dictionary to account for unset value instead of dumping 4294967294.
- `slurmrestd` - For `GET /slurm/v0.0.39/job[s]`, the 'return code' field in `v0.0.39_job_exit_code` will be set to -127 instead of being left unset where the job does not have a relevant return code.
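
The `node_features/helpers` item above documents `foo|bar&baz` as `{foo} or {bar,baz}`, meaning `&` binds tighter than `|`. The following is a minimal Python sketch of that documented reading, not Slurm's actual parser. The function names `parse_feature_expression` and `node_matches` are hypothetical, and the sketch ignores every other constraint operator Slurm supports.

```python
def parse_feature_expression(expr: str) -> list[set[str]]:
    """Split a feature expression into alternative feature sets.

    '&' binds tighter than '|', so each '|'-separated branch becomes
    one set of features that must all be present together.
    Parentheses, counts and other operators are not handled here.
    """
    alternatives = []
    for branch in expr.split("|"):
        features = {name.strip() for name in branch.split("&") if name.strip()}
        if features:
            alternatives.append(features)
    return alternatives


def node_matches(expr: str, node_features: set[str]) -> bool:
    """Return True if the node satisfies at least one alternative set."""
    return any(alt <= node_features for alt in parse_feature_expression(expr))
```

Under this reading a node that offers only `foo` satisfies `foo|bar&baz`. Under the faulty interpretation described above, `{foo,baz}` or `{bar,baz}`, the same node would have been rejected.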
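
Two of the `slurmrestd` items in the Bug Fixes list replace a raw 4294967294 with a dictionary. That number is 0xFFFFFFFE, the 32-bit value Slurm uses as its "no value" marker. A client that has to work against both the old and the new output might normalize the field as sketched below. The dictionary keys `set` and `number` are an assumption about the new format and should be checked against the OpenAPI specification served by the installed `slurmrestd`. The helper name `read_optional_number` is hypothetical.

```python
from typing import Optional, Union

NO_VAL = 0xFFFFFFFE  # 4294967294, the value previously dumped for unset fields


def read_optional_number(value: Union[int, dict]) -> Optional[int]:
    """Return the numeric value, or None when the field is unset.

    Accepts the old raw integer and the new dictionary form.
    The keys "set" and "number" are assumed, not taken from the advisory.
    Any "infinite" marker is not treated separately in this sketch.
    """
    if isinstance(value, dict):
        if not value.get("set", False):
            return None
        return value.get("number")
    if value == NO_VAL:
        return None
    return value
```

Returning `None` for the unset case keeps the sentinel out of downstream arithmetic, for example when summing `current_watts` across nodes.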
- Other Changes:
- Remove --uid / --gid options from salloc and srun commands. These options did not work correctly since the CVE-2022-29500 fix in combination with some changes made in 23.02.0.
- Add the `JobId` to `debug()` messages indicating when `cpus_per_task`/`mem_per_cpu` or `pn_min_cpus` are being automatically adjusted.
- Change the log message warning for rate limited users from verbose to info.
- `slurmstepd` - Cleanup per task generated environment for containers in spooldir.
- Format batch, extern, interactive, and pending step ids into strings that are human readable.
- `slurmrestd` - Reduce memory usage when printing out job CPU frequency.
- `data_parser/v0.0.39` - Add `required/memory_per_cpu` and `required/memory_per_node` to `sacct --json` and `sacct --yaml` and `GET /slurmdb/v0.0.39/jobs` from slurmrestd.
- `gpu/oneapi` - Store cores correctly so CPU affinity is tracked.
- Allow `slurmdbd -R` to work if the root assoc id is not 1.
- Limit periodic node registrations to 50 instead of the full `TreeWidth`. Since unresolvable `cloud/dynamic` nodes must disable fanout by setting `TreeWidth` to a large number, this would cause all nodes to register at once.
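
Because `--uid` and `--gid` are removed from `salloc` and `srun`, existing wrapper scripts that still pass them may need attention after the update. The sketch below is a rough heuristic for spotting such command lines, assuming they are available as plain strings. The helper name `find_removed_options` is hypothetical. The check does not distinguish Slurm options from arguments of the launched program, so a match is a prompt for manual review and not proof of a problem.

```python
import shlex

REMOVED_OPTIONS = ("--uid", "--gid")
AFFECTED_COMMANDS = ("salloc", "srun")


def find_removed_options(command_line: str) -> list[str]:
    """Return the tokens in a salloc/srun command line that use removed options."""
    tokens = shlex.split(command_line)
    if not tokens:
        return []
    # Compare only the program name, so /usr/bin/srun is recognized as well.
    program = tokens[0].rsplit("/", 1)[-1]
    if program not in AFFECTED_COMMANDS:
        return []
    found = []
    for token in tokens[1:]:
        option = token.split("=", 1)[0]  # handles both --uid=1000 and --uid 1000
        if option in REMOVED_OPTIONS:
            found.append(token)
    return found
```

For example, `find_removed_options("srun --uid=1000 -n 4 hostname")` returns `["--uid=1000"]`, while a command line for any other program returns an empty list.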
Patch Instructions:
To install this SUSE update use the SUSE recommended
installation methods like YaST online_update or "zypper patch".
Alternatively you can run the command listed for your product:
- openSUSE Leap 15.3
  `zypper in -t patch SUSE-2023-4333=1`
- openSUSE Leap 15.4
  `zypper in -t patch openSUSE-SLE-15.4-2023-4333=1`
- HPC Module 15-SP4
  `zypper in -t patch SUSE-SLE-Module-HPC-15-SP4-2023-4333=1`
- SUSE Linux Enterprise High Performance Computing ESPOS 15 SP3
  `zypper in -t patch SUSE-SLE-Product-HPC-15-SP3-ESPOS-2023-4333=1`
- SUSE Linux Enterprise High Performance Computing LTSS 15 SP3
  `zypper in -t patch SUSE-SLE-Product-HPC-15-SP3-LTSS-2023-4333=1`
Package List:
-
openSUSE Leap 15.3 (aarch64 ppc64le s390x x86_64)
- libnss_slurm2_23_02-23.02.5-150300.7.11.2
- slurm_23_02-hdf5-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-23.02.5-150300.7.11.2
- slurm_23_02-debugsource-23.02.5-150300.7.11.2
- libslurm39-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugins-23.02.5-150300.7.11.2
- libpmi0_23_02-23.02.5-150300.7.11.2
- slurm_23_02-lua-23.02.5-150300.7.11.2
- libpmi0_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-devel-23.02.5-150300.7.11.2
- perl-slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-node-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-munge-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-torque-23.02.5-150300.7.11.2
- slurm_23_02-hdf5-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-lua-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-23.02.5-150300.7.11.2
- libnss_slurm2_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-23.02.5-150300.7.11.2
- slurm_23_02-rest-23.02.5-150300.7.11.2
- slurm_23_02-torque-debuginfo-23.02.5-150300.7.11.2
- libslurm39-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-node-23.02.5-150300.7.11.2
- slurm_23_02-munge-23.02.5-150300.7.11.2
- perl-slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-rest-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-23.02.5-150300.7.11.2
- slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-23.02.5-150300.7.11.2
- slurm_23_02-plugins-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-testsuite-23.02.5-150300.7.11.2
-
openSUSE Leap 15.3 (noarch)
- slurm_23_02-doc-23.02.5-150300.7.11.2
- slurm_23_02-config-man-23.02.5-150300.7.11.2
- slurm_23_02-sjstat-23.02.5-150300.7.11.2
- slurm_23_02-webdoc-23.02.5-150300.7.11.2
- slurm_23_02-seff-23.02.5-150300.7.11.2
- slurm_23_02-config-23.02.5-150300.7.11.2
- slurm_23_02-openlava-23.02.5-150300.7.11.2
-
openSUSE Leap 15.4 (aarch64 ppc64le s390x x86_64)
- libnss_slurm2_23_02-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-23.02.5-150300.7.11.2
- slurm_23_02-debugsource-23.02.5-150300.7.11.2
- libslurm39-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugins-23.02.5-150300.7.11.2
- libpmi0_23_02-23.02.5-150300.7.11.2
- slurm_23_02-lua-23.02.5-150300.7.11.2
- libpmi0_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-devel-23.02.5-150300.7.11.2
- perl-slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-node-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-munge-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-torque-23.02.5-150300.7.11.2
- slurm_23_02-lua-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-23.02.5-150300.7.11.2
- libnss_slurm2_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-23.02.5-150300.7.11.2
- slurm_23_02-rest-23.02.5-150300.7.11.2
- slurm_23_02-torque-debuginfo-23.02.5-150300.7.11.2
- libslurm39-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-node-23.02.5-150300.7.11.2
- slurm_23_02-munge-23.02.5-150300.7.11.2
- perl-slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-rest-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-23.02.5-150300.7.11.2
- slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-23.02.5-150300.7.11.2
- slurm_23_02-plugins-debuginfo-23.02.5-150300.7.11.2
-
openSUSE Leap 15.4 (noarch)
- slurm_23_02-config-23.02.5-150300.7.11.2
- slurm_23_02-doc-23.02.5-150300.7.11.2
- slurm_23_02-config-man-23.02.5-150300.7.11.2
- slurm_23_02-webdoc-23.02.5-150300.7.11.2
-
HPC Module 15-SP4 (aarch64 x86_64)
- libnss_slurm2_23_02-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-23.02.5-150300.7.11.2
- slurm_23_02-debugsource-23.02.5-150300.7.11.2
- libslurm39-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugins-23.02.5-150300.7.11.2
- libpmi0_23_02-23.02.5-150300.7.11.2
- slurm_23_02-lua-23.02.5-150300.7.11.2
- libpmi0_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-devel-23.02.5-150300.7.11.2
- perl-slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-node-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-munge-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-torque-23.02.5-150300.7.11.2
- slurm_23_02-lua-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-23.02.5-150300.7.11.2
- libnss_slurm2_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-23.02.5-150300.7.11.2
- slurm_23_02-rest-23.02.5-150300.7.11.2
- slurm_23_02-torque-debuginfo-23.02.5-150300.7.11.2
- libslurm39-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-node-23.02.5-150300.7.11.2
- slurm_23_02-munge-23.02.5-150300.7.11.2
- perl-slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-rest-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-23.02.5-150300.7.11.2
- slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-23.02.5-150300.7.11.2
- slurm_23_02-plugins-debuginfo-23.02.5-150300.7.11.2
-
HPC Module 15-SP4 (noarch)
- slurm_23_02-config-23.02.5-150300.7.11.2
- slurm_23_02-doc-23.02.5-150300.7.11.2
- slurm_23_02-config-man-23.02.5-150300.7.11.2
- slurm_23_02-webdoc-23.02.5-150300.7.11.2
-
SUSE Linux Enterprise High Performance Computing ESPOS 15 SP3 (aarch64 x86_64)
- libnss_slurm2_23_02-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-23.02.5-150300.7.11.2
- slurm_23_02-debugsource-23.02.5-150300.7.11.2
- libslurm39-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugins-23.02.5-150300.7.11.2
- libpmi0_23_02-23.02.5-150300.7.11.2
- slurm_23_02-lua-23.02.5-150300.7.11.2
- libpmi0_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-devel-23.02.5-150300.7.11.2
- perl-slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-node-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-munge-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-torque-23.02.5-150300.7.11.2
- slurm_23_02-lua-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-23.02.5-150300.7.11.2
- libnss_slurm2_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-23.02.5-150300.7.11.2
- slurm_23_02-rest-23.02.5-150300.7.11.2
- slurm_23_02-torque-debuginfo-23.02.5-150300.7.11.2
- libslurm39-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-node-23.02.5-150300.7.11.2
- slurm_23_02-munge-23.02.5-150300.7.11.2
- perl-slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-rest-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-23.02.5-150300.7.11.2
- slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-23.02.5-150300.7.11.2
- slurm_23_02-plugins-debuginfo-23.02.5-150300.7.11.2
-
SUSE Linux Enterprise High Performance Computing ESPOS 15 SP3 (noarch)
- slurm_23_02-config-23.02.5-150300.7.11.2
- slurm_23_02-doc-23.02.5-150300.7.11.2
- slurm_23_02-config-man-23.02.5-150300.7.11.2
- slurm_23_02-webdoc-23.02.5-150300.7.11.2
-
SUSE Linux Enterprise High Performance Computing LTSS 15 SP3 (aarch64 x86_64)
- libnss_slurm2_23_02-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-23.02.5-150300.7.11.2
- slurm_23_02-debugsource-23.02.5-150300.7.11.2
- libslurm39-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugin-ext-sensors-rrd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-plugins-23.02.5-150300.7.11.2
- libpmi0_23_02-23.02.5-150300.7.11.2
- slurm_23_02-lua-23.02.5-150300.7.11.2
- libpmi0_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-devel-23.02.5-150300.7.11.2
- perl-slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-node-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-munge-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-torque-23.02.5-150300.7.11.2
- slurm_23_02-lua-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-auth-none-23.02.5-150300.7.11.2
- libnss_slurm2_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-pam_slurm-23.02.5-150300.7.11.2
- slurm_23_02-rest-23.02.5-150300.7.11.2
- slurm_23_02-torque-debuginfo-23.02.5-150300.7.11.2
- libslurm39-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-node-23.02.5-150300.7.11.2
- slurm_23_02-munge-23.02.5-150300.7.11.2
- perl-slurm_23_02-23.02.5-150300.7.11.2
- slurm_23_02-rest-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-23.02.5-150300.7.11.2
- slurm_23_02-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-cray-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sql-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-debuginfo-23.02.5-150300.7.11.2
- slurm_23_02-sview-23.02.5-150300.7.11.2
- slurm_23_02-slurmdbd-23.02.5-150300.7.11.2
- slurm_23_02-plugins-debuginfo-23.02.5-150300.7.11.2
-
SUSE Linux Enterprise High Performance Computing LTSS 15 SP3 (noarch)
- slurm_23_02-config-23.02.5-150300.7.11.2
- slurm_23_02-doc-23.02.5-150300.7.11.2
- slurm_23_02-config-man-23.02.5-150300.7.11.2
- slurm_23_02-webdoc-23.02.5-150300.7.11.2
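
Every package in the lists above carries the same version-release string, `23.02.5-150300.7.11.2`, which makes a post-update check straightforward. The sketch below assumes the installed packages have been listed as one `name version-release` pair per line, for instance with an `rpm -q --qf '%{NAME} %{VERSION}-%{RELEASE}\n'` query. The function name `outdated_packages` is hypothetical. It tests for exact equality only, so packages from a later update would be reported too.

```python
EXPECTED = "23.02.5-150300.7.11.2"  # version-release shared by all packages in this update


def outdated_packages(rpm_query_output: str, expected: str = EXPECTED) -> list[str]:
    """Return names of packages whose version-release differs from `expected`.

    Each input line must look like "<name> <version>-<release>".
    Lines that do not have exactly two fields are skipped.
    """
    outdated = []
    for line in rpm_query_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, version_release = parts
        if version_release != expected:
            outdated.append(name)
    return outdated
```

An empty result means every listed package matches this update. Any returned name points at a package that was not refreshed together with the rest.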