Feature update for slurm and pdsh
| Announcement ID: | SUSE-FU-2023:3860-1 |
|---|---|
| Rating: | moderate |
| References: | bsc#1088693 bsc#1206795 bsc#1208846 bsc#1209216 bsc#1209260 bsc#1212946 bsc#1214983 jsc#PED-2987 |
| Affected Products: | HPC Module 12 |
An update that contains one feature and has seven fixes can now be installed.
Description:
This update for slurm and pdsh fixes the following issues:
Release of Slurm 23.02 (jsc#PED-2987):
- Important notes:
- If using the `slurmdbd` (Slurm DataBase Daemon) you must update this first.
- If using a backup DBD you must start the primary first to do any database conversion; the backup will not start until this has happened.
- The 23.02 `slurmdbd` will work with Slurm daemons of version 21.08 and above. You will not need to update all clusters at the same time, but it is very important to update `slurmdbd` first and have it running before updating any other clusters making use of it.
- Slurm can be upgraded from version 21.08 or 22.05 to version 23.02 without loss of jobs or other state information. Upgrading directly from an earlier version of Slurm will result in loss of state information.
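The required ordering can be sketched as a command sequence. This is a hedged illustration only: the host names `dbd-primary` and `ctl` are hypothetical, and the patch/service names are taken from this advisory and the standard slurm packaging.

```bash
# Hypothetical hosts: dbd-primary runs slurmdbd, ctl runs slurmctld.
# 1. Upgrade and restart slurmdbd first. With a backup DBD, start the
#    primary first so it can perform the database conversion.
ssh dbd-primary 'zypper in -t patch SUSE-SLE-Module-HPC-12-2023-3860=1 && systemctl restart slurmdbd'
# 2. Only then upgrade the controller and compute nodes; daemons of
#    version 21.08 and above keep working against the 23.02 slurmdbd.
ssh ctl 'zypper in -t patch SUSE-SLE-Module-HPC-12-2023-3860=1 && systemctl restart slurmctld'
```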
- All SPANK plugins must be recompiled when upgrading from any Slurm version prior to 23.02
- PMIx v1.x is no longer supported
- Highlights of bug fixes and changes:
- From version 23.02.04:
- Fix main scheduler loop not starting after a failover to backup controller.
- Avoid `slurmctld` segfault when specifying `AccountingStorageExternalHost` (bsc#1214983)
- From version 23.02.03:
- Fix backup `slurmctld` crash when it takes control multiple times.
- From version 23.02.02:
- Fix for a regression in 23.02 that caused OpenMPI's mpirun to fail to launch tasks.
- From version 23.02.01:
- Use libpmix.so.2 instead of libpmix.so (bsc#1209260)
- From version 23.02.00:
- Web-configurator: changed presets to SUSE defaults.
- Remove the workaround for the restart issue in the legacy Slurm package (bsc#1088693). The Slurm version in that package is 16.05; any attempt to migrate directly from it to the current version is bound to fail.
- Now require `slurm-munge` if `munge` authentication is installed.
- Move the ext_sensors/rrd plugin to a separate package: this plugin requires `librrd`, which in turn requires huge parts of the client-side X Window System stack.
- slurmctld - Add new RPC rate limiting feature. This is enabled through `SlurmctldParameters=rl_enable`, otherwise disabled by default.
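As an illustration, the rate limiting feature can be switched on with a one-line `slurm.conf` addition. This is a sketch based only on the option named above; it is disabled by default.

```
# slurm.conf - RPC rate limiting is off unless explicitly enabled:
SlurmctldParameters=rl_enable
```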
- Make `scontrol reconfigure` and sending a SIGHUP to `slurmctld` behave the same. If you were using SIGHUP as a 'lighter' `scontrol reconfigure` to rotate logs, please update your scripts to use SIGUSR2 instead.
- Change cloud nodes to show by default. PrivateData=cloud is no longer needed.
- sreport - Count planned (formerly 'reserved') time for jobs running in IGNORE_JOBS reservations. Previously this was lumped into IDLE time.
- job_container/tmpfs - Support running with an arbitrary list of private mount points (/tmp and /dev/shm are the default, but not required).
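The mount-point list is driven by the new "Dirs" option in `job_container.conf` (see the configuration file changes below). A minimal hedged sketch, where the `BasePath` value and the extra `/var/scratch` mount are hypothetical:

```
# job_container.conf - hypothetical example; /tmp and /dev/shm remain
# the defaults, /var/scratch is an arbitrary additional private mount.
AutoBasePath=true
BasePath=/var/spool/slurm/containers
Dirs=/tmp,/dev/shm,/var/scratch
```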
- job_container/tmpfs - Set more environment variables in InitScript.
- Make all cgroup directories created by Slurm owned by root. This was the behavior in cgroup/v2 but not in cgroup/v1, where by default the step directories' ownership was set to the user and group of the job.
- accounting_storage/mysql - change purge/archive to calculate record ages based on end time, rather than start or submission times.
- job_submit/lua - add support for log_user() from slurm_job_modify().
- Run the following scripts in slurmscriptd instead of slurmctld: ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
- Only permit changing log levels with 'srun --slurmd-debug' by root or SlurmUser.
- slurmctld will fatal() when reconfiguring the job_submit plugin fails.
- Add PowerDownOnIdle partition option to power down nodes after nodes become idle.
- Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix from slurmscriptd to Syslog logging. Previously this was only happening when logging to a file.
- Add purge and archive functionality for job environment and job batch script records.
- Extend support for Include files to all "configless" client commands.
- Make node weight usable for powered down and rebooting nodes.
- Removed 'launch' plugin.
- Add "Extra" field to job to store extra information other than a comment.
- Add usage gathering for AMD (requires ROCm 5.5+) and NVIDIA GPUs.
- Add job's allocated nodes, features, oversubscribe, partition, and reservation to SLURM_RESUME_FILE output for power saving.
- Automatically create directories for stdout/stderr output files. Paths may use %j and related substitution characters as well.
- Add --tres-per-task to salloc/sbatch/srun.
- Allow nodefeatures plugin features to work with cloud nodes.
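Two of the highlights above can be combined in a short batch script sketch. This is illustrative only: the paths, task count, application name, and the TRES specification `gres/gpu:1` are hypothetical (check the srun/sbatch documentation for your release for the exact `--tres-per-task` syntax).

```bash
#!/bin/bash
# New in 23.02: the output directory is created automatically;
# %j expands to the job ID.
#SBATCH --output=/home/user/logs/%j/out.log
# New in 23.02: request TRES per task (hypothetical gres specification).
#SBATCH --tres-per-task=gres/gpu:1
#SBATCH --ntasks=4
srun ./my_app   # hypothetical application
```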
- For the full list of new features, fixes, and changes, please consult the packaged NEWS file at the following point release references:
- 23.02.4
- 23.02.3
- 23.02.2
- 23.02.1
- 23.02.0
- Configuration file changes:
- `job_container.conf` - Added "Dirs" option to list desired private mount points
- `node_features` plugins - invalid users specified for `AllowUserBoot` will now result in `fatal()` rather than just an error
- Allow jobs to queue even if the user is not in `AllowGroups` when `EnforcePartLimits=no` is set. This ensures consistency for all the Partition access controls, and matches the documented behavior for `EnforcePartLimits`
- Add `InfluxDBTimeout` parameter to `acct_gather.conf`
- `job_container/tmpfs` - add support for expanding `%h` and `%n` in `BasePath`
- `slurm.conf` - Removed `SlurmctldPlugstack` option
- Add new `SlurmctldParameters=validate_nodeaddr_threads=<number>` option to allow concurrent hostname resolution at `slurmctld` startup
- Add new `AccountingStoreFlags=job_extra` option to store a job's extra field in the database
- Add new "defer_batch" option to `SchedulerParameters` to only defer scheduling for batch jobs
- Add new `DebugFlags` option 'JobComp' to replace 'Elasticsearch'
- Add configurable job requeue limit parameter - `MaxBatchRequeue` - in `slurm.conf` to permit changes from the old hard-coded value of 5
- `helpers.conf` - Allow specification of node specific features
- `helpers.conf` - Allow many features to one helper script
- `job_container/tmpfs` - Add "Shared" option to support shared namespaces. This allows autofs to work with the `job_container/tmpfs` plugin when enabled
- `acct_gather.conf` - Added `EnergyIPMIPowerSensors=Node=DCMI` and `Node=DCMI_ENHANCED`.
- Add new "`getnameinfo_cache_timeout=<number>`" option to `CommunicationParameters` to adjust or disable caching the results of `getnameinfo()`
- Add new `PrologFlags=ForceRequeueOnFail` option to automatically requeue batch jobs on Prolog failures regardless of the job `--requeue` setting
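Several of the new `slurm.conf` knobs described above could be combined as follows. The values are illustrative assumptions, not recommendations:

```
# slurm.conf - illustrative values only
CommunicationParameters=getnameinfo_cache_timeout=60
PrologFlags=ForceRequeueOnFail
# was hard-coded to 5 before 23.02:
MaxBatchRequeue=10
```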
- Add `HealthCheckNodeState=NONDRAINED_IDLE` option.
- Add 'explicit' to Flags in `gres.conf`. This makes it so the gres is not automatically added to a job's allocation when `--exclusive` is used. Note that this is a per-node flag.
- Moved the "preempt_" options from `SchedulerParameters` to `PreemptParameters`, and dropped the prefix from the option names. (The old options will still be parsed for backwards compatibility, but are now undocumented.)
- Add `LaunchParameters=ulimit_pam_adopt`, which enables setting `RLIMIT_RSS` in adopted processes.
- Update `SwitchParameters=job_vni` to enable/disable creating job VNIs for all jobs, or when a user requests them
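A hedged sketch of the per-node 'explicit' flag in `gres.conf`; the node names and device files are hypothetical:

```
# gres.conf - with Flags=explicit this gres is only allocated when
# explicitly requested, even if the job uses --exclusive (per-node flag).
NodeName=node[01-04] Name=gpu File=/dev/nvidia[0-3] Flags=explicit
```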
- Update `SwitchParameters=single_node_vni` to enable/disable creating single node VNIs for all jobs, or when a user requests them
- Add ability to preserve `SuspendExc*` parameters on reconfig with `ReconfigFlags=KeepPowerSaveSettings`
- `slurmdbd.conf` - Add new `AllResourcesAbsolute` to force all new resources to be created with the `Absolute` flag
- `topology/tree` - Add new `TopologyParam=SwitchAsNodeRank` option to reorder nodes based on switch layout. This can be useful if the naming convention for the nodes does not naturally map to the network topology
- Removed the default setting for `GpuFreqDef`. If unset, no attempt to change the GPU frequency will be made if `--gpu-freq` is not set for the step
- Command Changes:
- `sacctmgr` - no longer force updates to the AdminComment, Comment, or SystemComment to lower-case
- `sinfo` - Add -F/--future option to sinfo to display future nodes.
- `sacct` - Rename 'Reserved' field to 'Planned' to match sreport and the nomenclature of the 'Planned' node
- `scontrol` - advanced reservation flag MAINT will no longer replace nodes, similar to STATIC_ALLOC
- `sbatch` - add parsing for #PBS -d and #PBS -w.
- `scontrol` show assoc_mgr will show username(uid) instead of uid in QoS section.
- Add `strigger --draining` and `-R/--resume` options.
- Change `--oversubscribe` and `--exclusive` to be mutually exclusive for job submission. Job submission commands will now fatal if both are set. Previously, these options would override each other, with the last one in the job submission command taking effect.
- `scontrol` - Requested TRES and allocated TRES will now always be printed when showing jobs, instead of one TRES output that was either the requested or allocated.
- `srun --ntasks-per-core` now applies to job and step allocations. Now, use of `--ntasks-per-core=1` implies `--cpu-bind=cores` and `--ntasks-per-core>1` implies `--cpu-bind=threads`.
- `salloc`/`sbatch`/`srun` - Check and abort if `ntasks-per-core` > `threads-per-core`.
- `scontrol` - Add `ResumeAfter=<secs>` option to "scontrol update nodename=".
- Add a new "nodes=" argument to scontrol setdebug to allow the debug level on the slurmd processes to be temporarily altered
- Add a new "nodes=" argument to "scontrol setdebugflags" as well.
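The new `scontrol` options above might be used like this; the node names, timeout, and debug flag are hypothetical values chosen for illustration:

```bash
# Resume a node automatically 300 seconds after downing it:
scontrol update nodename=node01 ResumeAfter=300
# Temporarily raise the slurmd debug level on a subset of nodes only:
scontrol setdebug debug2 nodes=node[01-04]
scontrol setdebugflags +Backfill nodes=node[01-04]
```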
- Make it so `scrontab` prints client-side the job_submit() err_msg (which can be set e.g. by using the log_user() function for the lua plugin).
- `scontrol` - Reservations will not be allowed to have STATIC_ALLOC or MAINT flags and REPLACE[_DOWN] flags simultaneously
- `scontrol` - Reservations will only accept one reoccurring flag when being created or updated.
- `scontrol` - A reservation cannot be updated to be reoccurring if it is already a floating reservation.
- `squeue` - removed unused '%s' and 'SelectJobInfo' formats.
- `squeue` - align print format for exit and derived codes with that of other components (<exit_status>:<signal_number>).
- `sacct` - Add --array option to expand job arrays and display array tasks on separate lines.
- Partial support for `--json` and `--yaml` formatted outputs has been implemented for `sacctmgr`, `sdiag`, `sinfo`, `squeue`, and `scontrol`. The resultant data output will be filtered by normal command arguments. Formatting arguments will continue to be ignored.
- `salloc`/`sbatch`/`srun` - extended the `--nodes` syntax to allow for a list of valid node counts to be allocated to the job. This also supports a "step count" value (e.g., --nodes=20-100:20 is equivalent to --nodes=20,40,60,80,100) which can simplify the syntax when the job needs to scale by a certain "chunk" size
- `srun` - add user-requestable VNIs with '--network=job_vni' option
- `srun` - add user-requestable single node VNIs with the `--network=single_node_vni` option
- API Changes:
- `job_container` plugins - `container_p_stepd_create()` function signature replaced `uint32_t uid` with `stepd_step_rec_t *step`.
- `gres` plugins - `gres_g_get_devices()` function signature replaced `pid_t pid` with `stepd_step_rec_t *step`.
- `cgroup` plugins - `task_cgroup_devices_constrain()` function signature removed `pid_t pid`.
- `task` plugins - replace `task_p_pre_set_affinity()`, `task_p_set_affinity()`, and `task_p_post_set_affinity()` with `task_p_pre_launch_priv()` like it was back in Slurm 20.11.
- Allow for concurrent processing of `job_submit_g_submit()` and `job_submit_g_modify()` calls. If your plugin is not capable of concurrent operation you must add additional locking within your plugin.
- Removed return value from `slurm_list_append()`.
- The List and ListIterator types have been removed in favor of list_t and list_itr_t respectively.
- burst buffer plugins:
- add `bb_g_build_het_job_script()`
- `bb_g_get_status()` - added authenticated UID and GID
- `bb_g_run_script()` - added job_info argument
- `burst_buffer.lua` - Pass UID and GID to most hooks. Pass `job_info` (detailed job information) to many hooks. See `etc/burst_buffer.lua.example` for a complete list of changes. WARNING: Backwards compatibility is broken for `slurm_bb_get_status`: UID and GID are passed before the variadic arguments. If UID and GID are not explicitly listed as arguments to `slurm_bb_get_status()`, then they will be included in the variadic arguments. Backwards compatibility is maintained for all other hooks because the new arguments are passed after the existing arguments.
- `node_features` plugins changes:
  - `node_features_p_reboot_weight()` function removed.
  - `node_features_p_job_valid()` - added parameter feature_list.
  - `node_features_p_job_xlate()` - added parameters feature_list and `job_node_bitmap`
- New `data_parser` interface with v0.0.39 plugin
- Test Suite fixes:
- Update README_Testsuite.md
- Clean up left over files when de-installing test suite
- Adjustment to test suite package: for SLE, mark the openmpi4 devel package and slurm-hdf5 as optional
- Add `-ffat-lto-objects` to the build flags when LTO is set to make sure the object files we ship with the test suite still work correctly.
- Improve `setup-testsuite.sh`: copy ssh fingerprints from all nodes
pdsh:
- Prepared `pdsh` for Slurm 23.02 (jsc#PED-2987)
- Fix slurm plugin: make sure `slurm_init()` is called before using the Slurm API (bsc#1209216)
- Fix regression in Slurm 23.02 breaking the pdsh-internal List type by exposing it through its public API (bsc#1208846)
- Backport a number of features and fixes (bsc#1206795):
- Add '-C' option to the Slurm plugin to restrict selected nodes to ones with the specified features present
- Add option '-k' to the ssh plugin to fail faster on connection failures
- Fix use of `strchr`
- `dshbak`: Fix uninitialized use of $tag on empty input
- `dsh`: Release a lock that is no longer used in dsh()
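The two backported options above might be used as follows. This is a hedged sketch: the feature name "gpu" and the node list are hypothetical, and `-R`/`-w` are the standard pdsh module and target-list options:

```bash
# Slurm module: restrict the selected nodes to those with feature "gpu":
pdsh -R slurm -C gpu uname -r
# ssh module: fail faster on connection failures with -k:
pdsh -R ssh -k -w node[01-04] uptime
```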
Patch Instructions:
To install this SUSE update use the SUSE recommended
installation methods like YaST online_update or "zypper patch".
Alternatively you can run the command listed for your product:
- HPC Module 12:
  zypper in -t patch SUSE-SLE-Module-HPC-12-2023-3860=1
Package List:
- HPC Module 12 (aarch64 x86_64):
- slurm_23_02-munge-debuginfo-23.02.4-3.7.1
- pdsh-slurm-debuginfo-2.34-7.41.2
- libslurm39-23.02.4-3.7.1
- libpmi0_23_02-debuginfo-23.02.4-3.7.1
- slurm_23_02-plugins-23.02.4-3.7.1
- slurm_23_02-slurmdbd-debuginfo-23.02.4-3.7.1
- pdsh-genders-2.34-7.41.2
- pdsh-slurm_23_02-2.34-7.41.4
- slurm_23_02-plugin-ext-sensors-rrd-debuginfo-23.02.4-3.7.1
- slurm_23_02-auth-none-debuginfo-23.02.4-3.7.1
- slurm_23_02-munge-23.02.4-3.7.1
- slurm_23_02-torque-23.02.4-3.7.1
- slurm_23_02-sview-23.02.4-3.7.1
- pdsh-slurm_20_02-debuginfo-2.34-7.41.2
- libslurm39-debuginfo-23.02.4-3.7.1
- slurm_23_02-node-23.02.4-3.7.1
- libnss_slurm2_23_02-debuginfo-23.02.4-3.7.1
- slurm_23_02-sql-debuginfo-23.02.4-3.7.1
- pdsh-slurm_22_05-debuginfo-2.34-7.41.2
- slurm_23_02-debuginfo-23.02.4-3.7.1
- perl-slurm_23_02-23.02.4-3.7.1
- pdsh-slurm-2.34-7.41.2
- pdsh-slurm_18_08-2.34-7.41.2
- pdsh-slurm_18_08-debuginfo-2.34-7.41.2
- pdsh_slurm_20_11-debugsource-2.34-7.41.2
- pdsh-slurm_20_11-debuginfo-2.34-7.41.2
- pdsh_slurm_22_05-debugsource-2.34-7.41.2
- slurm_23_02-debugsource-23.02.4-3.7.1
- pdsh-machines-debuginfo-2.34-7.41.2
- slurm_23_02-pam_slurm-23.02.4-3.7.1
- pdsh_slurm_20_02-debugsource-2.34-7.41.2
- pdsh-slurm_23_02-debuginfo-2.34-7.41.4
- slurm_23_02-devel-23.02.4-3.7.1
- slurm_23_02-sview-debuginfo-23.02.4-3.7.1
- slurm_23_02-plugins-debuginfo-23.02.4-3.7.1
- pdsh-genders-debuginfo-2.34-7.41.2
- slurm_23_02-sql-23.02.4-3.7.1
- slurm_23_02-torque-debuginfo-23.02.4-3.7.1
- libnss_slurm2_23_02-23.02.4-3.7.1
- pdsh-2.34-7.41.2
- pdsh_slurm_18_08-debugsource-2.34-7.41.2
- slurm_23_02-node-debuginfo-23.02.4-3.7.1
- pdsh-dshgroup-debuginfo-2.34-7.41.2
- slurm_23_02-cray-23.02.4-3.7.1
- slurm_23_02-auth-none-23.02.4-3.7.1
- pdsh-slurm_20_11-2.34-7.41.2
- slurm_23_02-lua-debuginfo-23.02.4-3.7.1
- pdsh-slurm_22_05-2.34-7.41.2
- slurm_23_02-pam_slurm-debuginfo-23.02.4-3.7.1
- slurm_23_02-slurmdbd-23.02.4-3.7.1
- slurm_23_02-lua-23.02.4-3.7.1
- pdsh-netgroup-debuginfo-2.34-7.41.2
- slurm_23_02-cray-debuginfo-23.02.4-3.7.1
- pdsh-debuginfo-2.34-7.41.2
- pdsh-dshgroup-2.34-7.41.2
- libpmi0_23_02-23.02.4-3.7.1
- slurm_23_02-plugin-ext-sensors-rrd-23.02.4-3.7.1
- pdsh-slurm_20_02-2.34-7.41.2
- pdsh-machines-2.34-7.41.2
- slurm_23_02-23.02.4-3.7.1
- pdsh-debugsource-2.34-7.41.2
- perl-slurm_23_02-debuginfo-23.02.4-3.7.1
- pdsh-netgroup-2.34-7.41.2
- HPC Module 12 (noarch):
- slurm_23_02-config-23.02.4-3.7.1
- slurm_23_02-doc-23.02.4-3.7.1
- slurm_23_02-webdoc-23.02.4-3.7.1
- slurm_23_02-config-man-23.02.4-3.7.1
References:
- https://bugzilla.suse.com/show_bug.cgi?id=1088693
- https://bugzilla.suse.com/show_bug.cgi?id=1206795
- https://bugzilla.suse.com/show_bug.cgi?id=1208846
- https://bugzilla.suse.com/show_bug.cgi?id=1209216
- https://bugzilla.suse.com/show_bug.cgi?id=1209260
- https://bugzilla.suse.com/show_bug.cgi?id=1212946
- https://bugzilla.suse.com/show_bug.cgi?id=1214983
- https://jira.suse.com/browse/PED-2987