SUSE Enterprise Storage 5.5
Release Notes #
SUSE Enterprise Storage is an extension to SUSE Linux Enterprise. It combines the capabilities from the Ceph storage project (https://ceph.com/) with the enterprise engineering and support of SUSE. SUSE Enterprise Storage provides IT organizations with the ability to deploy a distributed storage architecture that can support a number of use cases using commodity hardware platforms.
Manuals can be found in the docu directory of the installation media for SUSE Enterprise Storage. Any documentation (if installed) can be found in the /usr/share/doc/ directory of the installed system.
- 1 Support Statement for SUSE Enterprise Storage
- 2 Support for Specific Packages
- 3 Technology Previews
- 4 New Features and Known Issues
- 4.1 OpenStack Integration (new in SES 5.5!)
- 4.2 Improved Visualization of the Cluster Rebuild Status (new in SES 5.5!)
- 4.3 Support for Managing RBD Snapshots in openATTIC (new in SES 5.5!)
- 4.4 Support for Samba Gateway for CephFS (new in SES 5.5!)
- 4.5 DeepSea Can Now Deploy AppArmor Profiles for Storage Software (new in SES 5.5!)
- 4.6 Non-SUSE RBD and CephFS Clients Have Been Validated (new in SES 5.5!)
- 5 Changes in Packaging and Delivery
- 6 Ceph-Related Changes
- 6.1 CephFS Provides Multi-Active/Active MDS Capabilities
- 6.2 BlueStore Storage Backend
- 6.3 BlueStore Inline Compression
- 6.4 Overwrite Support for Erasure-Coded Pools
- 6.5 New ceph-mgr Daemon
- 6.6 CRUSH Device Classes
- 6.7 Upgrading to Device Class CRUSH Maps Using crushtool
- 6.8 Other Improvements
- 6.9 Upgrade Compatibility
- 7 How to Obtain Source Code
- 8 More Information and Feedback
1 Support Statement for SUSE Enterprise Storage #
Support requires an appropriate subscription from SUSE. For more information, see http://www.suse.com/products/server/.
General Support Statement
The following definitions apply:
L1: Installation and problem determination - technical support designed to provide compatibility information, installation and configuration assistance, usage support, on-going maintenance and basic troubleshooting. Level 1 Support is not intended to correct product defect errors.
L2: Reproduction of problem isolation - technical support designed to duplicate customer problems, isolate problem areas and potential issues, and provide resolution for problems not resolved by Level 1 Support.
L3: Code Debugging and problem resolution - technical support designed to resolve complex problems by engaging engineering in patch provision, resolution of product defects which have been identified by Level 2 Support.
SUSE will only support the usage of original (unchanged or not recompiled) packages.
2 Support for Specific Packages #
This section lists support differences and restrictions for specific packages.
2.1 Support Status of Ceph Manager Modules #
Ceph Manager modules are supported on a best-effort basis only. This affects the following packages:
ceph-mgr
python-influxdb
3 Technology Previews #
Technology previews are packages, stacks, or features delivered by SUSE. These features are not supported. They may be functionally incomplete, unstable or in other ways not suitable for production use. They are mainly included for customer convenience and give customers a chance to test new technologies within an enterprise environment.
Whether a technology preview will be moved to a fully supported package later depends on customer and market feedback. A technology preview does not automatically result in support at a later point in time. Technology previews can be dropped at any time, and SUSE is not committed to providing a technology preview later in the product cycle.
Give your SUSE representative feedback, including your experience and use case.
3.1 Event Monitoring (new in SES 5.5!) #
As a technology preview, SUSE Enterprise Storage 5.5 now brings a fully functional alerting stack and a set of default alerts. Add your preferred alerting channel (as specified in the documentation) and receive cluster alerts right away. The documentation also contains instructions on how to add custom alerts.
For more information, see the documentation at https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/monitoring_alerting.html.
3.2 New DeepSea CLI #
DeepSea and Salt are very powerful, however, previous versions had drawbacks:
The command syntax could be intimidating.
Error messages from Salt can be difficult to understand.
Some DeepSea operations triggered by commands take a long time to complete. The lack of feedback made it difficult to determine whether DeepSea was still working or had aborted the operation.
With SUSE Enterprise Storage 5, we are introducing a new command-line interface for DeepSea as a technology preview. The new interface is designed to address feedback from customers and testers. The package deepsea-cli contains a new command, deepsea, which allows monitoring or running stages while visualizing the progress in real time.
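For example, assuming the deepsea-cli package is installed on the Salt master (the exact subcommand names may differ slightly between deepsea-cli versions), a stage can be run with progress visualization, or an already running stage can be followed from a second terminal:
deepsea stage run ceph.stage.2
deepsea monitor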
We continue to refine this interface, to make it a better fit for your needs. Therefore, this feature is currently offered as a technology preview. We encourage you to submit feedback about this feature and how it could be changed/improved.
4 New Features and Known Issues #
4.1 OpenStack Integration (new in SES 5.5!) #
DeepSea now includes an openstack.integrate runner which will create the necessary storage pools and cephx keys for use by OpenStack Glance, Cinder, and Nova. It also returns a block of configuration data that can be used to subsequently configure OpenStack. To learn more about this feature, run the following command on the administration node:
salt-run openstack.integrate -d
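A minimal invocation sketch (the runner's optional parameters, such as a prefix for the generated pool names, vary by DeepSea version; review the output of the -d command above before running it against a production cluster):
salt-run openstack.integrate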
4.2 Improved Visualization of the Cluster Rebuild Status (new in SES 5.5!) #
The cluster dashboard's placement group (PG) state panel now gives a clear indication of ongoing cluster rebuild activity: inactive PGs are displayed in red, making the rebuild status easy to see at a glance.
4.3 Support for Managing RBD Snapshots in openATTIC (new in SES 5.5!) #
openATTIC now includes more management functionality for RADOS Block Devices (RBD). The actions create, clone, rollback, protect/unprotect, and delete are now supported:
Cloning an RBD snapshot creates a new RBD image under a new name that you enter for the clone.
Optionally, you can select a different feature set for the new RBD image.
You can also copy RBD images, which can be helpful when several RBD images with exactly the same size and features are needed (but as in the case of cloning, features can also be added).
To avoid accidentally deleting a snapshot of an RBD image, you can mark it as protected.
4.4 Support for Samba Gateway for CephFS (new in SES 5.5!) #
Samba can be deployed to expose the CephFS filesystem to SMB clients such as Windows and macOS. Samba gateway deployments can be standalone or, when combined with CTDB, highly available.
For more information, see https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_deployment/data/cha_ses_cifs.html.
Warning: Do Not Export a File System via NFS and Samba Simultaneously
If you export the same file system via NFS and Samba at the same time, you risk data corruption on the exported file system, as there is no cross-protocol file locking.
4.5 DeepSea Can Now Deploy AppArmor Profiles for Storage Software (new in SES 5.5!) #
SUSE Enterprise Storage 5.5 now comes with AppArmor profiles for all Ceph daemons and openATTIC, along with deployment support. By default, AppArmor is turned off, but DeepSea can install the necessary profiles and set AppArmor to either complain or enforce mode. For more information, see https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/admin_apparmor.html.
4.6 Non-SUSE RBD and CephFS Clients Have Been Validated (new in SES 5.5!) #
The following non-SUSE RBD and CephFS clients have been successfully validated to operate with SES 5.5:
RHEL 6 and 7
Ubuntu 16.04 and 18.04
5 Changes in Packaging and Delivery #
libradosstriper is no longer part of the recommended and supported Ceph interfaces upstream.
SUSE Enterprise Storage 4 and earlier already did not utilize or advertise libradosstriper. Aligning with upstream development, it was deprecated in SUSE Enterprise Storage 5 and removed in SUSE Enterprise Storage 6.
5.2 SES Crowbar Barclamp for Non-SOC Deployments Has Been Removed #
SES Crowbar Barclamp is no longer part of the SES product.
SES installation via Crowbar for non-SUSE OpenStack Cloud (SOC) deployments has been removed.
6 Ceph-Related Changes #
6.1 CephFS Provides Multi-Active/Active MDS Capabilities #
In SUSE Enterprise Storage 5, CephFS provides multi-active/active MDS capabilities to improve scalability and performance.
6.2 BlueStore Storage Backend #
The BlueStore backend for ceph-osd is now stable and the new default for newly created OSDs. It provides increased performance and features.
BlueStore manages the data stored on each OSD by directly managing the physical HDDs or SSDs, without the use of an intervening file system like XFS. This provides greater performance and more features. BlueStore supports full data and metadata checksums of all data stored by Ceph.
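In SUSE Enterprise Storage 5, OSD deployment is normally handled by DeepSea; as a manual sketch, a BlueStore OSD can be prepared with ceph-disk (the device name /dev/sdb is an example; activation is then triggered automatically via udev):
ceph-disk prepare --bluestore /dev/sdb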
6.3 BlueStore Inline Compression #
To increase data density as it is stored to OSDs, BlueStore supports inline compression using zlib or snappy.
Ceph also supports zstd for RGW (RADOS Gateway) compression but, for performance reasons, zstd is not recommended for BlueStore.
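For example, compression can be enabled per pool via pool properties (the pool name data_pool is a placeholder; available modes include none, passive, aggressive, and force):
ceph osd pool set data_pool compression_algorithm snappy
ceph osd pool set data_pool compression_mode aggressive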
6.4 Overwrite Support for Erasure-Coded Pools #
With SES 5, it is now possible to use erasure-coded (EC) pools with RBD and CephFS. Recommended use cases for this feature are scenarios that have low performance requirements and infrequent random access, for example, cold storage, backups, or archiving.
Erasure coding is only supported for data pools. Metadata pools must use replication.
To use CephFS with an erasure-coded pool, the OSDs used for that pool must run on BlueStore and the pool must have the allow_ec_overwrites option set. This option can be set by running:
ceph osd pool set ec_pool allow_ec_overwrites true
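As an illustration, the following creates an erasure-coded pool, enables overwrites on it, and uses it as the data pool for an RBD image (pool and image names and PG counts are examples only; the replicated rbd pool still holds the image metadata):
ceph osd pool create ec_pool 64 64 erasure
ceph osd pool set ec_pool allow_ec_overwrites true
rbd create rbd/myimage --size 1G --data-pool ec_pool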
Erasure coding adds a significant overhead to file system operations, especially small updates. This overhead is inherent to using erasure coding as a fault tolerance mechanism. This penalty is the trade-off for a significantly reduced storage space overhead.
6.5 New ceph-mgr Daemon #
To provide a single source of cluster metrics, there is the new daemon ceph-mgr. ceph-mgr is a required part of any Ceph deployment. Although I/O can continue when ceph-mgr is down, metrics will not refresh and some metrics-related calls may block (for example, ceph df). For reliability, SUSE recommends deploying several instances of ceph-mgr.
When upgrading from SUSE Enterprise Storage 3 or SUSE Enterprise Storage 4 using DeepSea, ceph-mgr roles will be created automatically on all existing ceph-mon nodes.
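To verify the result after the upgrade, the active ceph-mgr instance and any standbys can be checked with the commands below; in a DeepSea-managed cluster, additional role-mgr assignments in policy.cfg are the usual way to add more instances (the exact policy.cfg syntax depends on your deployment):
ceph -s
ceph mgr dump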
6.6 CRUSH Device Classes #
To provide an easy way to specify CRUSH rules that apply only to hard disks or SSDs, each OSD can now have a device class associated with it: for example, “hard disk” or “SSD”.
This allows CRUSH rules to map data to a subset of devices in the system. Manually writing CRUSH rules or manually editing the CRUSH map is normally not required.
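For example, a rule that targets only SSD-class devices can be created and assigned to a pool as follows (the rule name fast and pool name mypool are placeholders):
ceph osd crush class ls
ceph osd crush rule create-replicated fast default host ssd
ceph osd pool set mypool crush_rule fast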
6.7 Upgrading to Device Class CRUSH Maps Using crushtool #
crushtool, part of the Ceph tools, now has functionality to transition from older CRUSH maps that maintain parallel hierarchies for OSDs of different types to a modern CRUSH map that makes use of the device class feature (“reclassification”).
For more information, see:
SUSE Enterprise Storage Administration Guide, Chapter “Stored Data Management”, Section “Devices”, Section “Migrating from a Legacy SSD Rule to Device Classes” at https://www.suse.com/documentation/suse-enterprise-storage-5/singlehtml/book_storage_admin/book_storage_admin.html#device_classes.reclassify
The upstream documentation at http://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes
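A sketch of the reclassification workflow, following the referenced documentation (the --reclassify-bucket pattern %-ssd assumes a legacy naming scheme such as node1-ssd; adjust it to your hierarchy, and always compare the maps before injecting the result):
ceph osd getcrushmap -o original
crushtool -i original --reclassify --reclassify-root default hdd --reclassify-bucket %-ssd ssd default -o adjusted
crushtool -i original --compare adjusted
ceph osd setcrushmap -i adjusted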
6.8 Other Improvements #
The following other improvements to Ceph were also made:
Ceph now defaults to the AsyncMessenger (ms type = async) instead of the legacy SimpleMessenger. The most noticeable difference is that Ceph now uses a fixed-sized thread pool for network connections instead of two threads per socket with SimpleMessenger.
Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire. This prevents I/O from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running.
The size of encoded OSDMaps has been reduced.
The OSDs now quiesce scrubbing when recovery or rebalancing is in progress.
For Luminous clients, there is a new upmap exception mechanism that allows individual Placement Groups (PGs) to be moved around to achieve a perfect distribution.
Each OSD now adjusts its default configuration based on whether the backing device is a hard disk or SSD. Manual tuning is generally not required.
There is now a back-off mechanism that prevents OSDs from being overloaded by requests to objects or PGs that currently cannot process I/O.
There is a simplified OSD replacement process that is more robust (see the command sketch after this list).
You can query the supported features and (apparent) releases of all connected daemons and clients with the ceph features command.
You can configure the oldest Ceph client version you want to allow to connect to the cluster via ceph osd set-require-min-compat-client. Ceph will then prevent you from enabling features that will break compatibility with those clients.
Several sleep settings, including osd_recovery_sleep, osd_snap_trim_sleep, and osd_scrub_sleep, have been reimplemented to work more efficiently.
Pools are now expected to be associated with the application using them. When upgrading from SES 3 or 4, the cluster will attempt to associate existing pools to known applications (that is, CephFS, RBD, and RGW).
In-use pools that are not associated with an application will generate a health warning. Unassociated pools can be manually associated using the new ceph osd pool application enable command.
The maximum number of Placement Groups per OSD before the monitor issues a warning has been reduced from 300 to 200 PGs. 200 is still twice the generally recommended target of 100 PGs per OSD. This limit can be adjusted via the mon_max_pg_per_osd option on the monitors. The older mon_pg_warn_max_per_osd option has been removed.
Creating pools or adjusting pg_num will now fail if the change would make the number of PGs per OSD exceed the configured mon_max_pg_per_osd limit. If it is necessary to create a pool with more PGs, this option can be adjusted.
RGW metadata search backed by ElasticSearch now supports end user requests serviced via RGW itself, and also supports custom metadata fields. A set of RESTful APIs was created, so users can search objects by their metadata. New APIs that allow control of custom metadata fields were also added.
RGW now supports dynamic bucket index sharding. This is now enabled by default: RGW will automatically reshard the bucket index when the index grows beyond rgw_max_objs_per_shard. Hence, as the number of objects in a bucket grows, RGW will automatically reshard the bucket index in response. No user intervention or bucket size capacity planning is required.
RGW introduces server-side encryption of uploaded objects with three options for the management of encryption keys:
Automatic encryption (only recommended for test setups)
Customer-provided keys similar to the Amazon SSE-C specification
Use of an external key management service (OpenStack Barbican) similar to the Amazon SSE-KMS specification
The S3 Object Tagging API has been added: APIs for GET/PUT/DELETE object tags and the PUT object API are supported. However, there is no support for tags on policies and lifecycle yet.
RGW multisite now supports enabling or disabling sync at the bucket level.
RGW now supports the S3 multipart object copy-part API.
You can now reshard an existing RGW bucket offline. Offline bucket resharding currently requires that all I/O (especially writes) to the specific bucket is quiesced.
RGW now supports data compression for objects.
RGW: the embedded Civetweb has been upgraded to version 1.8.
RGW: The Swift static website API is now supported (S3 support has been added previously).
RGW: S3 bucket lifecycle API has been added. Note that it currently only supports object expiration.
RGW: Support for custom search filters has been added to the LDAP authentication implementation.
Support for NFS version 3 has been added to the RGW NFS gateway.
A Python binding has been created for librgw.
RBD now has full, stable support for erasure-coded pools via the new --data-pool option to rbd create.
RBD mirroring's rbd-mirror daemon is now highly available. For reliability, SUSE recommends deploying several instances of rbd-mirror.
The name of the default pool used by the rbd CLI when no pool is specified can be overridden using the rbd default pool = POOL_NAME option.
Initial support for deferred image deletion via the new rbd trash CLI commands. Images, even ones actively in use by clones, can be moved to the trash and deleted at a later time.
New pool-level rbd mirror pool promote and rbd mirror pool demote commands to batch promote/demote all mirrored images within a pool.
Mirroring now optionally supports a configurable replication delay via the rbd mirroring replay delay = SECONDS configuration option.
Improved discard handling when the object map feature is enabled.
The rbd CLI import and copy commands now detect and preserve sparse regions.
RBD images and snapshots now include a creation timestamp.
Specifying user authorization capabilities for RBD clients has been simplified. The general syntax for using RBD capability profiles is:
mon 'profile rbd' osd 'profile rbd[-read-only][pool=POOL-NAME[, …]]'
The rbd-mirror daemon now supports replicating dynamic image feature updates and image metadata key/value pairs from the primary image to the non-primary image.
The number of RBD image snapshots can be optionally restricted to a configurable maximum.
The RBD Python API now supports asynchronous I/O operations.
CLI: The ceph -s / ceph status command has a fresh look.
CLI: ceph mgr metadata will dump metadata associated with each ceph-mgr daemon.
CLI: ceph versions / ceph osd,mds,mon,mgr versions summarize the version numbers of running daemons.
CLI: ceph osd,mds,mon,mgr count-metadata PROPERTY tabulates any other daemon metadata visible via the ceph osd,mds,mon,mgr metadata commands.
CLI: ceph features summarizes features and releases of connected clients and daemons.
CLI: ceph osd require-osd-release RELEASE replaces the old require_RELEASE_osds flags.
CLI: ceph osd pg-upmap, ceph osd rm-pg-upmap, ceph osd pg-upmap-items, and ceph osd rm-pg-upmap-items can explicitly manage upmap items.
CLI: ceph osd getcrushmap returns a CRUSH map version number on stderr, and ceph osd setcrushmap VERSION will only inject an updated CRUSH map if the version matches. This allows CRUSH maps to be updated offline and then reinjected into the cluster without fear of clobbering racing changes, for example by newly added OSDs or changes by other administrators.
CLI: ceph osd create has been replaced by ceph osd new. This should be hidden from most users by user-facing tools like ceph-disk and DeepSea.
CLI: ceph osd destroy will mark an OSD destroyed and remove its cephx and lockbox keys. However, the OSD ID and CRUSH map entry will remain in place. This allows reusing the ID by a replacement device with minimal data rebalancing.
CLI: ceph osd purge will remove all traces of an OSD from the cluster, including its cephx encryption keys, dm-crypt lockbox keys, OSD ID, and CRUSH map entry.
CLI: ceph osd ls-tree NAME will output a list of OSD IDs under the given CRUSH name (like a host or rack name). This is useful for applying changes to entire subtrees. For example: ceph osd down $(ceph osd ls-tree rack1)
CLI: ceph osd add,rm-noout,noin,nodown,noup allow applying the noout, noin, nodown, and noup flags to specific OSDs.
CLI: ceph osd safe-to-destroy OSD(s) will report whether it is safe to remove or destroy OSD(s) without reducing data durability or redundancy.
CLI: ceph osd ok-to-stop OSD(s) will report whether it is okay to stop OSD(s) without immediately compromising availability (that is, all PGs should remain active but may be degraded).
CLI: ceph log last N will output the last N lines of the cluster log.
CLI: ceph mgr dump will dump the MgrMap, including the currently active ceph-mgr daemon and any standbys.
CLI: ceph mgr module ls will list active ceph-mgr modules.
CLI: ceph mgr module enable,disable NAME will enable or disable the named ceph-mgr module. The module must be present in the configured mgr_module_path on the host(s) where ceph-mgr is running.
CLI: ceph osd crush ls NODE will list items (OSDs or other CRUSH nodes) directly beneath a given CRUSH node.
CLI: ceph osd crush swap-bucket SRC DEST will swap the contents of two CRUSH buckets in the hierarchy while preserving the buckets' IDs. This allows an entire subtree of devices to be replaced without disrupting the distribution of data across neighboring devices. For example, this can be used when replacing an entire host of FileStore OSDs with newly-imaged BlueStore OSDs.
CLI: ceph osd set-require-min-compat-client RELEASE configures the oldest client release the cluster is required to support. Other changes, like CRUSH tunables, will fail with an error if they would violate this setting. Changing this setting also fails if clients older than the specified release are currently connected to the cluster.
CLI: ceph config-key dump dumps config-key entries and their contents. The existing ceph config-key list only dumps the key names, not the values.
CLI: ceph config-key list is deprecated in favor of ceph config-key ls.
CLI: ceph config-key put is deprecated in favor of ceph config-key set.
CLI: ceph auth list is deprecated in favor of ceph auth ls.
CLI: ceph osd crush rule list is deprecated in favor of ceph osd crush rule ls.
CLI: ceph osd set-full,nearfull,backfillfull-ratio sets the cluster-wide ratio for various full thresholds: respectively, when the cluster refuses I/O, when the cluster warns about being close to full, and when an OSD will defer rebalancing a PG to itself.
CLI: ceph osd reweightn will specify the reweight values for multiple OSDs in a single command. This is equivalent to a series of ceph osd reweight commands.
CLI: ceph osd crush set,rm-device-class manage the new CRUSH device class feature. Manually creating or deleting a device class name is generally not necessary as it will be smart enough to be self-managed. ceph osd crush class ls and ceph osd crush class ls-osd will output all existing device classes and a list of OSD IDs under a given device class, respectively.
CLI: ceph osd crush rule create-replicated replaces the ceph osd crush rule create-simple command to create a CRUSH rule for a replicated pool. It takes a class argument for the device class the rule should target (for example, hard disk or SSD).
CLI: ceph tell DAEMON help will now return a usage summary.
CLI: ceph fs authorize creates a new client key with caps automatically set to access the given CephFS file system.
CLI: The ceph health structured output (JSON or XML) no longer contains a timechecks section describing the time sync status. This information is now available via the ceph time-sync-status command.
CLI: Certain extra fields in the ceph health structured output that used to appear if the monitors were low on disk space (which duplicated the information in the normal health warning messages) are now gone.
CLI: The ceph -w output no longer contains audit log entries by default. Add --watch-channel=audit or --watch-channel=* to see them.
New ceph -w behavior: The ceph -w output no longer contains I/O rates, available space, PG info, etc. because these are no longer logged to the central log (which is what ceph -w shows). The same information can be obtained by running ceph pg stat. Alternatively, I/O rates per pool can be determined using ceph osd pool stats. These commands do not self-update like ceph -w did. However, they can return formatted output by providing a --format=FORMAT option.
CLI: Added new commands pg force-recovery and pg force-backfill. Use them to boost the recovery or backfill priority of specified PGs, so they are recovered/backfilled before any others. These commands do not interrupt ongoing recovery/backfill, but merely queue the specified PGs before others, so they are recovered/backfilled as soon as possible. The new commands pg cancel-force-recovery and pg cancel-force-backfill restore the default recovery/backfill priority of previously forced PGs.
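A sketch of the simplified OSD replacement flow referenced above (OSD ID 5 is an example; the redeployment step is normally handled by DeepSea or ceph-disk, which can reuse the freed ID):
while ! ceph osd safe-to-destroy osd.5 ; do sleep 60 ; done
ceph osd destroy 5 --yes-i-really-mean-it
Then replace the physical device and redeploy the OSD; because the CRUSH map entry and OSD ID are kept, only minimal data rebalancing occurs.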
6.9 Upgrade Compatibility #
The changes listed in the following affect upgrades from SUSE Enterprise Storage 3/SUSE Enterprise Storage 4 (Ceph Jewel release) to SUSE Enterprise Storage 5 (Ceph Luminous release).
The osd crush location configuration option is no longer supported. Update your ceph.conf to use the crush location configuration option instead. To avoid movement of OSDs from a custom location to the default one, make sure to update your configuration file.
OSDs now avoid starting new scrubs while recovery is in progress. To revert to the old behavior in which the recovery activity does not affect scrub scheduling, set the following option:
osd scrub during recovery = true
The list of monitor hosts/addresses for building the monmap can now be obtained from DNS SRV records. The service name used when querying the DNS is defined in the mon_dns_srv_name configuration option, which defaults to ceph-mon.
The osd class load list configuration option is a list of object class names that the OSD is permitted to load (or * for all classes). By default, it contains all existing in-tree classes for backward compatibility.
The osd class default list configuration option is a list of object class names (or * for all classes) that can be invoked by clients that have only the capabilities *, x, class-read, or class-write. For backward compatibility, by default it contains all existing in-tree classes. Invoking classes not listed in osd class default list requires a capability naming the class, for example allow class foo.
The rgw rest getusage op compat configuration option allows dumping (or not dumping) the description of user statistics in the S3 GetUsage API. This option defaults to false. If the value is true, the response data for GetUsage looks like:
"stats": { "TotalBytes": 516, "TotalBytesRounded": 1024, "TotalEntries": 1 }
If the value is false, the response for GetUsage looks as it did before:
{ 516, 1024, 1 }
The osd out and osd in commands now preserve the OSD weight. That is, after marking an OSD out and then in, the weight will be the same as before (instead of being reset to 1.0). Previously, the monitors would only preserve the weight if the monitor automatically marked an OSD out and then in, but not when an administrator did so explicitly.
The ceph osd perf command will display commit_latency(ms) and apply_latency(ms). Previously, the names of these two columns were fs_commit_latency(ms) and fs_apply_latency(ms). The prefix fs_ has been removed because these values are not specific to FileStore.
Monitors will no longer allow pools to be removed by default. Monitors will only allow pools to be removed when the setting mon_allow_pool_delete is true (defaults to false). This is an additional safeguard against pools being removed by accident (see the sketch after this list).
If you have manually specified rocksdb as the monitor key/value store via the mon keyvaluedb = rocksdb option, you will need to manually add a file to the monitor's data directory to preserve this option:
echo rocksdb > /var/lib/ceph/mon/ceph-HOSTNAME/kv_backend
New monitors will now use rocksdb by default, but if that file is not present, existing monitors will use leveldb. The mon keyvaluedb option now only affects the back end chosen when a monitor is created.
The osd crush initial weight option allows specifying a CRUSH weight for newly added OSDs. However, the meaning of the value 0 has changed:
A value of 0 (the old default) means the OSD is given a weight of 0. Previously, a value of 0 meant that the OSD's weight would be based on its size.
A negative value, such as -1 (the new default), means that the OSD weight will be based on its size.
If your configuration file explicitly specifies a value of 0 for this option, change it to a negative value (for example, -1) to preserve the current behavior.
The jerasure and shec plug-ins can now detect the SIMD instruction set at runtime and no longer need to be explicitly configured for different processors. Instead of the processor-specific plug-ins, use jerasure or shec. The following plug-ins are now deprecated:
jerasure_generic
jerasure_sse3
jerasure_sse4
jerasure_neon
shec_generic
shec_sse3
shec_sse4
shec_neon
Using any of these plug-ins directly will create a warning in the monitor's log file.
Calculation of recovery priorities has been updated. Previously, this could lead to unintuitive recovery prioritization during cluster upgrades: old-version OSDs would operate on different priority ranges than new ones. After the upgrade, the cluster will operate on consistent values.
The configuration option osd pool erasure code stripe width has been replaced by osd pool erasure code stripe unit, and given the ability to be overridden by the erasure code profile setting stripe_unit. For more information, see http://docs.ceph.com/docs/master/rados/operations/erasure-code-profile.
RBD and CephFS can use erasure coding with BlueStore. This can be enabled for a pool by setting allow_ec_overwrites to true. This relies on the checksumming of BlueStore to do deep scrubbing. Therefore, enabling this on a pool stored on FileStore is not allowed.
The JSON output of rados df now prints numeric values as numbers instead of strings.
The mon_osd_max_op_age option has been renamed to mon_osd_warn_op_age (default: 32 seconds), to indicate that a warning is generated at this age. There is also a new option that controls when an error is generated: the mon_osd_err_op_age_ratio option is expressed as a multiple of mon_osd_warn_op_age (default: 128, or around 60 minutes).
The default maximum size for a single RADOS object has been reduced from 100 GB to 128 MB. The 100 GB limit was impractical, while the 128 MB limit is high but reasonable.
If you have an application written directly against librados that uses objects larger than 128 MB, you can adjust the default value with osd_max_object_size.
Whiteout objects are objects which logically do not exist and which will return ENOENT if you try to access them. Previously, whiteout objects only occurred in cache tier pools. With SUSE Enterprise Storage 5 (Ceph Luminous), logically deleted but snapshotted objects now also result in a whiteout object.
Whiteout objects can lead to confusion, because they are listed by rados ls and in object listings from librados but are otherwise inaccessible. To determine which objects are actually snapshots, use rados listsnaps.
The deprecated crush_ruleset property has been removed. For the commands osd pool get and osd pool set, use crush_rule instead.
The option osd pool default crush replicated ruleset has been removed and replaced by the osd pool default crush rule option. By default, it is set to -1, which means the monitor will pick the first type of replicated rule in the CRUSH map for replicated pools.
Erasure-coded pools have rules that are automatically created for them if they are not specified when creating the pool.
SUSE recommends against using Btrfs with FileStore. If you are using Btrfs-based OSDs and want to upgrade to SUSE Enterprise Storage 5 (Ceph Luminous), add the following to your ceph.conf:
enable experimental unrecoverable data corrupting features = btrfs
SUSE recommends moving these OSDs to FileStore with XFS or BlueStore.
The ruleset-* properties for the erasure code profiles have been renamed to crush-*. This was done to move away from the obsolete term ruleset and to be clearer about their purpose. There is also a new optional crush-device-class property to specify a CRUSH device class to use for the erasure-coded pool. Existing erasure code profiles will be converted automatically when the upgrade completes (that is, when ceph osd require-osd-release luminous is run). However, provisioning tools that create erasure-coded pools need to be updated.
The structure of the XML output for osd crush tree has changed slightly to better match the output of osd tree. The top-level structure is now nodes instead of crush_map_roots.
When assigning a network to the public network and not to the cluster network, the network specification of the public network will be used for the cluster network as well. Previously, this would lead to cluster services being bound to 0.0.0.0:PORT, thus making the cluster services even more publicly available than the public services. When only specifying a cluster network, the public services will still bind to 0.0.0.0.
Previously, if a client sent an operation to the wrong OSD, the OSD would reply with ENXIO. The rationale here is that the client or OSD is buggy and the error should be made clear. With SUSE Enterprise Storage 5 (Ceph Luminous), the ENXIO reply will only be sent if the osd_enxio_on_misdirected_op option is enabled (off by default). This means that a VM using librbd that previously would have received an EIO and gone read-only will now see a blocked/hung I/O instead.
The configuration option journaler allow split entries has been removed.
The configuration option mon_warn_osd_usage_min_max_delta has been removed and the associated health warning has been disabled. This option did not take into account clusters undergoing recovery or CRUSH rules that do not target all devices in the cluster.
A new configuration option, public bind addr, was added to support dynamic environments like Kubernetes. When set, the Ceph MON daemon can bind locally to an IP address and advertise a different IP address (public addr) on the network.
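A sketch of temporarily allowing pool deletion, as referenced in the mon_allow_pool_delete item above (the pool name mypool is a placeholder; injectargs applies the change at runtime to all monitors, and the flag should be switched back to false afterwards):
ceph tell mon.* injectargs --mon-allow-pool-delete=true
ceph osd pool delete mypool mypool --yes-i-really-really-mean-it
ceph tell mon.* injectargs --mon-allow-pool-delete=false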
CephFS #
In CephFS, using multiple active MDS daemons is now considered stable. The number of active MDS servers can be adjusted up or down on an active CephFS file system (see the sketch after this list).
CephFS directory fragmentation is now stable and enabled by default on new file systems. To enable it on existing file systems, use ceph fs set FS_NAME allow_dirfrags.
Large or very busy directories are sharded and (potentially) distributed across multiple MDS daemons automatically.
CephFS directory subtrees can be explicitly pinned to specific MDS daemons in cases where the automatic load balancing is not desired or effective.
Client keys can now be created using the new ceph fs authorize command to create keys with access to the given CephFS file system and all of its data pools.
libcephfs function definitions have been changed to enable proper UID/GID control. The library version has been increased to reflect the interface change.
CephFS: Standby replay MDS daemons now consume less memory on workloads doing deletions.
CephFS: scrub now repairs backtraces and populates damage ls with discovered errors.
CephFS: A new pg_files subcommand to cephfs-data-scan can identify files affected by a damaged or lost RADOS PG.
CephFS: False-positive “failing to respond to cache pressure” warnings have been fixed.
CephFS: Limiting the MDS cache via a memory limit is now supported using the new mds_cache_memory_limit configuration option (1 GB by default). A cache reservation can also be specified using mds_cache_reservation as a percentage of the limit (5% by default). Limits by inode count are still supported using mds_cache_size. Setting mds_cache_size to 0 (the default) disables the inode limit.
When configuring ceph-fuse mounts in /etc/fstab, you can use the new syntax ceph.ARGUMENT=VALUE in the options column instead of putting the configuration in the device column. However, the previous syntax style still works.
CephFS clients without the p flag in their authentication capability string can no longer set quotas or layout fields. Previously, this flag only restricted modification of the pool and namespace fields in layouts.
CephFS will generate a health warning if you have fewer standby daemons than it thinks you should have. By default, this will be 1 if you ever had a standby, and 0 if you did not. You can customize this using ceph fs set FS standby_count_wanted NUMBER. Setting it to 0 will disable the health check.
The ceph mds tell command has been removed. It has been superseded by ceph tell mds.ID.
The apply mode of cephfs-journal-tool has been removed.
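A sketch of enabling a second active MDS on an existing file system, as referenced above (the file system name cephfs is an example; depending on the exact Luminous release, the allow_multimds flag may already be set or may require an additional confirmation switch):
ceph fs set cephfs allow_multimds true
ceph fs set cephfs max_mds 2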
librados #
For information about changes in librados, see the upstream release notes at http://docs.ceph.com/docs/master/release-notes.
7 How to Obtain Source Code #
This SUSE product includes materials licensed to SUSE under the GNU General Public License (GPL). The GPL requires SUSE to provide the source code that corresponds to the GPL-licensed material. The source code is available for download at http://www.suse.com/download-linux/source-code.html. Also, for up to three years after distribution of the SUSE product, upon request, SUSE will mail a copy of the source code. Requests should be sent by e-mail to mailto:sle_source_request@suse.com or as otherwise instructed at http://www.suse.com/download-linux/source-code.html. SUSE may charge a reasonable fee to recover distribution costs.
8 More Information and Feedback #
Read the READMEs on the media.
Get detailed changelog information about a particular package from the RPM:
rpm --changelog -qp <FILENAME>.rpm
<FILENAME> is the name of the RPM.
Check the ChangeLog file in the top level of the first medium for a chronological log of all changes made to the updated packages.
Find more information in the docu directory of the first medium of the SUSE Enterprise Storage media. This directory includes a PDF version of the SUSE Enterprise Storage Administration Guide.
http://www.suse.com/documentation/ses/ contains additional or updated documentation for SUSE Enterprise Storage.
Visit http://www.suse.com/products/ for the latest product news from SUSE and http://www.suse.com/download-linux/source-code.html for additional information on the source code of SUSE Linux Enterprise products.
Copyright © 2015-2019 SUSE LLC. This release notes document is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License (CC-BY-SA-3.0 US, https://creativecommons.org/licenses/by-sa/3.0/us/).
Thanks for using SUSE Enterprise Storage in your business.
The SUSE Enterprise Storage Team.