How to manually patch a SES cluster

This document (000019793) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 5.5
SUSE Enterprise Storage 6

Situation

Customer needs to patch a SES Cluster. 
"salt-run state.orch ceph.stage.0" is the recommended method.  
Manually patching the cluster is possible if stage.0 is not a viable option.

Resolution

It is best to patch the cluster when the cluster is healthy. However, there may be situations where the cluster is not healthy and restoring it to health requires patching the cluster.

- Make sure all nodes are registered and have access to the SLES/SES repos.
    - "SUSEConnect --status-text"
    - "zypper lr -E"
    - "zypper ref" 
    - "zypper lu"
    
- Patch the nodes in the following order: admin, mon/mgr, OSD, mds, rgw, igw, ganesha, etc.
- If roles are collocated, the order is the same, but some services will get patched earlier.

- "ceph version" command will provide current version of each ceph daemon. 
- "uname -a" command will provide version of running kernel.

Admin node:
- Patch the admin node with "zypper up" or "zypper patch", then reboot. Ensure the node boots back up.
    - If "deepsea" package was updated, then run on the admin node:
        salt '*' saltutil.sync_all
        Note: Its okay to run the command even if deepsea package was not updated.  The command will sync deepsea modules across all minions.
    - Validate desired packages were installed. "rpm -qa | egrep 'kernel|ceph|salt|deepsea'"    
    - Validate that the ceph daemons for this node's roles are running. "ceph -s" and "ceph osd tree" are good tools, as is "systemctl".
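    For example, after the admin node is back up, the Salt services and overall cluster state can be checked explicitly (a sketch; unit names assume a standard DeepSea deployment where the admin node runs the Salt master):
        systemctl status salt-master salt-minion
        systemctl --failed
        ceph -s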

Mon nodes:
- Patch one of the mon/mgr nodes with "zypper up" or "zypper patch", then reboot. Ensure the node boots back up.
    - Validate desired packages were installed. "rpm -qa | egrep 'kernel|ceph|salt'"
    - Validate that the ceph daemons for this node's roles are running. "ceph -s" and "ceph osd tree" are good tools, as is "systemctl".
    - After confirming the node and its services are running, repeat for each node with the mon/mgr role.
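    Before patching the next mon/mgr node, confirm that the rebooted mon has rejoined quorum and that an active mgr is present, for example:
        ceph mon stat
        ceph -s | grep -E 'mon:|mgr:'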

OSD nodes:
- After all mon/mgr nodes have been patched and rebooted:
    - Set the noout flag cluster-wide with "ceph osd set noout",
        or per OSD with:
        ceph osd add-noout osd.OSD_ID
        For example, for all OSDs on one node:
        for i in $(ceph osd ls-tree OSD_NODE_NAME); do echo "osd: $i"; ceph osd add-noout osd.$i; done
        Verify with:
        ceph health detail | grep noout
        
    - Patch one of the OSD nodes with "zypper up" or "zypper patch", then reboot the OSD node. Ensure the node boots back up.
    - Validate desired packages were installed. "rpm -qa | egrep 'kernel|ceph|salt'"    
    - Validate that the ceph daemons for this node's roles are running. "ceph -s" and "ceph osd tree" are good tools, as is "systemctl".
    - If "ceph osd add-noout osd.OSD_ID" was used, then use ceph "osd rm-noout osd.OSD_ID" to remove flag.
        For example:
        for i in $(ceph osd ls-tree OSD_NODE_NAME); do echo "osd: $i"; ceph osd rm-noout osd.$i; done
        Verify with:
        ceph health detail | grep noout
    - After confirming the node and its services are running, repeat for each node with the OSD role.
    - When all OSD nodes have been patched, remove the noout flag with "ceph osd unset noout".
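    After each OSD node comes back up (and again after the noout flag is removed), it can help to wait until all placement groups have returned to active+clean before continuing. A rough sketch of such a wait loop:
        while ceph pg stat | grep -qE 'degraded|peering|undersized|stale'; do sleep 30; done
        ceph pg stat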
    
MDS Nodes:
- mds nodes will have an active and standby configuration. Patching the standby node first may be best, as CephFS then only fails over once (see below for how to identify the standby).
- After all OSD nodes have been patched and rebooted, patch one of the mds nodes with "zypper up" or "zypper patch", then reboot the node. Ensure the node boots back up.
    - Validate desired packages were installed. "rpm -qa | egrep 'kernel|ceph|salt'"
    - Validate that the ceph daemons for this node's roles are running. "ceph -s" and "ceph osd tree" are good tools, as is "systemctl".
    - After confirming the node and its services are running, repeat for each node with the mds role.
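    To identify which MDS daemon is currently active and which is standby, either of the following will show it:
        ceph fs status
        ceph mds stat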

Application nodes:
- Repeat the process for all rgw, igw, and ganesha nodes.
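    The service checks on those nodes follow the same pattern; the exact systemd unit names depend on the deployment, but typical examples are:
        systemctl list-units 'ceph-radosgw@*'               # rgw
        systemctl status nfs-ganesha                        # ganesha
        systemctl status rbd-target-api rbd-target-gw       # igw (ceph-iscsi)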

Validate that all daemons are running the desired version of ceph with the "ceph versions" command.
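For example (SES 5.5 is based on Ceph Luminous 12.2.x and SES 6 on Ceph Nautilus 14.2.x, so all daemons should report the matching release):
    ceph versions
    salt '*' cmd.run 'uname -r'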

Cause

The cluster needs to be updated to ensure it is running current code.

Status

Top Issue

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019793
  • Creation Date: 23-Nov-2020
  • Modified Date: 23-Nov-2020
  • SUSE Enterprise Storage
