How to manually patch a SES cluster

This document (000019793) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 5.5
SUSE Enterprise Storage 6

Situation

Customer needs to patch a SES Cluster. 
"salt-run state.orch ceph.stage.0" is the recommended method.  
Manually patching the cluster is possible if stage.0 is not a viable option.

Resolution

It is best to patch the cluster when the cluster is healthy. However, there may be situations where the cluster is not healthy and restoring it to health requires patching the cluster.

- Make sure all nodes are registered and have access to the SLES/SES repos.
    - "SUSEConnect --status-text"
    - "zypper lr -E"
    - "zypper ref" 
    - "zypper lu"
    
- Patch the nodes in the following order: admin, mon/mgr, OSD, mds, rgw, igw, ganesha, etc.
- If roles are collocated, the order is the same, but some services will get patched earlier.

- "ceph version" command will provide current version of each ceph daemon. 
- "uname -a" command will provide version of running kernel.

Admin node:
- Patch the admin node with "zypper up" or "zypper patch", then reboot. Ensure the node boots back up.
    - If "deepsea" package was updated, then run on the admin node:
        salt '*' saltutil.sync_all
        Note: Its okay to run the command even if deepsea package was not updated.  The command will sync deepsea modules across all minions.
    - Validate desired packages were installed. "rpm -qa | egrep 'kernel|ceph|salt|deepsea'"    
    - Validate that the ceph daemons for this node's roles are running. "ceph -s" and "ceph osd tree" are good tools, as is "systemctl".
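    For example, after the admin node is back up, the Salt services and overall cluster state can be checked explicitly (a sketch; unit names assume a standard DeepSea deployment where the admin node runs the Salt master):
        systemctl status salt-master salt-minion
        systemctl --failed
        ceph -s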

Mon nodes:
- Patch one of the mon/mgr nodes with "zypper up" or "zypper patch", then reboot. Ensure the node boots back up.
    - Validate desired packages were installed. "rpm -qa | egrep 'kernel|ceph|salt'"
    - Validate that the ceph daemons for this node's roles are running. "ceph -s" and "ceph osd tree" are good tools, as is "systemctl".
    - After confirming the node and its services are running, repeat for each node with the mon/mgr role.
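    Before patching the next mon/mgr node, confirm that the rebooted mon has rejoined quorum and that an active mgr is present, for example:
        ceph mon stat
        ceph -s | grep -E 'mon:|mgr:'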

OSD nodes:
- After all mon/mgr nodes have been patched and rebooted:
    - Set the noout flag cluster-wide with "ceph osd set noout",
        or per OSD with:
        ceph osd add-noout osd.OSD_ID
        For example, for all OSDs on one node:
        for i in $(ceph osd ls-tree OSD_NODE_NAME); do echo "osd: $i"; ceph osd add-noout osd.$i; done
        Verify with:
        ceph health detail | grep noout
        
    - Patch one of the OSD nodes with "zypper up" or "zypper patch", then reboot the OSD node. Ensure the node boots back up.
    - Validate desired packages were installed. "rpm -qa | egrep 'kernel|ceph|salt'"    
    - Validate that the ceph daemons for this node's roles are running. "ceph -s" and "ceph osd tree" are good tools, as is "systemctl".
    - If "ceph osd add-noout osd.OSD_ID" was used, then use ceph "osd rm-noout osd.OSD_ID" to remove flag.
        For example:
        for i in $(ceph osd ls-tree OSD_NODE_NAME); do echo "osd: $i"; ceph osd rm-noout osd.$i; done
        Verify with:
        ceph health detail | grep noout
    - After confirming the node and its services are running, repeat for each node with the OSD role.
    - When all OSD nodes have been patched, remove the noout flag with "ceph osd unset noout".
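    After each OSD node comes back up (and again after the noout flag is removed), it can help to wait until all placement groups have returned to active+clean before continuing. A rough sketch of such a wait loop:
        while ceph pg stat | grep -qE 'degraded|peering|undersized|stale'; do sleep 30; done
        ceph pg stat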
    
MDS Nodes:
- mds nodes will have an active and standby configuration. Patching the standby node first may be best, as CephFS then only fails over once (see below for how to identify the standby).
- After all OSD nodes have been patched and rebooted, patch one of the mds nodes with "zypper up" or "zypper patch", then reboot the node. Ensure the node boots back up.
    - Validate desired packages were installed. "rpm -qa | egrep 'kernel|ceph|salt'"
    - Validate that the ceph daemons for this node's roles are running. "ceph -s" and "ceph osd tree" are good tools, as is "systemctl".
    - After confirming the node and its services are running, repeat for each node with the mds role.
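    To identify which MDS daemon is currently active and which is standby, either of the following will show it:
        ceph fs status
        ceph mds stat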

Application nodes:
- Repeat the process for all rgw, igw, and ganesha nodes.
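    The service checks on those nodes follow the same pattern; the exact systemd unit names depend on the deployment, but typical examples are:
        systemctl list-units 'ceph-radosgw@*'               # rgw
        systemctl status nfs-ganesha                        # ganesha
        systemctl status rbd-target-api rbd-target-gw       # igw (ceph-iscsi)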

Validate that all daemons are running the desired version of ceph with the "ceph versions" command.
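For example (SES 5.5 is based on Ceph Luminous 12.2.x and SES 6 on Ceph Nautilus 14.2.x, so all daemons should report the matching release):
    ceph versions
    salt '*' cmd.run 'uname -r'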

Cause

The cluster needs to be updated to ensure it is running current code.

Status

Top Issue

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019793
  • Creation Date: 23-Nov-2020
  • Modified Date: 23-Nov-2020
  • SUSE Enterprise Storage
