Back to normal – Azure US East

I am happy to report that we are back to normal operation in the Azure US East region as of Wednesday, April 27, 1:46 A.M. EST. Late last week Azure had network issues and a storage service outage in the US East region, which affected our update servers there. The server monitoring in place for the update infrastructure alerted us to the issues and we were ready to take action. However, with the platform itself having trouble, there was little we could do until the underlying issues in Azure were fixed.

Once the platform was back to regular operation and our systems came back online, it first appeared that everything had recovered. Shortly afterwards, however, we noticed issues when repositories were being accessed. One of the update servers could not be convinced to provide access at all and needed to be rebuilt from scratch. As the underlying issue was storage service related, we also decided to re-sync all the repositories from a known good source. This process takes a long time, as we need to pull 200+ GB of data over the network. After the server was rebuilt it returned to regular operation.

While this was going on, another server fell over, leaving us with no working registration mechanism in US East. The second failure can also be traced back to the underlying platform issues. Again, as the platform issue was storage related, we did not want to take chances and started from scratch. We preserved the failed systems for additional analysis and hope to learn lessons that allow us to implement faster recovery mechanisms in the future.

We apologize that it took us a long time after the Azure outage to fully restore our services, but we wanted to be certain that we had no corrupt data that might trigger additional problems. We will work diligently to find ways to improve our resilience against such outages.

We believe that we preserved all the existing registration data, and thus all your instances should receive updates as before. However, should you experience issues during any zypper operation, please run

/usr/sbin/registercloudguest --force-new

as root. Once again, apologies for any inconvenience.
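If you manage many instances, the check described above could be scripted. The following is only a minimal sketch, not an official tool: it assumes that a failing `zypper refresh` is a reasonable signal that re-registration is needed, and the function name `refresh_or_reregister` and the override variables `ZYPPER_CMD` and `REGISTER_CMD` are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: refresh the repositories and force a new registration only if
# the refresh fails. Must be run as root on the affected instance.
# ZYPPER_CMD and REGISTER_CMD are hypothetical overrides for illustration.

refresh_or_reregister() {
    zypper_cmd="${ZYPPER_CMD:-zypper}"
    register_cmd="${REGISTER_CMD:-/usr/sbin/registercloudguest}"

    # If the repositories can be refreshed, registration is assumed to be fine.
    if "$zypper_cmd" --non-interactive refresh; then
        echo "repositories reachable; no re-registration needed"
        return 0
    fi

    # Otherwise force a new registration, then verify with a second refresh.
    echo "zypper refresh failed; forcing a new registration" >&2
    "$register_cmd" --force-new || return 1
    "$zypper_cmd" --non-interactive refresh
}
```

Calling `refresh_or_reregister` as root returns a non-zero status if the instance still cannot refresh its repositories after re-registering, so the result can be checked by whatever automation runs it.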

Robert Schweikert