In German there is a saying, “Erstens kommt es anders und zweitens als man denkt,” which roughly translates to “Things often don’t turn out as planned.” In this recap of the events that happened just before and over the weekend of September 27, let me try to provide some insight into why we are where we are today.
For a while now, the Public Cloud Development Team has been working on a new setup for our update infrastructure. The goal is to move from a custom “per cloud provider” approach to a design that is portable to many cloud frameworks, including private cloud installations. This new setup streamlines maintenance and relies primarily on products available from SUSE, rather than on custom implementations of parts of the functionality of these products in the public cloud. Additionally, SUSE Linux Enterprise Server 12 is just around the corner. Integrating this new version, along with all the changes to the back-end update infrastructure, made it doubtful that SUSE Linux Enterprise 12 could be released in Amazon EC2 without switching the update infrastructure. Therefore, putting the new design into place has been a priority for some time. The new infrastructure has been running for a while, and we have done testing to ensure a good experience. Plans for the transition were just beginning to take shape, and a transition procedure had already been developed. The plan was to communicate these changes to the update infrastructure, then release new images that take advantage of the new setup, and finally release the upgrade procedure for running instances. All of this was planned to happen in a nice, gentle fashion over an extended period of time, allowing our SUSE Linux Enterprise users in Amazon to transition at their own pace. Meanwhile, we would maintain the existing and the new infrastructure in parallel for at least 6 months, maybe longer, depending on feedback.
The above represents the state, and the plans, as of Thursday, September 25, 2014. After that, we ended up Shellshocked. As with any security vulnerability, this was taken very seriously at SUSE. The initial fix was released on the day of the disclosure, Thursday, September 25, and a follow-up fix started to push to the mirrors on Sunday, September 28, 2014. Dealing with the bash patches in and of itself would have been more or less routine, although the hype generated in such highly publicized cases always creates a little extra anxiety. The routine in this case includes verifying that the updated packages from the SUSE mirrors show up in the update infrastructure, verifying that they can be pulled on instances, and building and releasing new images with the fixes already included. This “routine” work generally takes a few hours, as demonstrated by the availability of new images in EC2 that already contain the follow-up fix to the initial disclosure. For more details about the vulnerabilities see , , , .
However, to add a little more fun to the mix, on Friday, September 26, changes were made to the back-end mirror infrastructure that theoretically should have had no effect on existing update servers. Well, that was not the case. The changes to the back-end pretty much left us dead in the water with respect to updates. Fixing the new update infrastructure was time-consuming but technically not very difficult, so it was done as quickly as fingers on the keyboard and the running machines allowed. Why the changes to the mirror back-end broke the existing update infrastructure is, as of this writing, unknown and will take some time to unravel. At this point on Friday, we had the Shellshock patches but could no longer get them out to our users: not a very pleasant feeling on this end and, I am certain, very frustrating on the other side. The fastest way forward was therefore to press the new infrastructure into service; otherwise, the patch would not have been available to anyone for some time to come, an unacceptable condition. This brings us to the state we have in EC2 today.
We have new images in the Marketplace that contain the fixes for CVE-2014-6271. Images containing the fixes for the other bash vulnerabilities have been published but are not yet integrated into the Amazon Marketplace (see the list at the end for AMI IDs). These images, both the ones released on Friday night and the ones released on Sunday, use the new update infrastructure. They also bring along cloud-init as the initialization framework, replacing a home-grown framework that was developed before cloud-init existed, and, most notably, a change in the default user setting. The default login user is now “ec2-user”, as requested by Amazon, rather than “root” as in previous images. For running instances, the procedure to upgrade an instance to use the new infrastructure was also pressed into service. The process is very simple; as root, execute:
1.) wget http://ec2-54-197-240-216.compute-1.amazonaws.com/instanceInfraUpgrade.noarch.rpm
2.) zypper --non-interactive in instanceInfraUpgrade.noarch.rpm
To get the bash fixes, instances need to use the new EC2 update infrastructure provided by SUSE. On running instances, this is accomplished with the procedure above. Instances started after Saturday, September 27, 2:00 P.M. EDT already have the Shellshock fixes integrated and also use the new infrastructure.
The good news is that the switch to the new infrastructure is basically complete. The not-so-good news is that it all happened without proper announcements and communication to you, our users. Believe me, I felt like I was in the movie “A Series of Unfortunate Events”: every time I turned around, something that was supposed to be more or less routine reared yet another ugly head that needed to be addressed.
Apologies for the inconvenience; sometimes even the best-laid plans fall apart.