It was a dark and stormy day in early November when the SUSE security team was invited to a heavily NDAed phone conference. During this call, they were briefed about three vulnerabilities involving the speculative execution features of various CPUs.
What followed were many intense weeks of preparing the mitigating patches for what became known to the public as Meltdown and Spectre. Our kernel teams were heavily involved in a joint effort by the upstream kernel community and engineers from Red Hat, Intel and other CPU vendors to create the PTI (page table isolation) patch set, which addresses Meltdown, and a variety of other changes designed to protect against Spectre.
This type of collaborative effort is Open Source at its best. This is what makes Linux so successful.
However, our work did not stop there, and this part of the story is about why having Enterprise support may be a good thing.
After an initial patch set against mainline was ready, parts of it began to get merged. While the PTI patch set is upstream by now, some of the Spectre patches are still under, well, lively discussion. So your level of protection against Spectre depends quite a bit on whether your favorite Linux distributor has a team of kernel experts who maintain their own kernel and have decided to apply the Spectre patches. If your Linux distributor is just rebuilding and repackaging the code from upstream projects, you’re probably still waiting for Spectre protection to arrive on your machines.
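If you are curious what your own kernel reports, here is a minimal sketch of how one might check. It assumes the kernel exports its mitigation status as one small text file per vulnerability under /sys/devices/system/cpu/vulnerabilities, which recent mainline kernels and many distribution kernels do; older kernels may not have this directory at all. The function name read_mitigation_status is made up for this illustration and is not part of any SUSE tool.

```python
from pathlib import Path

# Assumption: the running kernel exports one status file per vulnerability
# in this directory. Kernels without the interface simply lack it.
SYSFS_VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(base_dir=SYSFS_VULN_DIR):
    """Return {vulnerability_name: status_line} for each file in base_dir.

    Hypothetical helper for illustration; returns an empty dict when the
    directory does not exist.
    """
    base = Path(base_dir)
    if not base.is_dir():
        return {}
    status = {}
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            try:
                status[entry.name] = entry.read_text().strip()
            except OSError:
                # The file exists but could not be read (e.g. permissions).
                status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("This kernel does not export any vulnerability status.")
    for name, line in report.items():
        print(f"{name}: {line}")
```

Run as a script, this prints one line per vulnerability the kernel knows about. The exact wording of each status line is chosen by the kernel and differs between versions, so treat the output as something to read rather than something to parse.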
The biggest effort so far, however, has been in backporting this patch set to all kernel versions we have under support. That means more than 10 kernel versions, times some 10,000 lines' worth of patches.
It may seem like a trivial detail, but if you updated your SUSE machines, did you notice that your third-party video drivers, storage drivers and so on just kept working? That is because our engineers managed to do these backports in a way that retains the entire kernel ABI, so third-party kernel modules continue to work without their vendors having to scramble to rebuild drivers for every Linux version they support. After many years of working with Enterprise customers, our engineers understand the importance of such “small” details, and have the expertise to get them right.
Once our kernel teams had completed a backport, QA engineers would pounce on the kernel and hammer on it, in order to root out any regressions and ensure that the overall system continued to work. Our openQA test automation ran several suites of kernel tests on each candidate kernel, each run comprising some 4,000 individual test cases, on several different hardware platforms, on bare metal as well as in guests under KVM and Xen. Independent of this, SUSE Labs performed 72-hour stress tests on each kernel, exercising all of the modified code paths heavily, and ran a battery of performance tests.
Yes, we did find several regressions, which the kernel developers duly fixed.
And at the end of the day, we made it! When Meltdown and Spectre were disclosed, we were ready to release updates for the most widely used code streams, and were able to cover all other code streams under full support or LTSS within the next few days.
And even now, as I write this, the story continues. Where the first round of patches mainly focused on bringing up the defenses, we are now increasingly turning to fine-tuning the mitigating patches: improving their effectiveness and ease of use, and reducing some of their performance impact.
Does everybody in the world need Enterprise support? No.
But if your business relies on the integrity and the protection of your Linux systems, you want a partner that has demonstrated the ability to effectively collaborate with the community in creating the necessary defenses, and to support the customer by delivering a comprehensive set of updates, in a timely manner.
Like SUSE Engineering.