Meltdown is one of the biggest and most complex security vulnerabilities of recent times, and it impacted almost everyone. I am a SUSE live patching engineer and wanted to share with you how unique fixing this vulnerability was in terms of scope and complexity.
My goal was to see whether I could also create a live patch for Meltdown. The journey turned out to be an exhaustive research project on kernel internals, which I felt would be great to share with everyone.
So I have put together a four-part blog, allowing you to consume the technical details in bite-sized pieces. I will publish the four parts on a daily basis.
Part 1: Key technical obstacles for live patching
Part 2: Virtual address mappings and the Meltdown vulnerability. Patching kGraft itself!
Part 3: Changes needed for Translation Lookaside Buffer (TLB) flushing primitives
Part 4: Conclusion
When I first looked at the Meltdown vulnerability, it was immediately clear to me that it would be extremely hard, if not impossible, to build a live patch for it, no matter which technique was used.
The most prevalent obstacles were technical in nature:
- There are only a few places in the kernel which aren’t covered by its Function Tracer (ftrace), the mechanism kGraft relies on for redirecting execution into replacement code. Unfortunately, the entry code (more on that later), which is touched heavily by the mitigating Kernel Page Table Isolation (KPTI) patch set, is among them.
- kGraft’s consistency model is task based: upon application of a new live patch, kGraft starts a transition period during which it switches each task (think “thread”) independently to the new implementation. A task is considered safe for switching at those points where it isn’t executing any of the to-be-replaced code. This consistency guarantee is sufficient in almost all cases of practical interest, because semantic changes, if any, are usually localized. Not so with KPTI, which cuts down the address mappings: these can be shared among different threads of the same process, for example.
For the first obstacle, let me briefly introduce what entry code is. In Linux, the world is divided into privilege-limited user space and fully privileged kernel space. A CPU logical core is always executing either kernel code with full privileges or privilege-restricted user space code.

Program code loaded by users runs in user mode and is restricted to a fairly limited set of operations such as integer and floating point computations, alteration of control flow and accesses to the process’ virtual memory. Whenever it wants to do anything more sophisticated, help from the kernel is required: the program must issue a special ‘syscall’ instruction telling the CPU to switch into kernel mode, drop all confinements and redirect execution to a fixed address specified as part of the kernel’s boot-time setup. The sequence of instructions found at that address is what is commonly called “entry code”. Written in assembly language, its main job is to set up a basic execution environment for the later stages written in C and to issue a function call into those as appropriate. Once the called function returns, the entry code prepares to resume execution of the user space program.

Finally, interrupts and exceptions like page faults form another set of architecturally defined events causing execution to enter the kernel. Similar to the syscall case, a target address pointing to some entry code location is associated with each of those. These addresses are organized within the “Interrupt Descriptor Table”, or “IDT” for short.
With the entry code not being organized into conventional functions, it is apparent that kGraft cannot replace it with a live patch, since kGraft relies on the Linux kernel’s Function Tracer (ftrace). However, I wondered whether it would perhaps be possible to achieve something equivalent by installing a new set of entry code addresses on the running CPUs during operation.
My team’s feedback on that admittedly half-baked idea revealed quite a number of open questions.
- How to deal with live patch module removal? Even if the original IDT were restored at removal time, nothing would prevent a CPU from executing the about-to-be-removed entry code replacement.
- Consistency versus the changing semantics of the address mappings.
- To keep the performance impact within acceptable bounds, the CPU’s Process Context IDentifiers (PCID) feature would have to be enabled on the live system, while global pages (PGE) would have to be turned off.
Yet they encouraged me to jump into the adventure and see how far I could get. It should be noted at this point that this work was never meant for production release: the result is simply too complex and intrusive to be suitable for production deployment.